[PATCH] Integer Sanitizer Initial Patches

Hi all,

Attached are patches that add a new 'sanitizer' to clang for detecting
and reporting integer overfllows. Unlike the checks added by
-fcatch-undefined-behavior, these also include non-undefined-behavior
checks.

The attached clang patch adds:

-fsanitize=unsigned-integer-overflow
and
-fsanitize=integer

The first adds support for inserting checks for unsigned integer
overflow, the latter is a new 'sanitizer group' which is used to
enable all integer-related checking. In the future I'd like to
include value-losing conversions, but for now this includes the
existing checks (signed overflow, divide-by-zero, shifts) as well as
the new unsigned overflow checks.

Also attached is a corresponding patch for compiler-rt that extends
ubsan to include support for reporting unsigned as well as signed
overflows.

Two issues with this that I'm hoping can be discussed:
* As per PR14247 (http://llvm.org/bugs/show_bug.cgi?id=14247), the
ubsan checks presently aren't recoverable. This reduces these checks'
utility for quickly getting a new large codebase into shape as
mentioned in that bug, but this is of course even more important to be
made optional when reporting unsigned overflows is enabled as well.

* Extending "ubsan" is unfortunate given its name, but these checks
don't seem to merit a separate library either. Thoughts?

Thank you for your time and looking forward to your comments.

~Will

**Written during the LLVM Hacker's Lab :slight_smile: **

0001-Add-unsigned-integer-overflow-sanitizer.patch (5.43 KB)

0002-Add-fsanitizer-integer-group-for-all-integer-related.patch (2.51 KB)

0001-Add-support-for-unsigned-overflow-diagnostic-reporti.patch (9.09 KB)

Hi all,

Attached are patches that add a new 'sanitizer' to clang for detecting
and reporting integer overfllows. Unlike the checks added by
-fcatch-undefined-behavior, these also include non-undefined-behavior
checks.

It's not obvious to me that it's useful to add runtime instrumentation
for unsigned overflow in particular; my instinct is that you'll get a
ton of false positives for existing codebases. Also, there isn't any
obvious way to fix code which legitimately takes advantage of unsigned
overflow. Do you have any data here?

The attached clang patch adds:

-fsanitize=unsigned-integer-overflow
and
-fsanitize=integer

Suspicious-behavior checkers are fundamentally not the same thing as
the existing sanitizers because the code might be correct as-is. Does
it make sense to put them under -fsanitize?

-Eli

Hi all,

Attached are patches that add a new 'sanitizer' to clang for detecting
and reporting integer overfllows. Unlike the checks added by
-fcatch-undefined-behavior, these also include non-undefined-behavior
checks.

It's not obvious to me that it's useful to add runtime instrumentation
for unsigned overflow in particular; my instinct is that you'll get a
ton of false positives for existing codebases. Also, there isn't any
obvious way to fix code which legitimately takes advantage of unsigned
overflow. Do you have any data here?

This is a good point, and I'm glad you brought it up. In terms of the
reported overflows being guaranteed bugs, you're certainly right there
will be quite a few false positives.

However I don't think this should be treated as a bug-reporting tool,
but rather a tool to ensure your code is doing what you expect. This
is important because the rules governing integer behavior are
non-trivial (especially to non-expert programmers, but even to them!),
and often the exact behavior is anything but clear from a glance at a
line of code. Add in variations across the various C/C++ standards,
unexpected behavior when porting to a new platform (or just
32bit->64bit), and the general lack of attention given to integer
overflow or the exact dynamic range of data and it can be awfully
useful to have a report of what overflows do occur in your program.
Sure, some (many?) of them might be intentional, but that's exactly
why a tool like this is so very useful: it lets you say things like
"my program only overflows an integer where I explicitly expect it
to". Without such a tool, how do you really know?

Expanding on that, I think a reasonable solution (one that was used by
a previous version of IOC) would be to introduce a function attribute
to allow programmers to flag safe uses of unsigned overflow(hashing,
crypto, RNG, etc). I believe address sanitizer employs a similar
technique, although perhaps for different reasons. If it would help,
I can prioritize reviving that code, just let me know :).

As for the data you asked for, it is something we're looking into but
I don't think we're ready to report on it yet. That said, given the
prevalence of undefined behavior integer bugs[1] in mature commonly
used applications, it seems reasonable to expect that programmers will
make the same kinds of bugs with unsigned integer types. Additionally
standards like CERT's Secure C coding standard suggest avoiding
unsigned overflow, which does speak for the existence of a community
that would find such a report useful. There certainly have been CVE's
reported due to unsigned integer overflow :).

In short, I think having tools like provide a useful (and presently,
missing!) component to a developer's arsenal when it comes to
reasoning about their code and ensuring it's doing what is expected.

Most of these arguments also motivate the reporting of lossy casts,
particularly implicit ones.

Hopefully this helps give this some context and addresses some of your
concerns, and I'd be happy to discuss it further.

As a final note I'd like to add that the code impact on Clang is
relatively minor given the existing -sanitize=undefined, which might
be useful for weighing the utility of the checker, although maybe not.

[1] We found undefined integer behavior in the vast majority of
open-source applications we looked at, see our paper: "Understanding
Integer Overflow in C/C++",

The attached clang patch adds:

-fsanitize=unsigned-integer-overflow
and
-fsanitize=integer

Suspicious-behavior checkers are fundamentally not the same thing as
the existing sanitizers because the code might be correct as-is. Does
it make sense to put them under -fsanitize?

Hmm, this should probably best be answered by someone else. While
putting it in as a "sanitizer" makes sense to me, one of the largest
reasons I used the sanitizer flags and infrastructure was to leverage
the recent (and not unrelated) Ubsan checking for signed overflow/bad
shifts, which are included in the default 'integer' sanitizer profile.
However I'm not overly attached to the particular flag interface, so
if there are objections or suggestions for something else I'd be
perfectly willing to work towards that instead.

-Eli

Thank you for your time,

~Will

Hi,

Attached are patches that add a new 'sanitizer' to clang for detecting
and reporting integer overfllows. Unlike the checks added by
-fcatch-undefined-behavior, these also include non-undefined-behavior
checks.

This seems like it could be valuable to me, and I think it's in scope
as a -fsanitize= feature (there are some other checks which I'd like
to eventually include in -fsanitize=, which check for possible bugs
which don't result in undefined behavior). For instance, I could
imagine this being something people would turn on when their code is
behaving strangely, as part of a "tell me about suspicious things that
my program did" mode, but that would depend on making these
diagnostics non-fatal (and maybe supporting a suppression system).

Assuming we reach consensus that we want this...

The attached clang patch adds:

-fsanitize=unsigned-integer-overflow
and
-fsanitize=integer

The first adds support for inserting checks for unsigned integer
overflow, the latter is a new 'sanitizer group' which is used to
enable all integer-related checking. In the future I'd like to
include value-losing conversions, but for now this includes the
existing checks (signed overflow, divide-by-zero, shifts) as well as
the new unsigned overflow checks.

Please split the divide-by-zero check into integer and floating-point
cases, and only include the integer case in the -fsanitize=integer
group.

Please also provide a patch to the user's manual documenting the new arguments.

Also attached is a corresponding patch for compiler-rt that extends
ubsan to include support for reporting unsigned as well as signed
overflows.

Two issues with this that I'm hoping can be discussed:
* As per PR14247 (http://llvm.org/bugs/show_bug.cgi?id=14247), the
ubsan checks presently aren't recoverable. This reduces these checks'
utility for quickly getting a new large codebase into shape as
mentioned in that bug, but this is of course even more important to be
made optional when reporting unsigned overflows is enabled as well.

I think this is something we should pursue. My only reservation here
is a concern about the performance impact of making the checks
recoverable, but I have no data there.

* Extending "ubsan" is unfortunate given its name, but these checks
don't seem to merit a separate library either. Thoughts?

I don't think that's a problem. "One Hour Photo" is just the name of
the shop, sir. :slight_smile:

Clang patch:

--- a/lib/CodeGen/CGExprScalar.cpp
+++ b/lib/CodeGen/CGExprScalar.cpp
@@ -414,6 +414,11 @@ public:
       }
     }

+ if (Ops.Ty->isUnsignedIntegerType() &&
+ CGF.getLangOpts().SanitizeUnsignedIntegerOverflow) {
+ return EmitOverflowCheckedBinOp(Ops);
+ }

No braces here, please (and for this same construct later in the file).

   case BO_Mul:
   case BO_MulAssign:
     OpID = 3;
     IID = llvm::Intrinsic::smul_with_overflow;
+ IID = isSigned ? llvm::Intrinsic::smul_with_overflow :
+ llvm::Intrinsic::umul_with_overflow;
     break;

There's a dead store left behind here.

@@ -2031,7 +2054,8 @@ Value
*ScalarExprEmitter::EmitOverflowCheckedBinOp(const BinOpInfo &Ops) {
   if (handlerName->empty()) {
     // If the signed-integer-overflow sanitizer is enabled, emit a call to its
     // runtime. Otherwise, this is a -ftrapv check, so just emit a trap.
- if (CGF.getLangOpts().SanitizeSignedIntegerOverflow)
+ if (CGF.getLangOpts().SanitizeSignedIntegerOverflow ||
+ CGF.getLangOpts().SanitizeUnsignedIntegerOverflow)
       EmitBinOpCheck(Builder.CreateNot(overflow), Ops);
     else
       CGF.EmitTrapvCheck(Builder.CreateNot(overflow));

This doesn't look right -- building with -ftrapv
-fsanitize=unsigned-integer-overflow will use the -fsanitize path for
signed overflow, not the -ftrapv path.

The Clang patch should include a test -- it's not sufficient for this
to be tested just in the compiler-rt tests (they're not run as a part
of normal Clang development).

compiler-rt patch:

--- a/lib/ubsan/ubsan_handlers.cc
+++ b/lib/ubsan/ubsan_handlers.cc
@@ -55,30 +55,35 @@ void
__ubsan::__ubsan_handle_type_mismatch(TypeMismatchData *Data,
   Die();
}

-/// \brief Common diagnostic emission for various forms of signed overflow.
-template<typename T> static void HandleSignedOverflow(OverflowData *Data,
+/// \brief Common diagnostic emission for various forms of integer overflow.
+template<typename T> static void HandleIntegerOverflow(OverflowData *Data,
                                                       ValueHandle LHS,
                                                       const char *Operator,
- T RHS) {
- Diag(Data->Loc, "signed integer overflow: "
- "%0 %1 %2 cannot be represented in type %3")
+ T RHS,
+ bool isSigned) {
+ Diag(Data->Loc, "%0 integer overflow: "
+ "%1 %2 %3 cannot be represented in type %4")
+ << (isSigned ? "signed" : "unsigned")

Please look at Data->Type.isSignedIntegerTy() in here, rather than
passing in an extra flag.

Thank you for your comments.

Updated patches attached!

~Will

0001-Add-unsigned-integer-overflow-sanitizer.patch (26.4 KB)

0001-Add-support-for-unsigned-overflow-diagnostic-reporti.patch (9.92 KB)

--- a/docs/UsersManual.html
+++ b/docs/UsersManual.html
@@ -875,11 +875,12 @@ likely to affect PCH files that reference a
large number of headers.</p>
<!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - -->
<dl>
<dt id="opt_fsanitize"><b>-fsanitize=check1,check2</b>: Turn on runtime checks
-for various forms of undefined behavior.</dt>
+for various forms of undefined or otherwise undesirable behavior.</dt>

I prefer your later use of the word 'suspicious' over 'undesirable',
since unsigned overflow is often desired.

@@ -889,7 +890,10 @@ message is produced at runtime explaining the
problem. The main checks are:
     <a href="ThreadSanitizer.html">ThreadSanitizer</a>, an
<em>experimental</em>
     data race detector. Not ready for widespread use.</li>
<li id="opt_fsanitize_undefined"><tt>-fsanitize=undefined</tt>:
- Enables all the checks listed below.</li>
+ Enables all the checks for undefined behavior. This includes all of
+ the checks listed below other than unsigned integer overflow.</li>
+<li id="opt_fsanitize_integer"><tt>-fsanitize=integer</tt>:
+ Enables checks for undefined or suspicious integer behavior.</li>

This isn't quite right; -fsanitize=address and -fsanitize=thread both
check for undefined behavior. Possibly something like "Enables a set
of checks for undefined behavior which have no impact on address space
layout or ABI, and small runtime impact."

+<li id="opt_fsanitize_fp-divide-by-zero"><tt>-fsanitize=fp-divide-by-zero</tt>:
+ Floating point division by zero.</li>

I would prefer 'float-divide-by-zero' to match 'float-cast-overflow'.
Also, this list was previously in alphabetical order :slight_smile:

@@ -2031,7 +2053,8 @@ Value
*ScalarExprEmitter::EmitOverflowCheckedBinOp(const BinOpInfo &Ops) {
   if (handlerName->empty()) {
     // If the signed-integer-overflow sanitizer is enabled, emit a call to its
     // runtime. Otherwise, this is a -ftrapv check, so just emit a trap.
- if (CGF.getLangOpts().SanitizeSignedIntegerOverflow)
+ if ((isSigned && CGF.getLangOpts().SanitizeSignedIntegerOverflow) ||
+ CGF.getLangOpts().SanitizeUnsignedIntegerOverflow)
       EmitBinOpCheck(Builder.CreateNot(overflow), Ops);
     else
       CGF.EmitTrapvCheck(Builder.CreateNot(overflow));

It still looks like this will take the EmitBinOpCheck path for
-fsanitize=unsigned-integer-overflow -ftrapv. I think:

  !isSigned || LangOpts.SanitizeSignedIntegerOverflow

... should work fine.

If Eli is happy, this LGTM with the above fixes.

Updated versions of the patches attached.

Thanks for the feedback, Richard! :slight_smile:

Eli, others: thoughts?

Thanks!

~Will

0001-Add-fsanitize-integer-for-reporting-suspicious-integ.patch (28.5 KB)

0001-ubsan-Support-unsigned-overflows-and-divide-by-zero-.patch (9.93 KB)

I have no opinion about the details of the patch, but I do have a few thoughts about the Integer Sanitizer overall.

First, please consider taking a look at the latest version of TS 17961 C Secure Coding Rules (latest version is here: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1624.pdf ). This document establishes requirements for both analyzers and compilers regarding which violations need to be diagnosed. It's not just our (e.g., CERT's) opinion, but rather represents the current consensus of WG14. There will be a new version posted soon as a result of the Portland meeting. The portion of the document that is relevant to this patch is the minimum requirements for diagnosing signed overflow and unsigned wrapping; these requirements typically involve tainted inputs. You should probably consider addressing these rules as a minimum target.

CERT's postion is spelled out in rules:
* INT30-C. Ensure that unsigned integer operations do not wrap
* INT32-C. Ensure that operations on signed integers do not result in overflow
which are stricter than TS 17961. These rules would require additional diagnostics to be issues. You can find these rules in the Secure C coding standard book. Alternatively, feeding the rule ID (e.g., INT30-C) into your favorite search engine will get you to our wiki.

We note that there are good arguments for both positions. On the one hand, the subset of overflows diagnosed by TS 17961 is likely to contain a higher percentage of security vulnerabilities. On the other hand, *unintended* (and unanticipated) overflow or wrapping produces incorrect results. The additional diagnostics required by the CERT rules would discover some security vulnerabilities that the TS 17961 requirements would miss, a larger number of bugs that lead to incorrect results (but that may not lead to security issues), as well as a number of "false positives" where wrapping is either intentional or harmless.

My *personal* opinion is that getting integer wrapping semantics by default is a fundamental mistake in C-like languages in general. It's clear that programmers occasionally need modular (wrapping) semantics for math; uses like hash codes, crypto, and address arithmetic spring immediately to mind. Note, however, that all of these use cases involve *intentional* use of modular arithmetic.

Now, ask yourself the following question: When was the last time you wrote a program that would get the right answer in the event of *unintended* wrapping? I suggest that the answer is likely to be "never!"

Modular arithmetic must be available on request, but the *default* should never wrap. Instead, wrapping or overflow should be considered a bug. There're plenty of better choices than wrapping: you could saturate to maxint or minint; you could take a fault or exception; you could use some integer equivalent of NaN.
Any of these would be a better choice (in terms of semantics).

I realize that this isn't how C-like languages are defined, and that the language standards certainly aren't going to change. So integer sanitizers like this one are likely to be the best available alternative.

Dean F. Sutherland
dsutherland@cert.org

P.S. Back in the 90s, various commercial compiler vendors demonstrated that the total cost of runtime checking for integer overflow, array bounds violations, and nil-pointer dereferences *combined* can be driven below 10%. This requires optimization work specifically targeted at soundly removing unneeded checks (which consumes effort that otherwise might be spent on other optimizations), so it certainly isn't free. But it need not be prohibitively expensive for most code.

Eli: have your concerns been suitably addressed here?

Ping!

Updated patches attached (previous set conflicted with
-fsanitize=bounds change).

Thanks!

~Will

0001-Add-fsanitize-integer-for-reporting-suspicious-integ.patch (28.8 KB)

0001-ubsan-Support-unsigned-overflows-and-divide-by-zero-.patch (9.93 KB)

  • NeedsUbsanRt = Undefined ^ Bounds
  • NeedsUbsanRt = (Undefined ^ Bounds) | Integer

Would you mind changing this to “(Undefined & ~Bounds) | Integer” or similar, in passing? Otherwise, LGTM.

Sorry about the late reply. Yes, I think my concerns have been
addressed; thanks.

-Eli

Committed as r168700 and r168701, thanks for all the feedback and review! :slight_smile:

~Will