request for comments on patch: detecting integer undefined behaviors

Hi Clang folks,

Attached is my student Peng Li's patch to Clang for detecting integer
undefined behaviors. It will complain, for example, about C or C++ code
that evaluates any of:

  -INT_MIN
  INT_MAX+1
  2*INT_MAX
  >>-1
  x/0

These behaviors are undefined in all modern C/C++ variants.
Additionally, C99 and C++0x make lots of harmless-looking signed
left-shifts into undefined behavior. For example, 1<<31 is undefined
when sizeof(int)==4 because the result cannot be represented as a signed
32-bit quantity. Peng's patch has separate flags for the basic checks
and for the more aggressive C99/C++0x checks.

It also takes a flag for whether to use explicit checks or LLVM's
x.with.overflow intrinsics. The intrinsics (off by default) are faster
but not as well tested.

We'd like to get this integrated into Clang; please let us know if that
might happen and if so, what kind of changes we'll need to make.

Clang's generated code should not change at all if checking is not
requested.

Currently, we require a trap handler object file (source code also
attached) to be placed in /usr/local/lib or similar. We should move this
into LLVM somewhere, obviously, and would appreciate advice about
where/how to do that.

This checker has found problems in many applications including OpenSSL,
Perl, Python, PHP, BIND, and LLVM itself. My guess is that once a Clang
with this checker is released, it will generate a significant number of
new Clang users. We've already had a number of people ask for the tool.

John Regehr

trap_handler_onefile.c (17.6 KB)

clang-standalone-111903.patch (58.6 KB)

README (3.26 KB)

Generally, I think this is a good idea to integrate into clang;

Some review comments:

1. Do we really need more than one -fcatch-undefined-behavior flag?
It seems like we can key the style of checks off of the current
language selection, and I can't really see any reason for wanting some
checks, but not others.
2. The driver changes for adding -ltrapub to the link line look
strange, but I don't know the code well enough to properly review
them.
3. Trying to do overflow checks with -fwrapv seems strange: the whole
point of -fwrapv is that signed overflow becomes well-defined.
4. Please avoid strncpy if it isn't really necessary; use StringRef
and/or std::string instead.
5. The code for EmitDiv and EmitRem can be refactored together.
6. Please get rid of UndefinedBehaviorAddCheck,
UndefinedBehaviorSubCheck, and UndefinedBehaviorMulCheck; if there are
issues, we'd prefer to fix them in the LLVM backend (I'll try to fix
PR7838 sometime in the near future).
7. "llvm::ConstantInt::get(CGF.Int32Ty, INT32_MAX, true);" is ugly,
there are APInt methods to do the same thing for any bitwidth.
8. Is there a good reason for some operations to trap on finding
undefined behavior, but not others?
9. Bitcasting __ub_trap_handler depending on the check seems bad; it
should have a single, fixed type.
10. The addition of SourceLocation::getLocationValue is unacceptable;
there are already APIs to access all the information, and the API
isn't very good. In particular, using a char* as an out-parameter is
very unlike other LLVM code.

-Eli

Thanks Eli, we'll revise and resubnit!

John

Thanks again for the feedback Eli. Attached is an improved patch that addresses those comments, though see below.

Any chance we can sneak this in before the 2.8 branch?

2. The driver changes for adding -ltrapub to the link line look
strange, but I don't know the code well enough to properly review
them.

Peng doesn't know of a better way to accomplish the desired result. Can anyone close to this part of the code comment?

6. Please get rid of UndefinedBehaviorAddCheck,
UndefinedBehaviorSubCheck, and UndefinedBehaviorMulCheck; if there are
issues, we'd prefer to fix them in the LLVM backend (I'll try to fix
PR7838 sometime in the near future).

This is done. Our patched clang now crashes when compiling signed 64-bit multiplies on a 32-bit architecture, until that PR is fixed.

Also you asked for a single flag that requests undefined behavior checking, and then it keys off of the language that was chosen. This turns out to be a bad idea because Clang defaults to c99 mode, causing the aggressive shift checks to be added. Please just trust us that very few people want to see reports that for example 1<<31 is undefined! All real C codes contain lots of these "problems". In the attached patch, these checks are turned off by default for all languages, but can be enabled with -fshl-checks-extension for people who do want to try to be clean with respect to the c99/c++0x standards.

John

trap_handler_onefile.c (24.3 KB)

README (3.37 KB)

clang-standalone-112452.patch (38 KB)

Okay, that's fine.

A few more review comments:
+ if(CGF.rvOpTy->getScalarSizeInBits() == 32) {
+ IntMin = llvm::ConstantInt::get(VMContext, CGF.APInt32Min);
+ llvm::APInt APIntNegOne = llvm::APInt(32, -1, true);
+ NegOne = llvm::ConstantInt::get(VMContext, APIntNegOne);
+ }
+ else if(CGF.rvOpTy->getScalarSizeInBits() == 64) {
+ IntMin = llvm::ConstantInt::get(VMContext, CGF.APInt64Min);
+ llvm::APInt APIntNegOne = llvm::APInt(64, -1, true);
+ NegOne = llvm::ConstantInt::get(VMContext, APIntNegOne);
+ }

The if/else if shouldn't be necessary.

+ if(Ops.Ty->isFloatingType()) {
+ CGF.CreateTrapBB();
+ tmpRule = "Floating Division: Divisor is 0";
+ CGF.ATEI.setAll(Ops.LHS, Ops.RHS, Ops.E, DivFinalEnd, tmpRule);
+ CGF.Builder.CreateCondBr(Builder.CreateFCmpOEQ(Ops.RHS,
Zero), CGF.getSoleTrapBB(), DivFinalEnd);
+ CGF.EmitTrapBB();
+ }
+ else if(Ops.Ty->isRealFloatingType()) {
+ CGF.CreateTrapBB();
+ tmpRule = "Real Floating Division: Divisor is 0";
+ CGF.ATEI.setAll(Ops.LHS, Ops.RHS, Ops.E, DivFinalEnd, tmpRule);
+ CGF.Builder.CreateCondBr(Builder.CreateFCmpUEQ(Ops.RHS,
Zero), CGF.getSoleTrapBB(), DivFinalEnd);
+ CGF.EmitTrapBB();
+ }

The code in the else-if never runs: isFloatingType is a superset of
isRealFloatingType.

+ llvm::Value* trapHandlerResult;

Write-only variable?

+ // Converted operator's llvm::type
+ const llvm::Type* rvOpTy;

Is this really necessary? Either way, I have no idea what the name is
supposed to mean, and the comment should make it clear this is used
for -fcatch-undefined-behavior machinery.

Also, the indentation looks like it's off in a few places, and some
lines go over 80 columns.

-Eli

Another draft of this patch is attached!

John

clang-standalone-112808.patch (39.7 KB)

Other than that, I'm waiting for somone (probably Daniel) to review
the driver bits of the patch.

-Eli

Great! Thanks for proposing this. Here are some thoughts:

+static unsigned checkValueType(QualType opType, const llvm::Value *value) {
+ if(value == 0)
+ return 6;
+ else {
+ if(opType->isIntegerType()) {
+ if(opType->isSignedIntegerType()) {
+ if(value->getType()->isIntegerTy(32))
+ return 2;
+ else if(value->getType()->isIntegerTy(64))
+ return 3;
+ else
+ return 6;
+ }
+ else {
+ if(value->getType()->isIntegerTy(32))
+ return 0;
+ else if(value->getType()->isIntegerTy(64))
+ return 1;
+ else
+ return 6;
+ }

Please follow the llvm conventions. The indentation is consistent here, please put a space after if (and while and other control flow statements), and there is no need for an "else" after a returns (reducing indentation is good). These sorts of minor syntactic things occur throughout the patch.

Please also add doxygen comments for functions like this, it isn't clear what it is doing.

In terms of the actual shift check, there is no need to check for "x < 0", an unsigned comparison against the max value will encompass this.

In terms of the feature, I don't think that -fshl-checks-extension is a great name for the command line argument. If this were just about checking for undefined behavior around shift left, then this should be part of -ftrapv.

I'm not sure exactly what the code is doing yet, but it looks like the interesting thing here is that it provides the capability to *report* where a problem occurs, instead of just terminating the app as with -ftrapv. If this is the essential new thing, then it seems better to add a new level to the "default,-fwrapv,-ftrapv" hierarchy, maybe a new -fcheckv option. This option would do the same set of checks as -ftrapv, but would report (in a nice way with file/line/col + string info) the error before dying.

I think the right steps to get this work into the tree are:

1) Add the new checking for shift under the existing -ftrapv umbrella, and make it work with that system.
2) add a new -fcheckv option to the driver, make it start out as a synonym for -ftrapv, but do all the plumbing.
3) enhance -fcheckv to produce the useful information on failure, migrating one -ftrapv check at a time to take advantage of it.

Does this seem like a reasonable path forward, or did I miss the mark on what this is trying to do?

-Chris

Sounds like a good plan Chris.

Peng's patch improves on -ftrapv in three ways:

- implements more checks (shift stuff, divide by zero, etc.)

- implements some non-math undefined behavior checks (some inherited from LLVM's original -fcatch-undefined-behavior, some not)

- provides good diagnostics when a check fails

We'll proceed with trying to get this integrated in a piecewise fashion...

John

Excellent, thanks!

-Chris

Chris,

Our job of integrating Peng's patch with the -ftrapv functionality would be easier if we understood what is going to happen to LLVM's existing -fcatch-undefined-behavior flag. Does this flag have a future or will it be removed?

Thanks,

John

I don't see why it would be removed. I see it as "-ftrapv on steroids". Where -ftrapv is limited to integer overflow issues, -fcatch-undefined-behavior should catch other things like buffer overflows, use of uninit variables etc. -fcatch-undefined-behavior should imply -ftrapv, but I don't know if it does.

Does that help?

-Chris

Attached is a very small Clang patch that augments -ftrapv to check for divide by zero, mod by zero, and INT_MIN % -1. The behavior on failure is inherited from trapv.

Thanks,

John

clang-divrem-113592.patch (4.63 KB)

I was going to comment that this isn't the same behaviour as -ftrapv, but apparently it now is.

Chris, I note that you made this change last month, without any discussion, removing the original, generic, behaviour that was discussed on-list, in favour of GCC's more limited behaviour which does not provide options for recovery.

The clang -ftrapv implementation was used by us for implementing overflow checking and correction, and was to prototype CERT's as-if infinitely ranged integer model, which has been proposed for inclusion in C1X.

Perhaps you could be a bit clearer about why you originally agreed to this inclusion, and then later (once we had deployed code using it) decided to unilaterally modify it?

If you feel that the option should have precisely the same semantics as gcc's -ftrapv rather than a superset of that functionality, then you could commit the default handler (which I supplied with the original version of the code), which mimics gcc's functionality by calling abort from the handler.

I am very disappointed that you decided to reverse your original decision to commit this code without any discussion, just prior to a release.

David

Just to be clear:

All we want (at Utah) is undefined behavior checking and a way to invoke a handler with detailed information about the operation that failed, its srcloc, and why it failed. The command line flag or flags that control checking are irrelevant to us.

It would be excellent to also support a stripped-down trap handler that does not provide detailed information, supporting fast recovery from undefined operations.

John Regehr

this has nothing to do with John's patch, so I'm retitling the subject.

I was going to comment that this isn't the same behaviour as -ftrapv, but apparently it now is.

Yes, -ftrapv is a flag defined by gcc, clang follows the gcc behavior.

Chris, I note that you made this change last month, without any discussion, removing the original, generic, behaviour that was discussed on-list, in favour of GCC's more limited behaviour which does not provide options for recovery.

I thought that I emailed you or cc'd you on the commit. But yes you're right. -ftrapv is a gcc flag and we got numerous bug reports from people who were trying to use it and getting link errors.

The clang -ftrapv implementation was used by us for implementing overflow checking and correction, and was to prototype CERT's as-if infinitely ranged integer model, which has been proposed for inclusion in C1X.

Perhaps you could be a bit clearer about why you originally agreed to this inclusion, and then later (once we had deployed code using it) decided to unilaterally modify it?

I didn't realize that you were implementing it in a gcc compatible way.

If you feel that the option should have precisely the same semantics as gcc's -ftrapv rather than a superset of that functionality, then you could commit the default handler (which I supplied with the original version of the code), which mimics gcc's functionality by calling abort from the handler.

The problem is that it isn't a superset of the gcc behavior. The GCC behavior allows you to rebuild with a flag and find bugs. Your implementation requires you to implement a new function in your program.

-Chris

this has nothing to do with John's patch, so I'm retitling the subject.

I was going to comment that this isn't the same behaviour as -ftrapv, but apparently it now is.

Yes, -ftrapv is a flag defined by gcc, clang follows the gcc behavior.

Chris, I note that you made this change last month, without any discussion, removing the original, generic, behaviour that was discussed on-list, in favour of GCC's more limited behaviour which does not provide options for recovery.

I thought that I emailed you or cc'd you on the commit. But yes you're right. -ftrapv is a gcc flag and we got numerous bug reports from people who were trying to use it and getting link errors.

Numerous? I think I remember two. Anyway, that would have been a good reason for starting a discussion, not a good reason for removing a feature that has been in clang for over a year and that people are using.

The clang -ftrapv implementation was used by us for implementing overflow checking and correction, and was to prototype CERT's as-if infinitely ranged integer model, which has been proposed for inclusion in C1X.

Perhaps you could be a bit clearer about why you originally agreed to this inclusion, and then later (once we had deployed code using it) decided to unilaterally modify it?

I didn't realize that you were implementing it in a gcc compatible way.

I provided a default handler function that would call abort (after printing a helpful error) along with the original patch. It never got committed (I didn't have commit access back then, other people were committing my patches). Mike said it would go in a clang runtime lib when one existed.

If you feel that the option should have precisely the same semantics as gcc's -ftrapv rather than a superset of that functionality, then you could commit the default handler (which I supplied with the original version of the code), which mimics gcc's functionality by calling abort from the handler.

The problem is that it isn't a superset of the gcc behavior. The GCC behavior allows you to rebuild with a flag and find bugs. Your implementation requires you to implement a new function in your program.

A fix that would not have broken anything would have been to call the handler function if it is defined in the translation unit, or insert an abort if it isn't. That would have been relatively simple to implement and would have been compatible with both existing code that worked with clang and existing code that compiled with GCC.

Although, given that gcc's -ftrapv is documented as not working reliably or correctly, I'm not sure that compatibility with it is a particularly high priority.

David

Chris, I note that you made this change last month, without any discussion, removing the original, generic, behaviour that was discussed on-list, in favour of GCC's more limited behaviour which does not provide options for recovery.

I thought that I emailed you or cc'd you on the commit. But yes you're right. -ftrapv is a gcc flag and we got numerous bug reports from people who were trying to use it and getting link errors.

Numerous? I think I remember two. Anyway, that would have been a good reason for starting a discussion, not a good reason for removing a feature that has been in clang for over a year and that people are using.

I thought I emailed you about this. In any case, I don't think that there is any reasonable discussion to be had: it's the wrong thing to do for -ftrapv, as I said before, it breaks an established model. Adding a new option that does the new behavior may be reasonable.

I didn't realize that you were implementing it in a gcc compatible way.

I provided a default handler function that would call abort (after printing a helpful error) along with the original patch. It never got committed (I didn't have commit access back then, other people were committing my patches). Mike said it would go in a clang runtime lib when one existed.

Are you sure that I approved the patch and not mike? Do you have a link?

In any case, it doesn't matter if you patch included it or not, the end result is that the compiler was broken. It was reported in more than two places.

-Chris

Chris, I note that you made this change last month, without any discussion, removing the original, generic, behaviour that was discussed on-list, in favour of GCC's more limited behaviour which does not provide options for recovery.

I thought that I emailed you or cc'd you on the commit. But yes you're right. -ftrapv is a gcc flag and we got numerous bug reports from people who were trying to use it and getting link errors.

Numerous? I think I remember two. Anyway, that would have been a good reason for starting a discussion, not a good reason for removing a feature that has been in clang for over a year and that people are using.

I thought I emailed you about this.

Nope, the first I saw was when I read the patch just now and discovered that it had been removed. Checking the svn log, I discovered that you did it a month ago.

You even put in your commit message 'David is using this...' indicating that you knew that removing it would break existing code, but did not discuss it prior to making the change - even if you thought you emailed me, you clearly did not wait for a reply.

In any case, I don't think that there is any reasonable discussion to be had: it's the wrong thing to do for -ftrapv, as I said before, it breaks an established model.

What established model? Every time -ftrapv is mentioned on the GCC lists, the instructions are not to use it.

Adding a new option that does the new behavior may be reasonable.

Indeed, and the correct time for doing this was when refactoring the code. Now, since 2.8 has already branched, I am faced with another six months of telling people that they need to use clang from trunk instead of the release version.

If you had bothered to email me about this (and, when removing a feature that you know people use, the correct time to make the commit is after you have received a reply, not after you think you have sent an email), then I could have fixed it before the release was branched.

I didn't realize that you were implementing it in a gcc compatible way.

I provided a default handler function that would call abort (after printing a helpful error) along with the original patch. It never got committed (I didn't have commit access back then, other people were committing my patches). Mike said it would go in a clang runtime lib when one existed.

Are you sure that I approved the patch and not mike? Do you have a link?

Mike committed it. You reviewed it and asked that the -ftrapu option be removed. Check the mailing list archives for April 2, 2009. We also discussed it in IRC, where I explained both the implementation and my use of it in some detail (if you log private chats, check around April 2-4, 2009 - I don't keep logs).

In any case, it doesn't matter if you patch included it or not, the end result is that the compiler was broken. It was reported in more than two places.

The compiler is now broken for anyone who was using the -ftrapv functionality that clang has had for over a year, and which has been in two releases. Worse; while it is possible to work around the previous 'breakage' by simply linking in the file that I included with the original patch and mirroring the gcc's behaviour, it is not possible to work around this behaviour.

David

Thanks! I applied a slightly tweaked version of this in r113705.

The tweaks were to rename the function to EmitUndefinedBehaviorIntegerDivAndRemCheck, to make it clear that it only applied to integer div/rem.

I also changed this:

+ llvm::Value *Cond2 = Builder.CreateAnd(
+ Builder.CreateICmpEQ(Ops.LHS, IntMin),
+ Builder.CreateICmpEQ(Ops.RHS, NegOne), "and");

Because the order of evaluation of the calls isn't defined, so different compilers could emit the icmp's in different orders.

Finally, I changed isFloatingType() to isRealFloatingType(), because the code wasn't safe on _Complex floats.

Thanks again John,

-Chris