proposal for exploiting undefined behavior much more aggressively

http://blog.regehr.org/archives/761

Thanks,

John

It's an interesting post, but I'd like to point out that it is a non-goal for the project to be actively hostile to users of the compiler. :slight_smile: It is useful to have debugging tools for people who really care, but "exploiting" undefined behavior just for the sake of breaking code is a non-goal.

A specific example is code like this (which is quite common):

int ftoi(float F) {
  return *(int*)&F;
}

This is a violation of the C spec, due to type-based aliasing issues (the right approach is to use a union). That said, we go out of our way to not break this sort of idiom, because it is obvious to the compiler and actively hostile to a widely used pattern in dusty deck code.

-Chris

I'd like to point out that it is a non-goal for the project to be actively hostile to users of the compiler. :slight_smile:

I like that in a compiler :slight_smile:

John

Also, as a bit of a teaser, I think Richard Smith is going to be looking
into providing much more extensive optional *checking* of undefined
behavior. Essentially, '-fcatch-undefined-behavior' may grow some serious
teeth. This should allow users to check their code fairly aggressively for
UB but not needlessly punish code or those debugging code.

27.07.12, 03:30, "Chris Lattner" <clattner@apple.com>:

I think we're approaching this a bit backwards.

We should first see if there is some non-trivial performance gain to be had
by leveraging the undefined behavior. If so, we should then evaluate
whether it is reasonable to hope for real world programmers to avoid such
undefined behavior, and whether the code we have today is reasonably free
of it.

If all three of these hold true (the first is actually the one i find least
likely to be true in many cases), only then should we implement an
optimization which leverages the UB, and we should first make some attempt
to implement reasonable warnings and runtime checking of the UB to help
users who run afoul of the optimization.

I think[1] that the primary issue is that it should never be the *goal* to
exploit undefined behavior. The goal should be faster generated code,
smaller generated code, or some other valuable thing for a compiler. Then,
if the undefined behavior gives a particular opportunity to reach that
goal, we should consider taking that opportunity. To simply willfully
transform code with undefined behavior code into ludicrous constructs is to
put the cart before the horse.

[1]: Of course, perhaps Chris is thinking something else. ;] This is just
my two cents.

-Chandler

Surely code that contains no undefined behaviour will gain no benefits from optimisations that exploit undefined behaviour?

David

Free for a particular dynamic execution. Thus the undefined behavior which
is being used to optimized does not, dynamically, occur in the executions
of the program we are considering.

This is the classical place where such optimizations are useful. Some
external constraint prevents execution of the code dynamically, and we
would like to allow the compiler to optimize under such an assumption
statically due to the presence of UB.

Relying undefined behavior, the compiler can infer conditions which must necessarily hold, and use this information in subsequent optimizations. For instance, adding positive values to a signed integer which is zero initially will never yield a negative value. If this value is later fed to code which can cope with negatives values through an explicit check, that check can be optimized away. So obviously, you need some code reuse/abstraction (calling a more general routine from very specialized code) to trigger optimizations, but there are hypothetical wins even for programs which never actually trigger undefined behavior at run time.

However, I would prefer if programmers could provide the compiler with the necessary information in more obvious ways. This way, similar properties could be specified for values of unsigned integer type as well.

Yep, I completely agree. If we could do everything we wanted purely analytically, it would be best to *not* depend on UB ever. Unfortunately, there are some things that UB makes possible that are otherwise infeasible, and these *are* generally useful for most well behaved programs (type based alias analysis is one example).

-Chris

Also, as a bit of a teaser, I think Richard Smith is going to be looking into
providing much more extensive optional *checking* of undefined behavior.
Essentially, '-fcatch-undefined-behavior' may grow some serious teeth. This
should allow users to check their code fairly aggressively for UB but not
needlessly punish code or those debugging code.

Very cool. I'd like to hear more.

BTW we're still working on getting IOC into shape for inclusion.

John

However, I would prefer if programmers could provide the compiler with
the necessary information in more obvious ways. This way, similar
properties could be specified for values of unsigned integer type as well.

This might be able to take the form of something as simple as a smart
`assert`. It would be fantastic if you could do "assert(x > 0)", and
then have that information passed down into the IR so that the
compiler can exploit it (ifdef NDEBUG, of course).

Of course, the compiler probably couldn't exploit assertions like
`DFSForest.hasNoBackEdges()`. However, I think there could be a wide
range of assertions where this could help the compiler. For example,
it would be a good way to explicitly provide information about pointer
aliasing in a more fine-grained way than `restrict`.

--Sean Silva

There’s actually one possibly-interesting use-case where it’s reasonable to have “exploiting undefined behavior” as a goal. Maximally-perverse results from code that depends on undefined behavior can be useful for users who are attempting to stamp-out undefined behavior from their code. This is only desirable, of course, when it’s been explicitly requested by the user.

You can find an example of this in the GNAT Ada front end. Finding a safe elaboration order for static variables is undecidable, so Ada compilers use pretty-good heuristics to address the problem. Smart programmers also test their code in a mode where the compiler reverses its heuristic and pessimizes the elaboration order. When your code works both ways, you’re in pretty good shape (on that issue, at least). One could imagine similar treatment for UB.

I rather expect that this use case isn’t particularly interesting for CLANG.

Dean F. Sutherland