PredicateSimplifier questions

PredicateSimplifier is a pretty interesting pass, but it doesn't look like opt invokes it at any standard -Ox level, and so I assume that llvm-gcc also does not use this pass? If that is right, I'm curious about why this is the case -- does it simply not provide enough code speedup to compensate for the increase in compile time?

Also, a colleague and I (we both teach advanced compiler courses) would like to have a relatively easy way for students to try out various abstract interpretation techniques on LLVM code. Writing an abstract interpreter from scratch has too much overhead. Of the LLVM passes that we know of, PredicateSimplifier seems the best starting point for making a more generic abstract interpreter into which collections of transfer functions could be plugged. Does that seem right or can anyone suggest a better starting point? It would be pretty cool, for example, to see what the Octagon domain could learn about LLVM programs.

John Regehr

Hi John,

John Regehr wrote:

PredicateSimplifier is a pretty interesting pass, but it doesn't look like opt invokes it at any standard -Ox level, and so I assume that llvm-gcc also does not use this pass? If that is right, I'm curious about why this is the case -- does it simply not provide enough code speedup to compensate for the increase in compile time?

I wrote predsimplify as I was learning about compiler theory. It's pretty dumb in the sense that it spends lots of time analysing things that will never be used, and despite being the slowest pass in LLVM I haven't seen it improve run-times in a nightly test.

Also, a colleague and I (we both teach advanced compiler courses) would like to have a relatively easy way for students to try out various abstract interpretation techniques on LLVM code. Writing an abstract interpreter from scratch has too much overhead. Of the LLVM passes that we know of, PredicateSimplifier seems the best starting point for making a more generic abstract interpreter into which collections of transfer functions could be plugged. Does that seem right or can anyone suggest a better starting point? It would be pretty cool, for example, to see what the Octagon domain could learn about LLVM programs.

Predsimplify is believed to have bugs (it results in miscompiled programs) and certainly isn't efficient (it was written before much of include/ADT). Finally, predsimplify is likely to go away once I or someone else writes a proper VRP pass.

The only other starting points I can suggest are the concrete interpreter in lib/ExecutionEngine/Interpreter and the sparse conditional propagation framework in Analysis/SparsePropagation.h.

If you decide that predsimplify is useful for you despite my warnings, then I would be very happy to hear that my time working on it wasn't all wasted!

Nick Lewycky

Thanks for the answers, Nick!

I wrote predsimplify as I was learning about compiler theory. It's
pretty dumb in the sense that it spends lots of time analysing things
that will never be used, and despite being the slowest pass in LLVM I
haven't seen it improve run-times in a nightly test.

Interesting. The application I have in mind here is some TinyOS code where we've added lots of array bounds checks. The vast majority of these in fact cannot fail, but LLVM and GCC are not smart enough to see this. It seemed like predsimplify was probably the right pass for this, but I haven't studied its success rate yet.
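To make the scenario concrete, here is a hedged sketch (hypothetical names; not the actual TinyOS instrumentation) of the kind of inserted check involved. A range analysis that tracks the loop bound can prove the check never fires and delete it:

```c
#include <stdint.h>

#define BUF_LEN 8

static uint8_t buf[BUF_LEN];

/* Hypothetical checked read, in the style of compiler-inserted
 * bounds checks; chk_read and buf are illustrative names. */
uint8_t chk_read(unsigned i) {
    if (i >= BUF_LEN)       /* the inserted check */
        return 0;           /* would trap/abort in a real build */
    return buf[i];
}

/* Every call here passes i < BUF_LEN, so an analysis that tracks
 * the loop bound can prove the check in chk_read never fires. */
unsigned sum_all(void) {
    unsigned s = 0;
    for (unsigned i = 0; i < BUF_LEN; i++)
        s += chk_read(i);   /* i is always in [0, BUF_LEN) here */
    return s;
}
```

On a flash-constrained target, deleting the dead check (and its trap path) is a code-size win even when the check is never executed.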

The problem isn't execution cost but rather code size. This is for the msp430 platform, which has very little flash memory. The code bloat really is a showstopper here: a number of existing applications overflow the available memory when we add checking, and cannot be run at all! (This is using gcc; we do not yet have an LLVM port to msp430.)

So the "pluggable domain" thing (an idea we had good luck with at the source level using CIL) is not just for compiler class, but also for solving this real problem. In addition to calling out to standard abstract domains, I want, for example, to play with shelling out to a heavyweight decision procedure. This should make many spurious bounds checks go away. Greg Morrisett reports good success with this approach in the Cyclone compiler. Inspection of embedded code shows that most array accesses can be proved to be in bounds pretty easily, though these proofs are apparently out of reach of -O2-style optimizers.

Predsimplify is believed to have bugs (it results in miscompiled
programs) and certainly isn't efficient (it was written before much of
include/ADT). Finally, predsimplify is likely to go away once I or
someone else writes a proper VRP pass.

Well, I do have a good way to find integer miscompilations... but I don't want to waste everyone's time if this pass is on the way out.

John Regehr

Predsimplify is believed to have bugs (it results in miscompiled
programs) and certainly isn't efficient (it was written before much of
include/ADT). Finally, predsimplify is likely to go away once I or
someone else writes a proper VRP pass.

Whoever does this, I strongly encourage looking into using (or at least providing optional support for) the Apron library:

   http://apron.cri.ensmp.fr/library/

No sense reinventing these wheels.

John Regehr

In my experience, starting with a very simple and very cheap approach will get most of the benefit. For those who really want to eliminate every check possible, a more expensive approach can be used on top of it. If someone is interested in array bounds check elimination, I'd suggest starting with the ABCD (array bounds check elimination on demand) paper. I believe that vmkit would hugely benefit from this as well.
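As a concrete (and entirely illustrative) example of the redundancy ABCD targets: the second check below is implied by the first through the inequality chain 0 <= j <= i < len, exactly the kind of fact ABCD's inequality graph discharges on demand.

```c
/* Illustrative only: two dynamic checks where the second is implied
 * by the first. After the first check succeeds we know 0 <= i < len,
 * and j = i / 2 gives 0 <= j <= i, so j < len follows and the
 * second check can be removed. */
int checked_pair(const int *a, int len, int i) {
    if (i < 0 || i >= len)      /* first check: establishes 0 <= i < len */
        return -1;
    int j = i / 2;              /* 0 <= j <= i */
    if (j < 0 || j >= len)      /* redundant: implied by the facts above */
        return -1;
    return a[i] + a[j];
}
```

The cheap approach pays off because, as with this example, most redundant checks follow from one or two simple inequalities rather than deep reasoning.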

-Chris

Hi Chris,

In my experience, starting with a very simple and very cheap approach
will get most of the benefit. For those who really want to eliminate
every check possible, a more expensive approach can be used on top of
it. If someone is interested in array bound check elimination, I'd
suggest starting with the ABCD (array bounds check elimination on
deman) paper. I believe that vmkit would hugely benefit from this as
well.

Ada would also benefit, I think: all array accesses are checked.

Ciao,

Duncan.

Chris, do you have a sense for how the definedness of signed overflow in LLVM would play out in the context of bounds check elimination? That is, would it cause many failures to eliminate checks that could be seen to be unnecessary at the C level?
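A small sketch of the concern (illustrative names, not real code from either project): dropping the inserted check requires knowing 0 <= i throughout the loop. Under C's undefined signed overflow that follows from "i starts at 0 and only increases"; under wrap-around semantics, n == INT_MAX would let i++ wrap to INT_MIN, so the lower-bound fact is not provable and the check must stay.

```c
/* Hypothetical inserted bounds check on a[i]. A source-level
 * analysis may assume i never wraps (signed overflow is undefined
 * in C) and conclude 0 <= i, so only "i < len" needs proving.
 * An IR with defined wrapping addition cannot make that assumption. */
int sum_prefix(const int *a, int len, int n) {
    int s = 0;
    for (int i = 0; i <= n; i++) {
        if (i < 0 || i >= len)  /* hypothetical inserted bounds check */
            break;
        s += a[i];
    }
    return s;
}
```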

John

That is an interesting question, and there are several related issues. The possibility of undefined behavior, or of behavior that programmers don't expect, in C code often leads to "security checks" that end up not doing anything. For example, see things like:

http://www.kb.cert.org/vuls/id/162289
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8537
etc.
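The first advisory above concerns a pattern like the following (reconstructed from the advisory's description, not copied from any particular codebase): a length-validation check whose only way of being true is undefined pointer-arithmetic overflow, which the compiler may therefore legally delete.

```c
#include <stddef.h>

/* A wrap-around "security check" that does nothing: if buf + len
 * overflows, the behavior is already undefined, so the compiler is
 * entitled to assume the condition is false and remove the branch. */
int validate(const char *buf, size_t len) {
    if (buf + len < buf)    /* intended: reject a wrapped length */
        return 0;           /* in practice: may be optimized away */
    return 1;
}
```

The robust version compares against the remaining space (e.g. `len > limit - used`) so that no operand can overflow.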

In general, I think that if you *really* care about code security in C, bounds checks are not enough. You need to use various techniques to reduce the amount of undefined behavior in C, such as compiling with -fwrapv.

A project that I'd like to tackle eventually in Clang is to have direct support for this: emitting code that zero-initializes variables by default, *automatically* inserting bounds checks where it can, inserting code to check that shift amounts are in range, etc.
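The shift-amount case might look like the sketch below (checked_shl is an illustrative name, not a Clang feature): shifting a 32-bit value by 32 or more is undefined in C, so the instrumentation guards the amount and falls back to a defined result.

```c
#include <stdint.h>

/* Hypothetical form of the instrumentation described above: guard a
 * shift whose amount may equal or exceed the type width. A real
 * implementation would likely trap; the sketch saturates to 0. */
uint32_t checked_shl(uint32_t x, unsigned amt) {
    if (amt >= 32)      /* out-of-range shift is undefined in C */
        return 0;       /* defined fallback chosen for the sketch */
    return x << amt;
}
```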

-Chris