BoundsChecking Pass

Hi,

I am a final-year French student doing an internship at the University of Portsmouth. While getting hands-on with AddressSanitizer, I took a look at BoundsChecking (both are in the lib/Transforms/Instrumentation folder).

I found nothing on it except for the LLVM documentation and references to BaggyBoundsCheck (which is not the same project; as far as I understand, it is part of the SAFECode project). Does anyone know about BoundsChecking? I have some questions, which I will try to explain below.

I slightly modified the registration of the BoundsChecking pass so that a .so file is generated when LLVM is rebuilt. I then ran opt, loading the .so, on a C program that performs both a stack and a heap overflow:

  • clang -emit-llvm overflow.c -c -o overflow.bc

  • opt -load path-to-so/LLVMBoundsChecking.so -options < overflow.bc > overflow_instrumented.bc

I then ran llc and gcc to get an executable:

  • llc -filetype=obj overflow_instrumented.bc (generates a .o file with same name)

  • gcc overflow_instrumented.o -o overflow_instrumented
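
The thread does not include overflow.c itself. As a rough idea of what such a test program could look like, here is a minimal sketch; the file contents, the function names stack_store and heap_store, and the OVERFLOW_DEMO_MAIN macro are all assumptions made for illustration, not the original program. The index is a parameter so that the compiler cannot fold the access away at compile time.

```c
/* overflow.c: hypothetical stand-in for the test program (the original is not shown). */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_LEN 8

/* Writes one byte at index i of an 8-byte stack buffer and reads it back.
   Any i >= BUF_LEN is a stack overflow (undefined behaviour). */
int stack_store(size_t i)
{
    char buf[BUF_LEN] = {0};
    buf[i] = 'A';
    return buf[i];
}

/* Same access pattern on a heap buffer. The malloc call and the access are
   in the same function, so the allocation size is visible locally. */
int heap_store(size_t i)
{
    char *buf = malloc(BUF_LEN);
    assert(buf != NULL);
    buf[i] = 'B';
    int value = buf[i];
    free(buf);
    return value;
}

#ifdef OVERFLOW_DEMO_MAIN
/* Optional driver: ./overflow_instrumented <index>
   An index of 8 or more overflows both buffers. */
int main(int argc, char **argv)
{
    size_t i = (argc > 1) ? (size_t)strtoul(argv[1], NULL, 10) : 0;
    printf("stack: %d\n", stack_store(i));
    printf("heap:  %d\n", heap_store(i));
    return 0;
}
#endif
```

With this sketch, the driver is only compiled when -DOVERFLOW_DEMO_MAIN is added to the clang command line; in-range indices run normally, and out-of-range ones are the accesses a bounds check is expected to catch.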

Once launched, the executable detects the stack access and crashes the program (you can see the checks in the assembly code, each followed by a conditional jump to a UD2 instruction, which crashes the program), but nothing is instrumented for the heap access. The BoundsChecking file says that run-time checks are made, but I don’t see them. So my questions are:

  • is any heap checking done?

  • if yes, where are they?

I am interested in this because I am thinking of trying to do for the heap what has already been done for the stack.

Thank you for your help, any information or advice is welcome :)

Pierre

Hi Pierre,

I'm the author of the BoundsChecking pass.
It's true there's little documentation about it (only mentioned in: http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#availablle-checks). You can run it with 'clang -fsanitize=bounds' or 'opt -bounds-checking'.
The BoundsChecking pass, AddressSanitizer and BaggyBoundsCheck are all different code bases, each exploring a different set of tradeoffs. The goal of the BoundsChecking pass was that the runtime penalty should be low enough to enable usage in production.

Some information about the BoundsChecking pass:
- It is intra-procedural only. If you dereference a pointer that was passed as argument, then it is not checked (with some exceptions).
- It supports heap allocations, provided that these allocations are done using 1) standard functions that LLVM recognizes (malloc, new, strdup, etc.) or 2) functions annotated with alloc_size (https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html).
- It's helpful to compile with -O2, otherwise the pass will get confused very quickly. The design of the analysis assumes at least a few simplifications were done before.
- Sometimes LLVM transforms loops into intrinsics, like memcpy or memset. Right now these are not checked (though they should be).
- Guards are mostly not hoisted out of loops by LLVM; this needs improvement otherwise perf may suffer quite a bit.
- The analysis code is in lib/Analysis/MemoryBuiltins.cpp
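
To make the alloc_size point in the list above concrete, here is a minimal sketch of annotating a custom allocator. The names pool_alloc, pool_alloc_array and last_element are hypothetical and exist only for this illustration; the attribute itself is the one documented at the GCC link above, and Clang accepts it as well. In real code the allocator definitions would normally live in another translation unit, with only the annotated declarations visible to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical custom allocator. alloc_size(1) states that the returned
   object is as large as the value of the first argument. */
__attribute__((alloc_size(1)))
void *pool_alloc(size_t nbytes)
{
    return malloc(nbytes);
}

/* alloc_size(1, 2): the object size is the product of arguments 1 and 2. */
__attribute__((alloc_size(1, 2)))
void *pool_alloc_array(size_t count, size_t elem_size)
{
    return calloc(count, elem_size);
}

/* The allocation and the accesses sit in the same function, which is the
   situation an intra-procedural analysis can reason about. */
int last_element(size_t count)
{
    assert(count > 0);
    int *values = pool_alloc_array(count, sizeof *values);
    if (values == NULL)
        return -1;
    for (size_t i = 0; i < count; i++)
        values[i] = (int)i;
    int last = values[count - 1];
    free(values);
    return last;
}
```

If the pointer were instead handed to another function and dereferenced there, the first bullet applies and the access would generally not be checked.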

Hope this helps. Please let us know if you have more questions.

Nuno

Hi Nuno,

- It's helpful to compile with -O2, otherwise the pass will get confused very quickly. The design of the analysis assumes at least a few simplifications were done before.

OK, I just compiled it with -O2 and the heap overflow protection was triggered. However, I don't know which simplifications the pass requires in order to run correctly.

Most optimizations/analyses expect SROA (mem2reg) to be run, otherwise the IR is too messy to analyze. InstCombine also does nice cleanups. These two are always a good idea to run, at least.

- Sometimes LLVM transforms loops into intrinsics, like memcpy or memset. Right now these are not checked (though they should be).
- Guards are mostly not hoisted out of loops by LLVM; this needs improvement otherwise perf may suffer quite a bit.
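
As an illustration of the hoisting point above, the following sketch contrasts a guard evaluated on every iteration with a single guard placed before the loop. It is hand-written C that only models the idea: trap_if is a hypothetical stand-in for the inlined trap, and the object size is passed explicitly as a parameter purely for the sake of the example, which is not how the pass obtains it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the inlined trap. */
static void trap_if(int out_of_bounds)
{
    if (out_of_bounds)
        abort();
}

/* Guard evaluated on every iteration: one comparison per element. */
long sum_guard_per_iteration(const int *data, size_t size, size_t count)
{
    assert(data != NULL);
    long sum = 0;
    for (size_t i = 0; i < count; i++) {
        trap_if(i >= size);
        sum += data[i];
    }
    return sum;
}

/* Guard hoisted out of the loop: one comparison covers all iterations. */
long sum_guard_hoisted(const int *data, size_t size, size_t count)
{
    assert(data != NULL);
    trap_if(count > size);
    long sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += data[i];
    return sum;
}
```

Both versions return the same sum for in-bounds inputs. The hoisted form traps before the first iteration rather than at the faulting one, which is one reason hoisting is not always a trivial transformation.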

Are you still working on it? If yes, what are you trying to do? I would like to work on this pass during the summer (until the end of August). It would be great if you could guide me a little bit =)

I'm not actively working on it at the moment, but I'm still interested.
I can certainly provide guidance and review patches.

- The analysis code is in lib/Analysis/MemoryBuiltins.cpp

I have a question on this. As I read the code, I was wondering how the run-time part is implemented. I was looking for something like a redefinition of the malloc and free functions, but I found no trace of one. Now I'm wondering whether it comes down to the run-time action of the ObjectSizeOffsetEvaluator class, which is used to get the size and offset of the current array pointer?

No, malloc/free functions are not redefined. LLVM simply knows that 'malloc(x)' returns an object with size 'x' and offset 0. It then has to propagate this information all over (think of fat-pointers). An alternative would be, as you say, to replace malloc/free, and this is in fact the approach taken by the two other passes you mentioned. BoundsChecking has no runtime; everything needed is inlined within the user's code.
ObjectSizeOffsetEvaluator builds expressions to give you the object size/offset at any given location. For example, if you have a loop iterating over a pointer variable, this class will create code that tracks how object size/offset evolves throughout the loop iterations.
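
The fat-pointer idea described above can be modelled in plain C as follows. This is only a conceptual sketch under stated assumptions: struct tracked_ptr and the helper functions are hypothetical names, and the three comparisons are an assumed form of the inlined check, not code taken from the pass. In the real pass the size and offset are LLVM values computed in the instrumented function, not fields stored next to the pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Conceptual "fat pointer": the size of the underlying object plus the
   current byte offset of the pointer from the start of that object. */
struct tracked_ptr {
    uint64_t size;
    int64_t offset;
};

/* Models 'malloc(x)': an object of size x, pointer at offset 0. */
struct tracked_ptr tracked_from_alloc(uint64_t nbytes)
{
    struct tracked_ptr p = { nbytes, 0 };
    return p;
}

/* Models pointer arithmetic: the size is unchanged, the offset moves. */
struct tracked_ptr tracked_advance(struct tracked_ptr p, int64_t delta)
{
    p.offset += delta;
    return p;
}

/* An access of access_size bytes is in bounds when the offset is not
   negative, does not exceed the size, and enough bytes remain after it. */
bool access_in_bounds(struct tracked_ptr p, uint64_t access_size)
{
    if (p.offset < 0)
        return false;
    if ((uint64_t)p.offset > p.size)
        return false;
    return p.size - (uint64_t)p.offset >= access_size;
}

/* Stand-in for the inlined guard: abort instead of executing a trap. */
void require_in_bounds(struct tracked_ptr p, uint64_t access_size)
{
    assert(access_in_bounds(p, access_size));
}
```

For a loop that increments a pointer, tracked_advance corresponds to the code that keeps the offset up to date on each iteration, while the size stays fixed at the value established by the allocation.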

Nuno