Runtime interception: design problem

Hi everyone,

I am having some trouble, though this shouldn’t be hard to solve for many people here. I am starting work on a runtime for the BoundsChecking pass and I want to replace libc’s malloc and free. I followed the design of AddressSanitizer (ASan) and tried to use the INTERCEPTOR macro from the interception.h file of the compiler-rt library.

Here is the problem. The file I am modifying (BoundsChecking.cpp) is in lib/Transforms/Instrumentation/ and I can’t properly include interception.h (which is in projects/compiler-rt/lib/interception/). I looked at the CMakeLists.txt files and at how other files include interception.h, but they are all in the compiler-rt lib directory.

I assume this is normal, as a runtime should be developed under the correct directory, but I don’t see how to design this so that the code instrumented by the BoundsChecking pass uses my own malloc and free functions.

I spent quite some time on the ASan runtime code and found that the runtime can be initialized by a call to __asan_init() (a function defined in asan_rtl.cc) made directly from the instrumented code, but I don’t know whether that is the only way to do it, or how to reproduce it…

To summarize, I want my own malloc and free functions to be called by the code I instrument with BoundsChecking. Any suggestion is welcome =)

Thanks,

Pierre

Dear Pierre,

Stepping up a level, what is your goal in replacing calls to malloc() and free()? Is it any different than what SAFECode, SoftBound, or ASan do?

Regards,

John Criswell

Hi John,

Something does not make sense to me here: lib/Transforms/... is about code that transforms or generates code; it does not contain code that will be part of the final binary. So this transform may generate calls to your runtime, but it should not need the runtime in order to operate.
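A minimal sketch of that separation, assuming hypothetical runtime entry points named bc_malloc, bc_free and bc_alloc_size (none of these exist in LLVM or compiler-rt). The runtime lives in its own source file and is linked into the final binary; the pass only needs to know the function names so that it can emit calls to them. The last function stands in for user code after the pass has rewritten its malloc/free calls.

```cpp
// Hypothetical runtime, built separately from the pass and linked into the
// instrumented program. All bc_* names are assumptions for illustration.
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>

namespace {
// Maps the base address of each live allocation to its size in bytes.
std::map<void *, std::size_t> &liveAllocations() {
  static std::map<void *, std::size_t> table;
  return table;
}
} // namespace

// Replacement for malloc: allocates and records the size of the object.
extern "C" void *bc_malloc(std::size_t size) {
  void *p = std::malloc(size);
  if (p != nullptr)
    liveAllocations()[p] = size;
  return p;
}

// Replacement for free: forgets the object, then releases it.
extern "C" void bc_free(void *p) {
  if (p == nullptr)
    return;
  liveAllocations().erase(p);
  std::free(p);
}

// Returns the recorded size for an allocation base, or 0 if it is unknown.
extern "C" std::size_t bc_alloc_size(void *base) {
  auto it = liveAllocations().find(base);
  return it == liveAllocations().end() ? 0 : it->second;
}

// What user code could look like once the pass has replaced malloc/free
// with calls into the runtime. The pass itself never links against it.
int instrumentedUserCode() {
  int *a = static_cast<int *>(bc_malloc(4 * sizeof(int)));
  if (a == nullptr)
    return -1;
  assert(bc_alloc_size(a) == 4 * sizeof(int)); // bounds known to the runtime
  a[0] = 7;
  int v = a[0];
  bc_free(a);
  return v;
}
```

In a real setup the pass would declare these functions in the module and insert calls to them, and the runtime would be built under compiler-rt rather than under lib/Transforms.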

I wouldn’t call them “wild,” but yes, object-referent approaches (Jones-Kelly, Ruwase-Lam, SAFECode, WIT) will allow overflows of arrays embedded within structures so long as the access doesn’t leave the memory object. That said, the original SAFECode (with automatic pool allocation) had provably sound operational semantics (see Dinakar Dhurjati’s PLDI 2006 paper).

One of the things that makes memory safety in C so expensive is that a) pointers can point into the middle of objects and b) the original C design doesn’t make it easy to take a pointer and determine into which object it points. This leads to the two primary memory safety designs: referent objects (which track metadata separately from pointers) and fat pointers (which expand the pointer representation so that each pointer carries both base and bounds information for its referent object or sub-object). Each one has advantages and disadvantages.

Nuno’s BoundsChecking trades completeness for speed; it only does a bounds check when the bounds information can be easily found through local analysis (e.g., the malloc() and the pointer arithmetic operation occur within the same function). Once you go inter-procedural, things get messy, as you may not have the bounds information for a pointer readily available in the function doing the memory access. That’s when you have to start either looking up metadata about a pointer’s referent object (SAFECode) or transforming the program to pass bounds information along with pointers (SoftBound). Either way, you hit the performance problems and have to deal with them.

My overall impression is that memory safety (sans new hardware support) won’t be really fast without strong analysis. The original SAFECode got its speed from points-to analysis, type inference, and the automatic pool allocation transformation, and could elide load/store checks.
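To make the two designs concrete, here is a small sketch under my own assumptions: ObjectTable, FatPtr, makeFatPtr and Packet are hypothetical names, not code from SAFECode, SoftBound or any other tool mentioned above. The side table finds the referent object that contains an interior pointer; the fat pointer carries its own base and bound. The final function shows the sub-object case: an access just past an embedded array stays inside the enclosing object, so the table accepts it, while a fat pointer narrowed to the array rejects it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Design 1: referent objects. Metadata lives in a side table keyed by the
// object's base address, so plain pointers keep their usual representation.
class ObjectTable {
public:
  void registerObject(const void *base, std::size_t size) {
    objects_[reinterpret_cast<std::uintptr_t>(base)] = size;
  }
  // True if [p, p + len) lies entirely inside one registered object.
  bool inBounds(const void *p, std::size_t len) const {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    auto it = objects_.upper_bound(addr); // first object starting after addr
    if (it == objects_.begin())
      return false;
    --it; // object with the greatest base <= addr
    std::uintptr_t offset = addr - it->first;
    return offset <= it->second && len <= it->second - offset;
  }

private:
  std::map<std::uintptr_t, std::size_t> objects_;
};

// Design 2: fat pointers. Each pointer carries base and bound itself.
struct FatPtr {
  std::uintptr_t addr;  // current address
  std::uintptr_t base;  // first byte of the (sub-)object
  std::uintptr_t bound; // one past the last byte
  FatPtr advance(std::size_t n) const {
    std::uintptr_t next = addr + n;
    return FatPtr{next, base, bound};
  }
  bool inBounds(std::size_t len) const {
    return addr >= base && addr <= bound && len <= bound - addr;
  }
};

FatPtr makeFatPtr(const void *base, std::size_t size) {
  assert(base != nullptr);
  auto b = reinterpret_cast<std::uintptr_t>(base);
  std::uintptr_t end = b + size;
  return FatPtr{b, b, end};
}

struct Packet {
  char buf[8];
  int secret;
};

// True when the table allows an access just past pkt.buf (it is still inside
// pkt) while a fat pointer narrowed to pkt.buf rejects the same access.
bool designsDisagreeOnSubObjectOverflow() {
  Packet pkt{};
  ObjectTable table;
  table.registerObject(&pkt, sizeof pkt);
  bool tableAllows =
      table.inBounds(pkt.buf + sizeof pkt.buf, sizeof pkt.secret);
  FatPtr fp = makeFatPtr(pkt.buf, sizeof pkt.buf).advance(sizeof pkt.buf);
  bool fatAllows = fp.inBounds(sizeof pkt.secret);
  return tableAllows && !fatAllows;
}
```

The table lookup is the per-access cost that referent-object schemes pay; the fat pointer avoids the lookup but changes the pointer representation, which is what forces a program transformation.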
The challenge, IMHO, is making these analyses either simple enough to be maintainable or useful enough for optimization that the maintenance burden can be spread among a larger group of interested people.

The memory safety menagerie contains a list of papers on memory safety attacks and defenses. As I left Illinois about two years ago, I’ve let its maintenance slide, so there are some papers (notably the “Eternal War in Memory” paper) that aren’t listed. If I get time this summer, I’ll try to get that web site updated.

Regards,

John Criswell

>
> Hi everyone,
>
> I am having troubles but this shouldn't be hard to solve for many people
> here. I am beginning a runtime feature for the BoundsChecking pass and I
> want to replace the libc malloc&free. I followed the design of
> AddressSanitizer (Asan) and tried to use the INTERCEPTOR macro from the
> interception.h file of compiler-rt library.
>
> Here is the problem. The file I modify (BoundsChecking.cpp) is in
> lib/Transforms/Instrumentation/ and I can't include properly interception.h
> (which is in projects/compiler-rt/lib/interception/).

Something does not make sense to me here: lib/Transforms/... is about
stuff that will transform/generate code, it does not contain code that will
be part of the final binary. So this transform may generate calls to your
runtime, but should not need the runtime to operate.

Yes, I know this does not make sense. That's why I asked for some help to
clear up the design (in my head and also in the code).

As I said further down in the message, when I analyzed ASan I found that the
runtime could be called by instrumented code through the function
__asan_init(), but I haven't managed to reproduce the mechanism. I haven't
spent much time on it, though, as I thought another solution might exist or
someone could explain it to me easily.

But since you said this could be the regular way to do it, I looked a bit
deeper, and it appears to follow this pattern:
  - the doInitialization of the FunctionPass (which has more access than
runOnFunction) calls the ModuleUtils function
createSanitizerCtorAndInitFunctions to get two pointers (constructor_func,
init_func)
  - it stores them in the FunctionPass structure
  - runOnFunction may then call init_func via the IRBuilder when needed.
Am I right? I'll try to use it the same way.
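A minimal sketch of the effect this pattern is after, written without the LLVM API so it can be compiled on its own. As far as I can tell, the constructor created by createSanitizerCtorAndInitFunctions simply calls the init function, and it runs before main once it is registered as a global constructor. Here bc_rt_init, bc_rt_is_ready and bc_rt_init_count are hypothetical runtime functions, and the function marked __attribute__((constructor)) (a GCC/Clang extension) stands in for the constructor the pass would emit into the instrumented module.

```cpp
#include <cassert>

namespace {
// Both are constant-initialized, so they are valid before any constructor runs.
bool runtimeReady = false;
int initCount = 0;
} // namespace

// Hypothetical runtime init, playing the role that __asan_init plays for ASan.
// It is idempotent so that extra calls from instrumented code are harmless.
extern "C" void bc_rt_init() {
  if (runtimeReady)
    return;
  runtimeReady = true;
  ++initCount;
}

extern "C" bool bc_rt_is_ready() { return runtimeReady; }
extern "C" int bc_rt_init_count() { return initCount; }

// Stand-in for the module constructor the pass would generate: the loader
// runs it before main, so the runtime is ready before any user code executes.
__attribute__((constructor)) static void bc_module_ctor() { bc_rt_init(); }

// Stand-in for an instrumented function that also calls init on entry.
bool instrumentedFunctionSeesRuntime() {
  bc_rt_init(); // no effect if the constructor already ran
  assert(bc_rt_init_count() == 1);
  return bc_rt_is_ready();
}
```

With the constructor in place, a call to the init function from inside each instrumented function would only be needed for code that can run before global constructors do.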

--
Mehdi

Thank you Mehdi!
Pierre