LLVM-based address sanity checker

Hello,

We’ve just released the first version of our LLVM-based address sanity checker: AddressSanitizer (http://code.google.com/p/address-sanitizer/).
The tool finds out-of-bound and use-after-free bugs (the subset of bugs detectable by Valgrind/Memcheck);
it consists of an LLVM compiler plugin which performs simple code instrumentation and a malloc replacement library.
The main advantage of the new tool is high speed: the slowdown is usually within 2x-2.5x.
A detailed description of the algorithm is here: http://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm
The tool is young, but it can already run the Chromium browser (interactively!) and find bugs in it.

Would the LLVM community be interested in adopting this code into the LLVM trunk?
The instrumentation pass is ~350 LOC (http://code.google.com/p/address-sanitizer/source/browse/trunk/llvm/AddressSanitizer.cpp), but may grow over time as we add optimizations.
The run-time library (malloc replacement, http://code.google.com/p/address-sanitizer/source/browse/trunk/asan/asan_rtl.cc) is ~1500 LOC.

Thanks,

–kcc

Interesting. We’ve been working on our own open-source memory safety tool called SAFECode (). SAFECode has a number of nice features, including:

o) It works on both Mac OS X and Linux (and should work on other Unix platforms with little/no modification).
o) It has options for performing safety checks on loads/stores and on array/struct indexing operations.
o) It can accurately detect array bounds violations on global and stack objects as well as heap objects (unlike Valgrind, which must use heuristics to identify global/stack object boundaries).
o) It can optionally instrument C library functions to detect memory safety errors.
o) It has both simple and sophisticated optimization passes that remove unneeded instrumentation.
o) It has a debug instrumentation pass that can use LLVM debug metadata to enhance instrumentation to include source filename / line number info (although this code needs to be updated).
o) It has features to make dangling pointer dereferences “safe” for production code.
o) It has an experimental mode for precise dangling pointer detection.
o) We have a new, faster memory object tracking run-time based on Baggy Bounds Checking.

We’ve been doing a lot of refactoring work so that SAFECode can be used without DSA and Automatic Pool Allocation (APA) and could therefore be integrated into LLVM and Clang. Integrating them would make them easier to use (because one wouldn’t need to change Makefiles in large projects to generate bitcode files; just flip a switch on the Clang command-line) and would ease our maintenance burden (because there would be less out-of-tree code). An optional, out-of-tree libLTO library including DSA and APA could provide the remaining optimization functionality for those that want to use it.

Would there be any objections to incorporating portions of SAFECode free of DSA/APA dependencies into mainline LLVM and adding a command-line option to Clang to tell it to instrument a program using SAFECode?

– John T.

Hello again,

The tool we announced 1.5 months ago has matured quite a bit.
In addition to heap out-of-bound and use-after-free bugs it also finds stack overruns/underruns.
AddressSanitizer is being actively used by the Chromium developers and has already found over 20 bugs: http://blog.chromium.org/2011/06/testing-chromium-addresssanitizer-fast.html

Question to the LLVM developers: would you consider adding the AddressSanitizer code to the LLVM trunk?

Thanks,

–kcc

    Would the LLVM community be interested in adopting this code into
    the LLVM trunk?

I cannot approve it, but I would love to have it :)

    Thanks,

    --kcc

Cheers,
Rafael

Having functionality like this in mainline would be really interesting. I haven’t looked at your code yet, what are the major components, what impact does it have on the codebase?

-Chris

Do you have an idea how hard would it be to port to non-x86 platforms?
I saw some Intel ASM in the C++ file...

The run-time library being 1.5k loc is not encouraging, but it didn't
look particularly platform specific...

cheers,
--renato

LLVM:

This is my first code in LLVM, so it definitely needs cleanup to meet the LLVM guidelines.

Run time library (could be used with any other compiler):

Tests: http://code.google.com/p/address-sanitizer/source/browse/trunk/asan/asan_test.cc

–kcc

Question to the LLVM developers: would you consider adding
the AddressSanitizer code to the LLVM trunk?

Do you have an idea how hard would it be to port to non-x86 platforms?
I saw some Intel ASM in the C++ file…

Not hard at all.
At some point the file had no asm at all, but using the custom asm lets us make the generated code more compact.
Now the code that actually reports the error is 5-6 bytes; we could decrease it to 1 byte (at least on x86/x86_64) with some more work.
http://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm#Report_Error
My first attempt that used no asm required ~15 bytes of code.
Note, this code is executed only once, so it affects the performance very slightly (through icache size).

The run-time library being 1.5k loc is not encouraging, but it didn’t
look particularly platform specific…

Alas. It will grow even more when we add MacOS support.
(currently, only tiny tests work on Mac).

–kcc

- Almost everything is on one
file: http://code.google.com/p/address-sanitizer/source/browse/trunk/asan/asan_rtl.cc

I'd split into separate C files, just for organisation purposes...
maybe for platform selection later, if needed.

Tests: http://code.google.com/p/address-sanitizer/source/browse/trunk/asan/asan_test.cc

You're using gtest, which is what LLVM uses for unit tests... that's a win. ;)

cheers,
--renato

One more part is the symbolizer – when reporting an error we need to provide function name, file name and line number of every PC in the stack.
There are two options: offline symbolizer (simple python script which uses addr2line) and in-process symbolizer based on libbfd.
Due to the huge size of the DWARF generated by LLVM (suspected cause: http://llvm.org/bugs/show_bug.cgi?id=7554), both options are terribly slow: up to 1 minute and 7G of RAM per report on Chromium.
I don’t like either option; please recommend another one if LLVM has it.

Not hard at all.
At some point the file had no asm at all, but using the custom asm allows to
make the generated code more compact.

That should also be simple for other platforms...

My first attempt that used no asm required ~15 bytes of code.
Note, this code is executed only once, so it affects the performance very
slightly (through icache size).

I see, maybe you could leave your C implementation as a fall back.

Alas. It will grow even more when we add MacOS support.
(currently, only tiny tests work on Mac).

I saw a few APPLE macros. I think that's the part that needs most
care, since multi-platform (os/arch/etc) could bloat that
considerably.

cheers,
--renato

Not hard at all.
At some point the file had no asm at all, but using the custom asm allows to
make the generated code more compact.

That should also be simple for other platforms…

Sure.

My first attempt that used no asm required ~15 bytes of code.
Note, this code is executed only once, so it affects the performance very
slightly (through icache size).

I see, maybe you could leave your C implementation as a fall back.

Not easy, because it will require a fallback code in the run time library.
But yes, possible.

Alas. It will grow even more when we add MacOS support.
(currently, only tiny tests work on Mac).

I saw a few APPLE macros.

Yea…

I think that’s the part that needs most
care, since multi-platform (os/arch/etc) could bloat that
considerably.

Agree. We’ve been adding APPLE code just recently. Time to split.
I’m afraid we’ll have to keep a single .cc file and add .h files for os/arch specific code to keep inlining under manual control (I don’t want to rely on the compiler doing cross-cc-file inlining for me).

–kcc

I see, maybe you could leave your C implementation as a fall back.

Not easy, because it will require a fallback code in the run time library.
But yes, possible.

I was thinking more of a build-time fall back, when you're choosing
the platform...

#ifdef __x86_arch
... specific
#else
C-generic
#endif

Agree. We've been adding APPLE code just recently. Time to split.
I'm afraid we'll have to keep a single .cc file and add .h files for os/arch
specific code to keep inlining under manual control (I don't want to rely
on the compiler doing cross-cc-file inlining for me).

You might end up with a lot of redundancy.

If inlining is not working as you'd expect, a compile-time static
include (is ugly, but) would be better than to have the same code over
and over.

cheers,
--renato

I see, maybe you could leave your C implementation as a fall back.

Not easy, because it will require a fallback code in the run time library.
But yes, possible.

I was thinking more of a build-time fall back, when you’re choosing
the platform…

#ifdef __x86_arch
… specific
#else
C-generic
#endif

Sure, this is possible. But we’ll need two such blocks: one in the LLVM instrumentation and one (matching) in the run-time.
I am rather reluctant to add ‘generic’ code that handles unknown/untested platforms because the memory mapping is very platform specific anyway.
For extreme performance ASAN uses the fixed memory-to-shadow mapping (addr>>3)+offset. I guess there are plenty of platforms where this mapping won’t work or will require different offset.

–kcc

I am rather reluctant to add 'generic' code that handles unknown/untested
platforms because the memory mapping is very platform specific anyway.

Indeed, but the point of that is more for helping writing
platform-specific versions than actually using it as a general-purpose
routine. Kinda documentation of your intentions on C, which is better
than ASM.

If you're lucky, it might even work... ;)

For extreme performance ASAN uses the fixed memory-to-shadow mapping
(addr>>3)+offset. I guess there are plenty of platforms where this mapping
won't work or will require different offset.

I got a bit confused there, TBH. How do you guarantee that that part
of memory will be allocated? If I'm not mistaken, the Native Client
guys have done a memory place-holder, with enough space pre- and
post-code, is it similar to what you're doing? Or are you using a big
BSS region?

Depending on how you did it, it might just work on other platforms...

cheers,
--renato

I am rather reluctant to add ‘generic’ code that handles unknown/untested
platforms because the memory mapping is very platform specific anyway.

Indeed, but the point of that is more for helping writing
platform-specific versions than actually using it as a general-purpose
routine. Kinda documentation of your intentions on C, which is better
than ASM.

If you’re lucky, it might even work… ;)

Maybe the fallback code should just use a function call. Much simpler for documentation purposes.

For extreme performance ASAN uses the fixed memory-to-shadow mapping
(addr>>3)+offset. I guess there are plenty of platforms where this mapping
won’t work or will require different offset.

I got a bit confused there, TBH. How do you guarantee that that part
of memory will be allocated?

On 32-bit, the shadow region is:

[0x28000000, 0x3fffffff] HighShadow
[0x24000000, 0x27ffffff] ShadowGap
[0x20000000, 0x23ffffff] LowShadow

This is 0.5G total. So, I mmap all these three shadow subregions and ‘mprotect’ the ShadowGap.
This is done at startup. If the mmap fails, an assert will fire.
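
A minimal sketch of the 32-bit startup step described above, assuming a POSIX system. `ReserveShadow` and its parameters are hypothetical names, not taken from the ASan sources. The thread does not say which mmap flags the run-time uses; this sketch passes the address only as a hint (no MAP_FIXED) so that it cannot clobber existing mappings, whereas a real run-time with a fixed mapping would presumably need the exact addresses listed.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif
#ifndef MAP_NORESERVE
#define MAP_NORESERVE 0
#endif

// Sizes of the three 32-bit shadow subregions listed in the thread.
constexpr size_t kLowShadowSize  = 0x04000000;  // [0x20000000, 0x23ffffff]
constexpr size_t kShadowGapSize  = 0x04000000;  // [0x24000000, 0x27ffffff]
constexpr size_t kHighShadowSize = 0x18000000;  // [0x28000000, 0x3fffffff]

// Hypothetical helper: map low + gap + high bytes as one readable and
// writable block, then make the middle (gap) part inaccessible.
// `hint` is only a placement hint in this sketch.
uint8_t* ReserveShadow(void* hint, size_t low, size_t gap, size_t high) {
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  assert(low % page == 0 && gap % page == 0);  // mprotect needs page alignment

  void* p = mmap(hint, low + gap + high, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  if (p == MAP_FAILED) {  // the thread says an assert fires in this case
    std::perror("mmap");
    std::abort();
  }
  uint8_t* base = static_cast<uint8_t*>(p);
  if (mprotect(base + low, gap, PROT_NONE) != 0) {
    std::perror("mprotect");
    std::abort();
  }
  return base;
}
```

Calling `ReserveShadow(reinterpret_cast<void*>(0x20000000), kLowShadowSize, kShadowGapSize, kHighShadowSize)` would request the 0.5G layout above; any later access to the gap part then faults.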

On 64-bit, the shadow looks like this:

[0x0000140000000000, 0x00001fffffffffff] HighShadow
[0x0000120000000000, 0x000013ffffffffff] ShadowGap
[0x0000100000000000, 0x000011ffffffffff] LowShadow

This is quite a lot, I can not mmap/mprotect this thing.
So, I basically hope that it won’t be used by anyone but the ASAN run time (of course, there are asserts here and there to check it).
When some part of the shadow region is being written to (when we poison memory), SEGV happens and the SEGV handler mmaps the required region.
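
The two layouts can be checked against the (addr>>3)+offset formula with a small worked example. This is a sketch under assumptions: the offsets are inferred from the lowest LowShadow address in each layout, and the split of application memory into a low and a high range is a reading of the numbers, not something stated in the thread. Under that reading, ShadowGap is exactly the shadow of the shadow region itself. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kShadowScale = 3;                       // addr >> 3
constexpr uint64_t kOffset32 = 0x20000000ULL;              // inferred, see above
constexpr uint64_t kOffset64 = 0x0000100000000000ULL;      // inferred, see above

constexpr uint64_t MemToShadow(uint64_t addr, uint64_t offset) {
  return (addr >> kShadowScale) + offset;
}

// Checks that the region boundaries quoted in the thread follow from the
// formula. Asserts on mismatch; returns true otherwise.
inline bool LayoutMatchesThread() {
  // 32-bit: low application memory -> LowShadow
  assert(MemToShadow(0x00000000ULL, kOffset32) == 0x20000000ULL);
  assert(MemToShadow(0x1fffffffULL, kOffset32) == 0x23ffffffULL);
  // 32-bit: the shadow region itself -> ShadowGap
  assert(MemToShadow(0x20000000ULL, kOffset32) == 0x24000000ULL);
  assert(MemToShadow(0x3fffffffULL, kOffset32) == 0x27ffffffULL);
  // 32-bit: high application memory -> HighShadow
  assert(MemToShadow(0x40000000ULL, kOffset32) == 0x28000000ULL);
  assert(MemToShadow(0xffffffffULL, kOffset32) == 0x3fffffffULL);

  // 64-bit: same three steps
  assert(MemToShadow(0x0000000000000000ULL, kOffset64) == 0x0000100000000000ULL);
  assert(MemToShadow(0x00000fffffffffffULL, kOffset64) == 0x000011ffffffffffULL);
  assert(MemToShadow(0x0000100000000000ULL, kOffset64) == 0x0000120000000000ULL);
  assert(MemToShadow(0x00001fffffffffffULL, kOffset64) == 0x000013ffffffffffULL);
  assert(MemToShadow(0x0000200000000000ULL, kOffset64) == 0x0000140000000000ULL);
  assert(MemToShadow(0x00007fffffffffffULL, kOffset64) == 0x00001fffffffffffULL);
  return true;
}
```

The 64-bit shadow spans 0x0000100000000000 through 0x00001fffffffffff, which is 2^44 bytes, so it cannot be mapped eagerly the way the 0.5G 32-bit shadow is.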

If I’m not mistaken, the Native Client
guys have done a memory place-holder, with enough space pre- and
post-code, is it similar of what you’re doing?

Not very similar. 64-bit NaCl mmaps 88G of address space. 64-bit ASAN needs 16384G of address space for the shadow, which is a bit too much to mmap.

Or are you using a big
BSS region?

Depending on how you did it, it might just work on other platforms…

cheers,
–renato

–kcc

Maybe the fallback code should just use a function call. Much simpler for documentation purposes.

Sounds good.

On 32-bit, the shadow region is:

[0x28000000, 0x3fffffff] HighShadow
[0x24000000, 0x27ffffff] ShadowGap
[0x20000000, 0x23ffffff] LowShadow

This is 0.5G total. So, I mmap all these three shadow subregions and ‘mprotect’ the ShadowGap.
This is done at startup. If the mmap fails, an assert will fire.

I see. On embedded platforms that won’t work in all cases. Most implementations have fragmented memory, memory mapped I/O, secure zones, etc. Depending on what you’re trying to do, mmap will work but writing to memory won’t.

In the ARM world, SoC designers can come up with any number of configurations, which makes a generic implementation impossible. You’ll need some kind of tablegen to define writeable regions and how to map between memory and shadow. Manufacturers generally provide this information when you buy the kit.

But again, most OSes should take care of that for you, so that’s only relevant for bare-metal applications.

On 64-bit, the shadow looks like this:

[0x0000140000000000, 0x00001fffffffffff] HighShadow
[0x0000120000000000, 0x000013ffffffffff] ShadowGap
[0x0000100000000000, 0x000011ffffffffff] LowShadow

This is quite a lot, I can not mmap/mprotect this thing.
So, I basically hope that it won’t be used by anyone but the ASAN run time (of course, there are asserts here and there to check it).
When some part of the shadow region is being written to (when we poison memory), SEGV happens and the SEGV handler mmaps the required region.

Ok, if you allocate big enough regions you shouldn’t need to SEGV that often.

The SEGV handler mmaps large aligned chunks, currently 1M each. (1M of shadow corresponds to 8M of application memory).
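
The chunk arithmetic can be sketched as follows. This only shows how a faulting shadow address could be rounded to a 1M-aligned chunk and which application range that chunk covers; it is not the actual handler. The function names are hypothetical, and the 64-bit offset is inferred from the layout quoted earlier in the thread.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kShadowScale = 3;                    // addr >> 3
constexpr uint64_t kChunkSize = 1ULL << 20;             // 1M of shadow per mmap
constexpr uint64_t kOffset64 = 0x0000100000000000ULL;   // inferred offset

static_assert((kChunkSize & (kChunkSize - 1)) == 0, "chunk size is a power of two");

// Start of the 1M-aligned shadow chunk containing the faulting address.
constexpr uint64_t ChunkBase(uint64_t fault_addr) {
  return fault_addr & ~(kChunkSize - 1);
}

// Inverse of (addr >> 3) + offset for the first byte a shadow byte describes.
inline uint64_t ShadowToMem(uint64_t shadow, uint64_t offset) {
  assert(shadow >= offset);
  return (shadow - offset) << kShadowScale;
}

// Number of application bytes described by one shadow chunk.
constexpr uint64_t AppBytesPerChunk() { return kChunkSize << kShadowScale; }
```

For a fault at shadow address 0x0000140000123456, the chunk starts at 0x0000140000100000 and describes the 8M of application memory beginning at 0x0000200000800000.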

–kcc

Maybe the fallback code should just use a function call. Much simpler for documentation purposes.

Sounds good.

I implemented the asm-free way to report warnings as an option to the llvm instrumentation pass (uses a call to run-time).
It generates more code, it also creates prologue/epilogue in otherwise leaf functions.
Such mode may still be useful if for whatever reason we can not use SIGILL.

Default (use ud2):

402ed5: 48 89 d8    mov %rbx,%rax    << move the address to rax
402ed8: 0f 0b       ud2a             << crash
402eda: 52          push %rdx        << encode is_write and size in the opcode

(note: with a good disassembler and some work we can leave just ud2 or equivalent)

-mllvm -asan-use-call

402ed5: 48 89 df          mov %rbx,%rdi                          << address is the parameter to __asan_report_error_2
402ed8: e8 53 69 00 00    callq 409830 <__asan_report_error_2>   << is_write and size are encoded in the function name
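
To illustrate the call-based mode, here is a toy model of an instrumented access that reports through an ordinary function call instead of a trap. It is a deliberately simplified sketch, not ASan's real check or interface: `ToyRuntime` and its members are hypothetical, the shadow is a plain vector with one byte per 8 bytes of toy memory, only the first granule of an access is checked, and is_write and size are passed as arguments rather than encoded in the function name as `__asan_report_error_2` does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ToyRuntime {
  std::vector<uint8_t> shadow;  // nonzero byte = granule is poisoned
  int reports = 0;
  size_t last_addr = 0;
  bool last_is_write = false;
  size_t last_size = 0;

  explicit ToyRuntime(size_t mem_bytes) : shadow((mem_bytes + 7) >> 3, 0) {}

  // Mark `count` 8-byte granules, starting at granule `first`, as poisoned.
  void PoisonGranules(size_t first, size_t count) {
    assert(first + count <= shadow.size());
    for (size_t i = 0; i < count; ++i) shadow[first + i] = 1;
  }

  // Stand-in for the run-time's report function; here it only records the call.
  void ReportError(size_t addr, bool is_write, size_t size) {
    ++reports;
    last_addr = addr;
    last_is_write = is_write;
    last_size = size;
  }

  // What the instrumentation would emit before a load or store in this
  // toy model: look up the shadow byte and call the run-time if it is set.
  void CheckAccess(size_t addr, bool is_write, size_t size) {
    assert((addr >> 3) < shadow.size());
    if (shadow[addr >> 3] != 0) ReportError(addr, is_write, size);
  }
};
```

Because the report is a real call, the instrumented function is no longer a leaf, which matches the prologue/epilogue cost mentioned above.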

–kcc

Hi,
What would be our next steps in getting ASan into the LLVM trunk? I’d like to do it in two steps, first for the LLVM part with minimal tests and then for the run-time library and all tests.
The current ASan’s source repository will probably stay the primary home for the run-time library and tests as we plan to use it in non-LLVM environments.

Thanks,

–kcc