[RFC] Adding a Dormant Mode to AddressSanitizer

Summary

I’ve added a way to dynamically enable and disable the ASan runtime checks. Instrumentation that keeps shadow memory coherent still runs, but there is now a branch over load and store checking. While the checks are skipped, ASan is “dormant”, waiting until it should begin checking again. You can enter and leave dormant mode with a runtime call as many times as you like during the program. The feature is opt-in via a compiler flag, so you don’t pay for the extra branch if you don’t need dormant ASan.
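
To make the mechanism concrete, here is a minimal model of the idea in plain C++. It is a sketch, not the actual instrumentation or runtime code: every name in it (`g_model_dormant`, `model_set_dormant`, `model_check_access`, `model_instrumented_load`) is hypothetical, and the real shadow-memory check is reduced to a counter.

```cpp
#include <cstddef>

// Hypothetical stand-ins: none of these names come from the patch or the ASan runtime.
static bool g_model_dormant = false;        // models the single global dormancy flag
static std::size_t g_model_checks_run = 0;  // counts how often the modeled check executes

// Models the runtime call that toggles dormancy.
void model_set_dormant(bool dormant) { g_model_dormant = dormant; }

// Models the shadow-memory check emitted before a memory access.
static void model_check_access(const void *addr) {
  (void)addr;
  ++g_model_checks_run;
}

// Models an instrumented load: the check is branched over while dormant,
// but the load itself always happens.
int model_instrumented_load(const int *p) {
  if (!g_model_dormant)
    model_check_access(p);
  return *p;
}

// Exposes the counter so callers can observe whether checks ran.
std::size_t model_checks_run() { return g_model_checks_run; }
```

In this model, toggling the flag changes only whether the check runs; the access is unaffected, which mirrors the description above.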

Use Case

Sometimes you run code for a long time before reaching a crash or known bug. The overhead of ASan can be annoying during this time, since the checks won’t find anything new. This is especially annoying in real-time applications, where the user experience can become much worse. And you are already debugging; you don’t need another reason to be in a bad mood. With dormant ASan, you can pass -asan-dormant when you compile, and place a __asan_set_dormant(true) call before the code that experiences the crash is run. You now have much less performance overhead while the program progresses.

Example

I compiled FFmpeg without sanitizers, with ASan, and with dormant ASan. ASan took around 3.5 times longer to convert a video from MP4 to AVI compared to without sanitizers. With dormant ASan, it was closer to 1.8 times. This is quite a significant performance increase!

Implementation Details

I have a pull request draft here
It’s currently a draft. There are perhaps some more areas I can add dormancy checks to, and I haven’t written any proper tests yet. There may also be places where comments could be added or variables renamed to make it easier to understand for a fresh pair of eyes. Please give me your thoughts on this matter!

Nice idea!

How easy is it to make it more fine-grained:

  • Dormant
  • Check loads only
  • Check stores only
  • Check everything

My hypothesis is that if we assume that loads typically dominate, a “store only” mode might also be interesting and could in some cases also get close to the “dormant” mode in performance.


Thanks!

It would be feasible to split the single global into one for loads and one for stores. More functions could then be added to control them separately or together. It would be a somewhat messier change, as I currently don’t have to check what type of instruction is being instrumented before adding the code to skip past it. If it’s useful though, it may be worth adding. From [RFC] Overflow Idiom Exclusion, it’s clear that there is interest in adding more levers for users to play with.
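
As a rough illustration of what that split could look like, here is a sketch with two flags and three setters. All names are hypothetical and not taken from the patch; the point is only that the instrumentation would have to know whether it is guarding a load or a store.

```cpp
// Hypothetical sketch: two independent dormancy flags instead of one.
static bool g_loads_dormant = false;
static bool g_stores_dormant = false;

// Control each kind of check separately...
void sketch_set_loads_dormant(bool dormant) { g_loads_dormant = dormant; }
void sketch_set_stores_dormant(bool dormant) { g_stores_dormant = dormant; }

// ...or both together, matching the behavior of a single global flag.
void sketch_set_all_dormant(bool dormant) {
  g_loads_dormant = dormant;
  g_stores_dormant = dormant;
}

// The instrumentation would pick the flag based on the access kind,
// which is the extra bookkeeping mentioned above.
bool sketch_should_check(bool is_store) {
  return is_store ? !g_stores_dormant : !g_loads_dormant;
}
```

With this shape, a “check stores only” mode would be `sketch_set_loads_dormant(true)` with stores left active.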

Up to you. I think if it complicates things, I’d err on the side of keeping it simpler. There is also -mllvm -asan-instrument-reads=false, which can probably achieve something like that with your current proposal, but obviously won’t allow switching between modes arbitrarily.

How easy is it to make it more fine-grained:

I’d rather apply YAGNI here.
We can always extend when it’s needed.

What is the actual performance improvement?

Would it be possible to compare e.g. on test-suite, SingleSource, MultiSource sets?

I tested it on a few tests. When running Dormant, it was left dormant for the entire runtime. I never called the function to re-enable checks.

| Test Path | No sanitizer (s) | ASan (s) | Dormant ASan (s) | Reduction in overhead |
| SingleSource/Benchmarks/Adobe-C++/stepanov_vector.test | 1.51 | 1.79 | 1.62 | 0.393 |
| SingleSource/Benchmarks/Adobe-C++/stepanov_abstraction.test | 2.64 | 3.17 | 3.09 | 0.849 |
| MultiSource/Applications/SPASS/SPASS.test | 3.89 | 6.36 | 5.60 | 0.692 |

The overhead is always lower than when running all the checks, though it fluctuated a bit across the tests. On larger programs like FFmpeg it seems to settle at around 0.52 of the overhead.
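
For anyone reproducing the numbers: the last column of the table above appears to be the fraction of ASan’s added runtime that remains in dormant mode, i.e. (dormant − baseline) / (ASan − baseline). That formula is inferred from the reported values rather than stated, so treat this sketch as an assumption; the function names are made up for illustration.

```cpp
#include <cmath>

// Fraction of ASan's added runtime that is still paid in dormant mode.
// All three arguments are wall-clock times in seconds.
double remaining_overhead(double base_s, double asan_s, double dormant_s) {
  return (dormant_s - base_s) / (asan_s - base_s);
}

// True if a computed ratio agrees with a value reported to three decimals.
bool matches_reported(double computed, double reported) {
  return std::fabs(computed - reported) < 0.001;
}
```

For example, stepanov_vector gives (1.62 − 1.51) / (1.79 − 1.51) = 0.11 / 0.28 ≈ 0.393, which agrees with the table.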

I realize I hadn’t used branch weights in my patch. By adding branch weights to make the dormant path more likely, the results are much better.
Once again, the tests run in dormant mode were left dormant for the entire runtime.

| Test Path | No sanitizer (s) | ASan (s) | Dormant ASan (s) | Reduction in overhead |
| SingleSource/Benchmarks/Adobe-C++/stepanov_vector.test | 1.51 | 1.79 | 1.59 | 0.286 |
| SingleSource/Benchmarks/Adobe-C++/stepanov_abstraction.test | 2.64 | 3.17 | 2.87 | 0.434 |
| MultiSource/Applications/SPASS/SPASS.test | 3.89 | 6.36 | 4.75 | 0.275 |

Making the dormant path the likely path made it much faster.
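
The patch itself attaches branch weights in LLVM IR, which is not shown in this thread. A source-level analogue of the same idea, using the GCC/Clang builtin `__builtin_expect`, might look like the sketch below. The names `g_hint_dormant`, `hint_slow_path_hits`, and `hinted_store` are hypothetical, and the counter stands in for the real check.

```cpp
// Hypothetical model of a store whose check is marked as the unlikely path.
static bool g_hint_dormant = true;       // start dormant, as in the benchmarks above
static unsigned g_hint_slow_path_hits = 0;

void hint_set_dormant(bool dormant) { g_hint_dormant = dormant; }
unsigned hint_slow_path_hits() { return g_hint_slow_path_hits; }

void hinted_store(int *p, int value) {
  // Tell the compiler that "not dormant" is the unlikely case, so the
  // fall-through (hot) path is the one that skips the check.
  if (__builtin_expect(!g_hint_dormant, 0)) {
    ++g_hint_slow_path_hits;  // stands in for the shadow-memory check
  }
  *p = value;
}
```

The hint only affects code layout and prediction heuristics; the behavior is identical with or without it.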

Hm, those benchmarks look weird; ASan vs. no ASan is usually 3x or something.

As-is, I can’t see myself making a custom build to speed up a debug session by 20%.

In cases when I need to debug a slow binary, I just use -fno-sanitize=address for part of the code; this gives close to opt performance.

I would prefer we don’t complicate things for this niche case.

@melver @fmayer @MaskRay @ramosian-glider WDYT?

I’d like to see independent benchmarks.
-fno-sanitize=address and __attribute__((no_sanitize("address"))) are indeed useful options.
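
For reference, a minimal example of the attribute form: the function below is excluded from ASan instrumentation when the rest of the translation unit is built with -fsanitize=address. The function name and body are only illustrative.

```cpp
#include <cstddef>

// Clang/GCC attribute: no ASan load/store checks are emitted inside this
// function, even when the file is compiled with -fsanitize=address.
__attribute__((no_sanitize("address")))
long sum_hot_loop(const int *data, std::size_t n) {
  long total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += data[i];
  return total;
}
```

Unlike a runtime toggle, this is a compile-time decision per function, so it cannot be switched on again later in the same run.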

I wonder whether “dormant” is accurate. This is more like activation/deactivation of memory load instrumentation.
The allocator and the various __asan_* APIs are still working. We would be unlikely to allow this niche feature to add a condition to the runtime functions (overhead, even if minor).