I’ve added a way to dynamically enable and disable the ASan runtime checks. Instrumentation that maintains shadow memory coherency still runs, but there is now a branch over load and store checking. In this way, ASan is “dormant”, waiting until it should begin checking again. You can enable and disable dormant mode with a runtime call as many times as you like during the program. The feature is opt-in via a compiler flag, so you don’t pay the extra overhead of a branch check if you don’t need dormant ASan.
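To make the idea concrete, here is a small standalone model of the behaviour described above. It is only a sketch, not the code from the patch: every name in it (`model_dormant`, `model_shadow`, `model_check_access`, and so on) is a hypothetical stand-in, and real ASan would print a report and abort rather than bump a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the names used in the patch. */
static bool model_dormant = false;      /* the single global flag */
static unsigned char model_shadow[64];  /* 1 = poisoned, 0 = addressable */
static int model_reports = 0;           /* errors that would have been reported */

/* Shadow maintenance always runs, whether dormant or not. */
void model_poison(size_t index, bool poisoned) {
    model_shadow[index] = poisoned ? 1 : 0;
}

void model_set_dormant(bool dormant) { model_dormant = dormant; }

/* What the instrumentation does before each load or store in this model. */
void model_check_access(size_t index) {
    if (model_dormant)
        return;              /* the added branch: skip the check entirely */
    if (model_shadow[index])
        model_reports++;     /* real ASan would report and abort here */
}

int model_report_count(void) { return model_reports; }
```

The point of the model is that `model_poison` keeps working while dormant, so the shadow state is still accurate when checking is switched back on.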
Use Case
Sometimes you are running code for a long time before reaching a crash or known bug. The overhead of ASan can be annoying during this time, as the checks won’t find anything new. This is especially annoying in real-time applications, where the user experience can become much worse. And you are already debugging, so you don’t need another reason to be in a bad mood. With dormant ASan, you can pass -asan-dormant when you compile, and place a __asan_set_dormant(true) call before the code that experiences the crash is run. You now have much less performance overhead while the program progresses.
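A rough sketch of how this might look in a program, under one reading of the workflow: I assume `__asan_set_dormant(true)` means "skip the checks" and that its signature takes a single `bool`, as the call above suggests. The macro `MY_ASAN_DORMANT_AVAILABLE` and the functions `warm_up`, `suspect_path`, and `run_session` are hypothetical names for this example; the fallback branch exists only so the sketch builds without the patch.

```c
#include <stdbool.h>

/* MY_ASAN_DORMANT_AVAILABLE is a hypothetical macro for this sketch: define it
   only when building with the draft patch and -asan-dormant. */
#ifdef MY_ASAN_DORMANT_AVAILABLE
void __asan_set_dormant(bool dormant);  /* signature assumed from the post */
static void set_checks_dormant(bool dormant) { __asan_set_dormant(dormant); }
#else
static bool fallback_dormant = false;   /* stand-in so the sketch builds anywhere */
static void set_checks_dormant(bool dormant) { fallback_dormant = dormant; }
#endif

/* Long-running setup that is already known to be clean. */
static long warm_up(int frames) {
    long sum = 0;
    for (int i = 0; i < frames; i++)
        sum += i;
    return sum;
}

/* Placeholder for the code path under investigation. */
static long suspect_path(long seed) { return seed * 2; }

long run_session(int frames) {
    set_checks_dormant(true);   /* skip load/store checks during warm-up */
    long seed = warm_up(frames);
    set_checks_dormant(false);  /* wake ASan just before the suspect code */
    return suspect_path(seed);
}
```

Wrapping the call in a helper like `set_checks_dormant` keeps the same source buildable with and without the flag.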
Example
I compiled FFmpeg without sanitizers, with ASan, and with dormant ASan. ASan took around 3.5 times longer to convert a video from MP4 to AVI compared to without sanitizers. With dormant ASan, it was closer to 1.8 times. This is quite a significant performance increase!
Implementation Details
I have a pull request draft here
It’s currently a draft. There are perhaps some more areas I can add dormancy checks to, and I haven’t written any proper tests yet. There may also be places where comments could be added or variables renamed to make it easier to understand for a fresh pair of eyes. Please give me your thoughts on this matter!
My hypothesis is that if we assume that loads typically dominate, a “store only” mode might also be interesting and could in some cases also get close to the “dormant” mode in performance.
It would be feasible to split the single global into one for loads and one for stores. More functions could then be added to control them separately or together. It would be a somewhat messier change, as I currently don’t have to check what type of instruction is being instrumented before adding the code to skip past it. If it’s useful though, it may be worth adding. From [RFC] Overflow Idiom Exclusion, it’s clear that there is interest in adding more levers for users to play with.
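As a sketch of what the split could look like from the user's side (all names here are hypothetical, and the draft patch currently has only the single global):

```c
#include <stdbool.h>

/* Hypothetical names for this sketch; the draft patch has one global flag. */
typedef enum { ACCESS_LOAD, ACCESS_STORE } access_kind;

static bool loads_dormant = false;
static bool stores_dormant = false;

/* Control the two kinds of checks separately... */
void set_loads_dormant(bool d)  { loads_dormant = d; }
void set_stores_dormant(bool d) { stores_dormant = d; }

/* ...or together, matching the behaviour of a single flag. */
void set_all_dormant(bool d) {
    loads_dormant = d;
    stores_dormant = d;
}

/* Returns true when an access of this kind should still be checked. */
bool should_check(access_kind kind) {
    return kind == ACCESS_LOAD ? !loads_dormant : !stores_dormant;
}
```

A “store only” mode would then be `set_loads_dormant(true)` with stores left active. The extra cost on the compiler side is that the pass has to pick the right flag per instrumented instruction.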
Up to you. I think if it complicates things, I’d err on the side of keeping it simpler. There is also -mllvm -asan-instrument-reads=false, which can probably achieve something like that with your current proposal, but obviously won’t allow switching between modes arbitrarily.
The overhead is always less than running all the checks, though it fluctuated a bit across the tests. On larger programs like FFmpeg it seems to settle at about 0.52 times the overhead.
I realize I hadn’t used branch weights in my patch. By adding branch weights to make the dormant path more likely, the results are much better.
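For anyone unfamiliar with branch weights: in the patch they are attached to the branch that the instrumentation emits, but a source-level analogy is `__builtin_expect` in GCC and Clang. The sketch below is only that analogy, with hypothetical names (`dormant_flag`, `slow_check`, `guarded_access`), not the code from the patch.

```c
#include <stdbool.h>

static bool dormant_flag = true;  /* hypothetical stand-in for the global */
static int checks_run = 0;        /* counts how often the slow path ran */

static void slow_check(void) { checks_run++; }

void guarded_access(void) {
    /* Hint to the compiler that the dormant path is the common one,
       so the check is laid out as the unlikely branch. */
    if (__builtin_expect(dormant_flag, 1))
        return;
    slow_check();
}
```

The hint changes code layout and not behaviour, so the checks still run whenever the flag is cleared.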
Once again, the tests run in dormant mode were left dormant for the entire runtime.
I’d like to see independent benchmarks. -fno-sanitize=address and __attribute__((no_sanitize("address"))) are indeed useful options.
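For reference, the per-function attribute looks like this. The function `sum_hot` is a made-up example; the attribute itself is accepted by both Clang and GCC and turns off instrumentation for that function at compile time, so unlike the proposal it cannot be toggled at run time.

```c
/* Hypothetical hot function, excluded from ASan instrumentation at compile time. */
__attribute__((no_sanitize("address")))
long sum_hot(const int *data, int n) {
    long total = 0;
    for (int i = 0; i < n; i++)
        total += data[i];  /* these loads are not checked by ASan */
    return total;
}
```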
I wonder whether “dormant” is accurate. This is more like activation/deactivation of memory load instrumentation.
The allocator and various __asan_* APIs are still working. We would be unlikely to allow this niche feature to add a condition to the runtime functions (overhead, even if minor).