[RFC] Adding a Dormant Mode to AddressSanitizer

gbMattN · July 22, 2024, 10:15am

Summary

I’ve added a way to dynamically enable and disable the ASan runtime checks. Instrumentation that maintains shadow memory coherency still runs, but there is now a branch over load and store checking. In this way, ASan is “dormant”, waiting until it should begin checking again. You can enable and disable dormant mode with a runtime call, as many times as you like during the program. The feature is opt-in via a compiler flag, so you don’t pay for the extra branch if you don’t need dormant ASan.
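To make the idea concrete, here is a toy model of a check guarded by a dormancy flag. It is only a sketch under simplifying assumptions: real ASan checks are inserted by the compiler and the shadow covers the whole address space, whereas here the memory is a small array and whole 8-byte granules are poisoned at once. The names `toy::dormant`, `toy::poison` and `toy::checked_load` are made up for illustration and are not taken from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace toy {

constexpr std::size_t kMemSize = 64;
constexpr std::size_t kGranule = 8;  // one shadow byte per 8 application bytes

std::array<std::uint8_t, kMemSize> memory{};
std::array<std::uint8_t, kMemSize / kGranule> shadow{};  // 0 = addressable
bool dormant = false;  // hypothetical stand-in for the RFC's global flag
int reports = 0;       // counts what would be error reports

// Shadow maintenance always runs, dormant or not, so the shadow stays
// coherent for the moment checking is switched back on.
void poison(std::size_t offset, bool poisoned) {
  assert(offset < kMemSize);
  shadow[offset / kGranule] = poisoned ? 0xff : 0;
}

// Conceptual shape of an instrumented one-byte load.
std::uint8_t checked_load(std::size_t offset) {
  assert(offset < kMemSize);
  if (!dormant) {  // the extra branch that skips the check
    if (shadow[offset / kGranule] != 0) ++reports;
  }
  return memory[offset];
}

}  // namespace toy
```

With `toy::dormant` set, a load from a poisoned granule goes unreported; once the flag is cleared, the same load is reported, because `poison` kept the shadow up to date in the meantime.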

Use Case

Sometimes you run code for a long time before reaching a crash or known bug. The overhead of ASan can be annoying during this time, as the checks won’t find anything new. This is especially annoying in real-time applications, where the user experience can become much worse. And you are already debugging; you don’t need another reason to be in a bad mood. With dormant ASan, you can pass -asan-dormant when you compile, and place a __asan_set_dormant(true) call before the code that experiences the crash is run. You now have much less performance overhead while the program progresses.
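A rough sketch of how this might look in user code follows. The RFC only shows the call `__asan_set_dormant(true)`, so the declaration below, the `ASAN_DORMANT_AVAILABLE` macro and the wrapper `set_asan_dormant` are all assumptions, as is the reading that `true` means "go dormant". Without the patched toolchain the wrapper falls back to a stand-in that just records the request.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(ASAN_DORMANT_AVAILABLE)  // hypothetical macro, not from the patch
extern "C" void __asan_set_dormant(bool dormant);  // assumed signature
static void set_asan_dormant(bool dormant) { __asan_set_dormant(dormant); }
#else
static bool g_dormant_requested = false;  // stand-in so the sketch builds anywhere
static void set_asan_dormant(bool dormant) { g_dormant_requested = dormant; }
#endif

// Long-running phase that is believed to be bug-free.
long warm_up(std::size_t iterations) {
  std::vector<long> data(1024, 1);
  long sum = 0;
  for (std::size_t i = 0; i < iterations; ++i) sum += data[i % data.size()];
  return sum;
}

long run_with_dormant_warm_up(std::size_t iterations) {
  assert(iterations > 0);
  set_asan_dormant(true);   // skip load/store checks during the warm-up
  long sum = warm_up(iterations);
  set_asan_dormant(false);  // wake the checks before the suspect code
  // ... the code that is known to crash would run here, fully checked ...
  return sum;
}
```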

Example

I compiled FFmpeg without sanitizers, with ASan, and with dormant ASan. ASan took around 3.5 times longer to convert a video from MP4 to AVI compared to without sanitizers. With dormant ASan, it was closer to 1.8 times. This is quite a significant performance increase!

Implementation Details

I have a pull request draft here
It’s currently a draft. There are perhaps some more areas I can add dormancy checks to, and I haven’t written any proper tests yet. There may also be places where comments could be added or variables renamed to make it easier to understand for a fresh pair of eyes. Please give me your thoughts on this matter!

melver · July 22, 2024, 1:00pm

Nice idea!

How easy is it to make it more fine-grained:

Dormant
Check loads only
Check stores only
Check everything

My hypothesis is that if we assume that loads typically dominate, a “store only” mode might also be interesting and could in some cases also get close to the “dormant” mode in performance.

gbMattN · July 22, 2024, 3:24pm

Thanks!

It would be feasible to split the single global into one for loads and one for stores. More functions could then be added to control them separately or together. It would be a bit messier of a change, as I currently don’t have to check what type of instruction is being instrumented before adding the code to skip past it. If it’s useful though, it may be worth adding. From [RFC] Overflow Idiom Exclusion, it’s clear that there is interest in adding more levers for users to play with.
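As a sketch of what that split could look like, the four modes from the list above map naturally onto two bits, one per flag. Nothing here comes from the patch: `CheckMode`, `set_check_mode` and `should_check` are hypothetical names, and the real instrumentation would read the flags in generated code, not through a function call.

```cpp
#include <cassert>
#include <cstdint>

// Bit 0 controls loads, bit 1 controls stores.
enum class CheckMode : std::uint8_t {
  Dormant = 0,
  LoadsOnly = 1,
  StoresOnly = 2,
  Everything = 3,
};

struct CheckFlags {
  bool check_loads;
  bool check_stores;
};

static CheckFlags g_flags{true, true};  // checking everything by default

void set_check_mode(CheckMode mode) {
  const auto bits = static_cast<std::uint8_t>(mode);
  assert(bits <= 3);
  g_flags.check_loads = (bits & 1) != 0;
  g_flags.check_stores = (bits & 2) != 0;
}

// The pass would pick the flag to test from the kind of instruction it is
// instrumenting, which is the extra bookkeeping mentioned above.
bool should_check(bool is_store) {
  return is_store ? g_flags.check_stores : g_flags.check_loads;
}
```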

melver · July 22, 2024, 4:14pm

Up to you. I think if it complicates things, I’d err on the side of keeping it simpler. There is also -mllvm -asan-instrument-reads=false, which can probably achieve something like that with your current proposal, but obviously won’t allow switching between modes arbitrarily.

vitalybuka · July 22, 2024, 7:42pm

How easy is it to make it more fine-grained:

I’d rather apply YAGNI here.
We can always extend when it’s needed.

vitalybuka · July 22, 2024, 7:48pm

What is the actual performance improvement?

Would it be possible to compare e.g. on test-suite, SingleSource, MultiSource sets?

gbMattN · August 12, 2024, 1:03pm

I tested it on a few tests. When running Dormant, it was left dormant for the entire runtime. I never called the function to re-enable checks.

Test Path	No sanitizer (s)	ASan (s)	Dormant ASan (s)	Fraction of ASan overhead remaining
SingleSource/Benchmarks/Adobe-C++/stepanov_vector.test	1.51	1.79	1.62	0.393
SingleSource/Benchmarks/Adobe-C++/stepanov_abstraction.test	2.64	3.17	3.09	0.849
MultiSource/Applications/SPASS/SPASS.test	3.89	6.36	5.60	0.692

The overhead is always less than running all the checks, though it fluctuated a bit on the tests. On larger programs like FFmpeg it seems to settle at around 0.52 times the overhead.
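For reference, the last column of the table above matches the dormant overhead divided by the full ASan overhead, both measured against the unsanitized run. A small helper makes the arithmetic explicit; the function name is made up here.

```cpp
#include <cassert>
#include <cmath>

// Fraction of the ASan overhead that remains in dormant mode:
//   (dormant - base) / (asan - base)
// All three arguments are wall-clock times in seconds.
double overhead_fraction(double base_s, double asan_s, double dormant_s) {
  assert(std::isfinite(base_s) && asan_s > base_s);
  return (dormant_s - base_s) / (asan_s - base_s);
}
```

For stepanov_vector this gives (1.62 - 1.51) / (1.79 - 1.51) = 0.11 / 0.28, or about 0.393; lower values mean dormant mode removed more of the overhead.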

gbMattN · August 13, 2024, 4:30pm

I realized I hadn’t used branch weights in my patch. By adding branch weights to make the dormant path more likely, the results are much better.
Once again, the tests run in dormant mode were left dormant for the entire runtime.

Test Path	No sanitizer (s)	ASan (s)	Dormant ASan (s)	Fraction of ASan overhead remaining
SingleSource/Benchmarks/Adobe-C++/stepanov_vector.test	1.51	1.79	1.59	0.286
SingleSource/Benchmarks/Adobe-C++/stepanov_abstraction.test	2.64	3.17	2.87	0.434
MultiSource/Applications/SPASS/SPASS.test	3.89	6.36	4.75	0.275

Making the dormant path the likely path made it much faster.
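The patch attaches branch weights in the instrumentation pass itself. A source-level analogue, shown here only as an illustration, is `__builtin_expect` in Clang and GCC, which tells the compiler which way a branch usually goes. The flag, counter and function names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

static bool g_dormant = true;  // hypothetical dormancy flag
static int g_reports = 0;      // counts what would be error reports

// The check is marked as the unlikely path, so the compiler can lay out
// the dormant path as the fall-through case.
std::uint8_t load_with_hint(const std::uint8_t* p,
                            const std::uint8_t* shadow_byte) {
  if (__builtin_expect(!g_dormant, 0)) {
    if (*shadow_byte != 0) ++g_reports;
  }
  return *p;
}
```

The hint only pays off if the program really does spend most of its time dormant, as in the benchmark runs above; while checks are active, the hinted branch is the wrong guess.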

vitalybuka · August 16, 2024, 8:02pm

Hm, those benchmarks look weird; ASan vs. no ASan is usually 3x or something.

As-is, I can’t see myself making a custom build to speed up a debug session by 20%.

In cases when I need to debug a slow binary, I just use -fno-sanitize=address for part of the code; this gives close to opt performance.

I would prefer we don’t complicate things for this niche case.

@melver @fmayer @MaskRay @ramosian-glider WDYT?

MaskRay · August 17, 2024, 7:08am

I’d like to see independent benchmarks.
-fno-sanitize=address and __attribute__((no_sanitize("address"))) are indeed useful options.
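For readers unfamiliar with the attribute, a minimal sketch of the per-function form: the marked function is compiled without ASan load/store checks even when the rest of the file is built with -fsanitize=address. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// No ASan instrumentation is emitted for this function's memory accesses.
__attribute__((no_sanitize("address")))
long hot_sum(const int* data, std::size_t n) {
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i) sum += data[i];
  return sum;
}

// Same loop, instrumented as usual when built with -fsanitize=address.
long checked_sum(const int* data, std::size_t n) {
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i) sum += data[i];
  return sum;
}
```

Unlike the proposed dormant mode, this choice is fixed at compile time: there is no way to turn checking back on for `hot_sum` without rebuilding.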

I wonder whether “dormant” is accurate. This is more like activation/deactivation of memory load instrumentation.
The allocator and various __asan_* APIs are still working. We are unlikely to allow this niche feature to add a condition to the runtime functions (overhead, even if minor).
