[RFC] Safe Optimizations for Sanitizers

Hi all,

This RFC is about how we can make LLVM’s optimizations preserve more potential bugs that can be caught with the sanitizers that are part of LLVM (especially ASan, MSan and UBSan).

TL;DR: The optimizer (correctly) transforms code without considering the potential sanitizer checks that get removed by the transformation. Given that sanitizers are often combined with the optimizer for good reasons, should we have an automatic pipeline/mode that selectively disables these few ‘dangerous’ transformations.

Background

Clang/LLVM comes with several different sanitizers that automatically inject checks into compiled programs. These checks catch undefined behavior or other kinds of bugs during execution and then report them in some way (usually by printing an error and then aborting the execution). Sanitizers are successfully used in various ways from hardening software in production to finding bugs by enabling them when doing manual testing or fuzzing.

Sanitizers are also frequently used in combination with LLVM’s own optimization pipeline. For example, the speed at which fuzzers can find bugs is limited by the execution time of the program, so usually fuzzers test a sanitized and heavily optimized version of the target program. But manual testing is also usually done with sanitizers and optimizations enabled because it simply reduces the amount of computing resources spent on running tests. One example in the wild is Chrome’s documentation which recommends turning on the different sanitizers as well as compiling in release mode (which enables optimizations).

Mixing Optimizations and Sanitizers

The general assumption that users have when turning on both sanitizers and optimizations is that they end up with a fast program that triggers the same sanitizer reports as the slow unoptimized program [1]. In reality however the optimized program might end up producing fewer or no sanitizer reports[2].

There are two main reasons for why an unoptimized program with a sanitizer failure will actually pass after optimizations:

  1. A sanitizer check that is probabilistic now just passes by accident. For example, an load from an out-of-bounds pointer with ASan might point to a different but unpoisoned memory region in the optimized version, while in the unoptimized version it accesses poisoned memory.
  2. Code containing a sanitizer check and ends up ‘being’ fixed because an optimization transforms code in a way that is only equivalent in a program that is free of undefined behavior. For example, SimplifyLIbCalls currently replaces the pattern strlen(x) == 0 with *x == ‘\0’ which is a correct transformation in a well-formed program. But if x has no null terminator this does an out-of-bounds read, so the code is not equivalent when considering potential sanitizer failures (in this case ASan).

This RFC is only about missing sanitizer reports that fall into the second category.

[1] Or at least passes/fails in the same way as the unoptimized sanitized binary, so getting a different sanitizer failure should be fine.

[2] It might also produce more, but that’s a different issue out of scope.

Current situation

So I looked at UBSan, MSan and ASan and tried to make various failing programs pass by simply optimizing them. The end result was about 15 cases where we end up with an optimizer pass ‘fixing’ a failing program even with the respective sanitizer enabled. I’m about to file the respective bugs on GitHub within the next week(s) and then list them below as I file them. I’ll link to an existing dupe if I can find one:

Most bugs can be grouped into the following categories:

  • Generic optimizations that just remove ‘dead’ code if the only possible side effect is UB (e.g., dead stores/reads, etc.).
  • InstCombine/SimplifyLibCalls simplifying libc/builtin calls in a way that doesn’t preserve UB (removing two sequential free calls on a pointer).
  • Several passes transforming code related to undef values (this mostly affects MSan).

Also most of the bugs only affect MSan and ASan (as they run late in the optimizer pipeline). UBSan meanwhile is relatively resistant to optimizations as its checks are inserted by the frontend.

There are also a few cases where we don’t actually sanitize certain UB but otherwise have an observable crash/hang being legalized. Those also seem relevant for this RFC so I’ll also list them above (with a note) when I file them.

Why does it matter?

So the question is whether these bugs actually affect anyone or it’s just weird edge cases. I ended up scanning the Ubuntu/Debian package repositories to see what kind of sanitizer reports I can trigger after fixing the found optimizer bugs. This was mostly done by static analysis to make the approach scale to all the software shipped by Ubuntu.

The results are that I ended up with after about 3 weeks of scanning software are:

  • about 40 bugs found&fixed in actively maintained software that were missed by the optimized sanitized program (8 of them were fixed by someone else between release and when I scanned the package).
  • ~100 bugs in unmaintained software in Ubuntu (I didn’t end up filing them as there was no maintainer to contact).

There is also a large backlog of several thousand reports that I still need to verify and report so the numbers above are subject to change.

Potential solutions

From what I can see we have a several ways to fix the issues:

  • Tell users to not use optimizations if they care about these issues. Given the severe performance impact of disabling all optimizations this is probably not a viable solution for most users (especially when they are doing fuzzing).
  • A partial solution for some of the MSan/ASan issues is to move the sanitizer passes to the beginning of the pipeline. That effectively disables a lot of optimizations and also adds a rather large overhead because of all the additional inserted checks.
  • We have a custom optimizer pipeline/mode that tries to preserve sanitizer checks where possible.

An optimizer mode that tries to preserve UB/sanitizer checks

The last option is what I would propose in this RFC. We have a mode for the optimizer that gets automatically activated alongside sanitizers that tries to preserve UB where possible.

There is already a precedent for doing custom passes in combination with sanitizers (MSan is for example adds another EarlyCSEPass instance to the pipeline, the ‘fuzzing’ sanitizer mode adds a custom ‘optforfuzzing’ function attribute used by SimplifyCFG).

I prototyped this idea by selectively disabling the faulty passes (or parts of passes) in a downstream LLVM fork and the performance overhead from doing so seems reasonably low (about 5%, but the numbers might change depending on how we decide to fix certain issues). My downstream fork has some very intrusive changes to LLVM, so I am not sure if it’s feasible to upstream it in its current form.

However, I think we could fix most of the upstream issues via the following changes:

  • We automatically disable some passes that extensively ignore/exploit UB that any (activated) sanitizer is supposed to catch. It’s not clear yet what is the full list, but currently I have DSE, ACDE, (IP)SCCP disabled in my LLVM fork .
  • We disable some optimization features in Clang/LLVM (e.g., -fno-builtin would fix several issues related to transforming calls to builtins.)

That still misses a few corner cases (e.g., unused loads to potentially out-of-bounds memory are trivially dead right now and removed everywhere, InstCombine sometimes transforms undef to other values), but from what I can see this would get us most of the way to have more reliable sanitizers with optimizations.

2 Likes

Is this a problem with -O1 ?

I think you should clarify what it means to “preserve” UB. Should we modify optimizations ad-hoc, or do we want to formally define new semantics in “UB-preserving” mode? Which forms of UB do we care about preserving? Do we need to preserve the precise error message the sanitizer would produce, or is producing a different message okay?

I think the general idea of turning off transforms for the sake of sanitizers or instrumentation makes sense.

Blindly doing optimisations based on UB is dangerous. There should be some option some somewhere.

If you want to propose something about the optimizations clang performs by default, please start a new topic. (This has been discussed extensively elsewhere, and I don’t want to derail this discussion.)

1 Like

I think this one is the right solution, or at least a large part of it. The best way to make sure sanitizer checks are not optimized away is to make it illegal to optimize them away under usual semantics, which is what we get if we insert them early.

Where “early” should probably be after the first SROA run, to still get a lot of the useful optimization. I think it’s reasonable to adjust SROA to not aggressively optimize out-of-bounds accesses in sanitized builds, but this at least limits the scope of where we need to take care.

1 Like

Agreed. We don’t want to patch every single optimization to disable something because of the sanitizers. That’s a pain and it’s brittle and will rotten over time.
If you don’t want to do full instrumentation upfront, you can use e.g. to pass pointers over identity functions so LLVM can’t optimize anything on those pointers anymore.

I don’t think all is lost though. Over time, you can improve optimizers to recover some of that lost perf.

TL;DR: we don’t think it’s a good idea for you to fork LLVM and have a version of LLVM IR with different semantics.

I agree with nikic. If we don’t want optimizations to exploit UB, the correct thing to do is to insert the checks (which remove the UB) earlier in the optimization pipeline.

I’ve made this suggestion several times in the past, and resistance seems to come down to potential performance impact.

As nikic points out, earlier does not have to mean before all optimization. I think it would make a lot of sense to evaluate potential earlier points to understand the tradeoff between performance and result quality.

I suspect - having worked on a Java frontend which emits checks of similar purpose if slightly different form - that we can also get a lot of mileage out of looking closely at the checks being emitted, and how idiomatic they are for the optimizer. For instance, bounds checks can be expressed in multiple ways, and some are much harder to optimize than others.

I am generally quite hesitant to continue adding “don’t optimize this if X sanitizer” checks in the optimizer. As was previously mentioned, we have no formal model for what this means, and thus can never be quite sure if we got it right, or even what right means.

1 Like
  • A partial solution for some of the MSan/ASan issues is to move the sanitizer passes to the beginning of the pipeline. That effectively disables a lot of optimizations and also adds a rather large overhead because of all the additional inserted checks.

Do we have any statistics of the large overhead after we moving the sanitizer passes to the beginning of the pipeline ?
Is there any benchmark we can test about this? Thanks!

Bother ASan[1] and MSan[2] papers used the SPEC2006 for performance measurement AddressSanitizerPerformanceNumbers. I believe it would be good to do the same measurement on it.

[1] Serebryany, K., Bruening, D., Potapenko, A., and Vyukov, D. (2012). AddressSanitizer: A Fast Address Sanity Checker. In 2012 USENIX Annual Technical Conference (USENIX ATC 12).
[2] Stepanov, E., and Serebryany, K. (2015). MemorySanitizer: fast detector of uninitialized memory use in C++. In 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

2 Likes