Summary
I’ve been working on a feature set to disable sanitizer instrumentation for common overflow idioms. For a wide selection of projects, proper overflow sanitization could help catch bugs and prevent security vulnerabilities. Unfortunately, in some cases the integer overflow sanitizers are too noisy for their users and are often left disabled. Giving users a way to disable instrumentation of common patterns could mean more projects actually use the sanitizers in the first place.
Background
One project that has opted not to use the integer overflow (or truncation) sanitizers is the Linux kernel. There has been some discussion recently about mitigation strategies for unexpected arithmetic overflow. The discussion is still ongoing, and a succinct article sums it up accurately. In summary, many kernel developers do not want to introduce more arithmetic wrappers when most developers understand the code patterns as they are.
Patterns like:
if (base + offset < base) { ... }
or
while (i--) { ... }
or
#define SOME -1UL
… are extremely common in a codebase like the Linux Kernel. It is perhaps too much to ask of kernel developers to use arithmetic wrappers in these cases. For example:
while (wrapping_post_dec(i)) { ... }
… which wraps some builtin, would not fly. It would require too many changes to existing code; the churn would be too great to justify turning on the overflow sanitizers.
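To make the cost concrete, here is a minimal sketch of what such a wrapper might look like. The names wrapping_post_dec_uint and count_iterations are hypothetical (they are not existing kernel helpers), the helper takes a pointer rather than being a macro, and the sketch assumes the GCC/Clang __builtin_sub_overflow builtin is available.

```c
/* Hypothetical helper, not an existing kernel API: a post-decrement that is
 * explicitly allowed to wrap. Returns the value held before the decrement. */
static inline unsigned int wrapping_post_dec_uint(unsigned int *v)
{
    unsigned int old = *v;

    /* __builtin_sub_overflow (GCC/Clang) stores the wrapped result in *v and
     * returns whether wraparound happened; the flag is ignored on purpose. */
    (void)__builtin_sub_overflow(old, 1u, v);
    return old;
}

/* The annotated equivalent of `while (i--) { ... }`; counts loop iterations. */
static unsigned int count_iterations(unsigned int i)
{
    unsigned int n = 0;

    while (wrapping_post_dec_uint(&i))
        n++;
    return n;
}
```

Every existing while (i--) loop would have to be rewritten in this style, which is the churn the kernel developers object to.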
User Cyberax (from lwn.net) probably asks the same question you are asking:
I don’t understand why Linux developers won’t just annotate the expected wraparounds. They are used somewhat frequently, but not so much as to affect the readability. Most source code files in the tree won’t have any annotations at all.
but then realizes:
“I think, the kernel developers just like to be able to read the overflow-dependent code idioms.”
Which beautifully summarizes the problem. In any case, the reality is that it would be nice to turn on the overflow sanitizers for the Linux kernel, but overflow-dependent code idioms would cause far too much noise. We need to disable instrumentation of these idioms and leave the source code intact.
Examples
Currently my feature set tackles three common idioms:
- if (a + b < a), or some logically equivalent reordering like if (a > b + a)
- while (i--), where for an unsigned i the post-decrement wraps on the final iteration, when i is 0
- -1UL, since negating a nonzero unsigned constant always wraps
All of these are excluded from instrumentation when -fno-sanitize-overflow-idioms is passed. See the tests in the WIP implementation (linked in the next section) for more detail.
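As a small illustration, the three idioms can be written as ordinary C functions whose results depend on well-defined unsigned wraparound. The function and macro names here are made up for the example; only the expressions inside them come from the list above.

```c
#include <limits.h>
#include <stdbool.h>

/* Idiom 1: detect wraparound of an unsigned addition after the fact. */
static bool add_wraps(unsigned long base, unsigned long offset)
{
    return base + offset < base;
}

/* Idiom 2: the loop ends when i is 0, and that final i-- wraps i around. */
static unsigned int value_after_loop(unsigned int i)
{
    while (i--)
        ;
    return i; /* holds UINT_MAX once the loop has exited */
}

/* Idiom 3: negating an unsigned constant to get an all-ones value. */
#define ALL_ONES (-1UL)

static bool all_ones_is_max(void)
{
    return ALL_ONES == ULONG_MAX;
}
```

All three rely on wraparound being intentional, which is why reporting them (for example under Clang's -fsanitize=unsigned-integer-overflow) is noise rather than signal.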
Implementation Details
Check out what I’ve got here: GitHub Branch
The first type of idiom, if (a + b < a), is implemented as an IR optimization pass. I tried to schedule the pass earlier than most so that it can find the IR patterns before they are altered too much. Once the pattern is identified, I alter the control flow graph by removing the edge to the overflow handler. The hope is that later optimization passes can then eliminate the now-dead handler code.
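A rough C-level picture of that transformation may help. This is only an analogy for what happens in the IR, not the pass itself: report_overflow and overflow_reports are hypothetical stand-ins for the sanitizer's handler, and __builtin_add_overflow (GCC/Clang) stands in for the checked add the sanitizer emits.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the sanitizer's overflow handler. */
static unsigned long overflow_reports;

static void report_overflow(void)
{
    overflow_reports++;
}

/* Roughly how an instrumented `a + b < a` behaves: the add is checked and
 * a branch leads to the handler whenever it wraps. */
static bool check_instrumented(unsigned long a, unsigned long b)
{
    unsigned long sum;

    if (__builtin_add_overflow(a, b, &sum))
        report_overflow(); /* the edge that the pass removes */
    return sum < a;
}

/* With the edge removed, the handler call is unreachable and only the
 * plain wrapping add and the comparison remain. */
static bool check_idiom_excluded(unsigned long a, unsigned long b)
{
    unsigned long sum = a + b;

    return sum < a;
}
```

Both functions return the same answer for every input; the only difference is whether the handler runs.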
The second and third types of idiom, while (i--) and -1UL, are handled during Clang CodeGen, in clang/lib/CodeGen/CGExprScalar.cpp.
For what it’s worth, @kees and I have done some integration testing with the Linux kernel and things seem to be working smoothly: the idioms are being excluded!
Please let me know if there are better ways to achieve what we’re trying to achieve; I am especially new to the new PassManager and to IR optimizations in general. Note that this is heavily WIP, and depending on when you take a look there may be some unused code, though I’ve tried to make it as reviewer-friendly as possible.