How and where optimizing of undefined behavior happens

Dear community,

I was wondering how exactly clang/llvm handles undefined behavior (UB).
Based on this experiment: Compiler Explorer,
I tend to think that clang does very little in this area and does
not detect UB itself,
and that all the optimizations related to UB are done in llvm by one
or more optimization passes.

Is that so? If so, then what is the name of that pass? And do different
optimization levels detect different UB cases, i.e. will the situation
change with -O1, -O2, -O3?

By default, Clang has little or nothing to do with UB; its job is to do a straightforward translation to IR. There is no single UB pass: UB-related optimizations are spread throughout the passes. You can get a bit of an idea of what's going on by running this command from an LLVM source tree:

   find ./lib -name '*.cpp' | xargs grep -i undef

But also, a lot of sanitizer code lives in Clang.


That’s not quite true, because the mapping to IR helps define what is UB in the source language. For example, if you don’t enable the undefined behaviour sanitiser or pass -ftrapv / -fwrapv, clang will set the nsw (no signed wrap) flag on IR arithmetic operations on types that are signed in C, which then permits the optimisers to treat wrapping behaviour as undefined.
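
To make the nsw point concrete, here is a minimal sketch in C. The function names are made up for illustration. The two functions differ only in signedness: by default Clang should emit `add nsw` for the signed one and a plain `add` for the unsigned one, since unsigned wraparound is defined in C.

```c
#include <limits.h>

/* Signed arithmetic: overflow of x + 1 is UB in C, so by default Clang
   marks the IR add with nsw. An optimizer is then allowed to assume
   x + 1 never wraps and may fold the comparison to 1. */
int signed_succ_is_greater(int x) {
    return x + 1 > x;
}

/* Unsigned arithmetic: wraparound is well defined, so the add carries
   no nsw flag and the comparison has to be preserved. For UINT_MAX the
   sum wraps to 0 and the result is 0. */
int unsigned_succ_is_greater(unsigned x) {
    return x + 1u > x;
}
```

Compiling this with `clang -O0 -S -emit-llvm` and then again with `-fwrapv` added is one way to see the flag appear and disappear on the signed add; what the optimizers do with it afterwards depends on the optimization level and the LLVM version.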


Sure, and also the frontend eliminates probably 50-100 preprocessor/parser/lexer-related UBs. But I was trying to answer what I understood the OP to be actually asking, which is where the real action happens, and that's to a large extent spread across the middle-end optimization passes.
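
As a rough illustration of the kind of reasoning the middle end is allowed to do, consider the sketch below. The function name is hypothetical, and no particular pass is implied. Because the pointer is dereferenced before it is tested, an optimizer may conclude that it cannot be null at the test and drop the check; whether that actually happens depends on the optimization level and the LLVM version.

```c
#include <stddef.h>

/* The load from p comes first. Dereferencing a null pointer is UB,
   so on every execution that reaches the comparison without UB,
   p is known to be non-null. An optimizer may therefore treat the
   branch below as dead code and remove it. */
int read_then_check(int *p) {
    int v = *p;
    if (p == NULL)
        return -1;
    return v;
}
```

For a valid pointer the function simply returns the pointed-to value, with or without the check; the two versions only differ when the program has already hit UB.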