On Windows, I encountered a benchmark where, compiled with clang-cl, -Xclang -O3 produces code about 2.5x as fast as /O2 alone. By viewing the invocation with -v, I found that /O2 only enables -O2. Does anyone know why clang-cl enables only -O2 instead of -O3 with /O2? Is it by design?
clang version 18.1.7
Target: x86_64-pc-windows-msvc
See benchmark code:
(Note: The performance results are specific to clang-cl with the target triple shown above, so they cannot be reproduced on quick-bench. The 2.5x speedup from -Xclang -O3 applies to bmPushBack.)
CC @hansw2000
I think this may have been an oversight (but maybe I’m wrong). MSVC has no /O3, but documents /O2 as optimizing for maximum speed, and /O1 as optimizing for minimum code size.
Based on that, I would expect /O1 to map to -Oz and /O2 (and /Ot) to map to -O3.
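As a rough illustration, the expected mapping can be written as a small lookup. This is a minimal sketch that assumes nothing beyond the MSVC documentation cited above (/O1 for minimum size, /O2 and /Ot for maximum speed). The function name `suggestedClangLevel` is hypothetical, and this is not how clang-cl currently behaves.

```cpp
#include <string>

// Hypothetical sketch of the mapping suggested above, derived only from how
// MSVC documents its flags: /O1 = minimum code size, /O2 and /Ot = maximum
// speed. Illustration only; not clang-cl's actual (current) behavior.
std::string suggestedClangLevel(const std::string& msvcFlag) {
  if (msvcFlag == "/O1") return "-Oz";  // minimize code size
  if (msvcFlag == "/O2" || msvcFlag == "/Ot") return "-O3";  // maximize speed
  return "";  // any other flag is outside the scope of this sketch
}
```

Under this reading, both /O2 and /Ot resolve to the same level, which is the point of the suggestion.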
The /O flags are complicated. We fiddled with them a lot originally, but they’ve been pretty stable since [clang-cl] Handle -O correctly · llvm/llvm-project@015ce0f · GitHub
As mentioned in the docs, /O2 corresponds to:
- /Og (no effect)
- /Oi (use intrinsics)
- /Ot (optimize for speed; we map that to -O2)
- /Oy (omit frame pointer on x86)
- /Ob2 (-finline-functions)
- /GF (string pooling)
- /Gy (like -ffunction-sections)

The main logic is in TranslateOptArg.
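To make clear why /O2 ends up at -O2, here is a simplified model of that expansion. It is a hypothetical sketch, not clang's TranslateOptArg: the `SubFlag` struct and the function names are invented for illustration, and each effect string just restates the list above. In this model, /Ot is the only sub-flag that sets an -O level, so it alone determines the level that /O2 produces.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of the /O2 expansion described above.
// Not clang's TranslateOptArg; effects are copied from the docs' summary.
struct SubFlag {
  std::string msvc;    // MSVC spelling, e.g. "/Ot"
  std::string effect;  // rough clang-side effect
  bool setsOptLevel;   // true if the effect is an -O level
};

// Sub-flags implied by /O2, in the order the docs list them.
std::vector<SubFlag> expandSlashO2() {
  return {
      {"/Og", "no effect", false},
      {"/Oi", "use intrinsics", false},
      {"/Ot", "-O2", true},  // optimize for speed; currently mapped to -O2
      {"/Oy", "omit frame pointer (x86 only)", false},
      {"/Ob2", "-finline-functions", false},
      {"/GF", "string pooling", false},
      {"/Gy", "like -ffunction-sections", false},
  };
}

// The -O level /O2 ends up with: the effect of the sub-flag that sets one
// (only /Ot here). In this model the last such sub-flag wins; "" if none.
std::string optLevelForSlashO2() {
  std::string level;
  for (const SubFlag& f : expandSlashO2()) {
    if (f.setsOptLevel) level = f.effect;
  }
  return level;
}
```

Changing the single "/Ot" entry to "-O3" would change the result for /O2 as a whole.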
So the question boils down to whether we should map /Ot to -O3 instead of -O2. We probably should. I’ll draft a patch.