On Windows, I encountered a benchmark where, compiled with clang-cl, -Xclang -O3 produces code about 2.5x as fast as just /O2. I found that /O2 only enables -O2 (by viewing the invocation using -v). Does anyone know why clang-cl only enables -O2 instead of -O3 with /O2? Is it by design?
clang version 18.1.7
Target: x86_64-pc-windows-msvc
See benchmark code:
(Note: Performance results are specific to clang-cl with the target triple specified above and therefore cannot be reproduced on quick-bench. The 2.5x speedup from -Xclang -O3 applies to bmPushBack.)
CC @hansw2000
I think this may have been an oversight (but maybe I’m wrong). MSVC has no /O3, but documents /O2 as optimizing for maximum speed, and /O1 as optimizing for minimum code size.
Based on that, I would expect /O1 to map to -Oz and /O2 (and /Ot) to map to -O3.
The /O flags are complicated. We fiddled with them a lot originally, but they’ve been pretty stable since the commit "[clang-cl] Handle -O correctly" (llvm/llvm-project@015ce0f).
As mentioned in the docs, /O2 corresponds to /Og (no effect), /Oi (use intrinsics), /Ot (optimize for speed; we map that to -O2), /Oy (omit frame pointer on x86), /Ob2 (-finline-functions), /GF (string pooling), and /Gy (like -ffunction-sections). The main logic is in TranslateOptArg.
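To make the expansion concrete, here is a minimal C++ sketch of the /O2 breakdown listed above. It is only an illustration: expandO2 and OptComponent are hypothetical names, this is not the code in TranslateOptArg, and the effect strings simply repeat the descriptions given in this thread. The speedLevel parameter stands in for whatever -O level /Ot is mapped to.

```cpp
#include <string>
#include <vector>

// One component of /O2 and the effect this thread attributes to it.
struct OptComponent {
    std::string flag;    // MSVC-style spelling
    std::string effect;  // effect as described in the thread
};

// Hypothetical helper (not part of clang): lists the pieces /O2 expands to.
// speedLevel is the -O level used for /Ot: "-O2" is the behaviour described
// above, "-O3" is the alternative under discussion.
std::vector<OptComponent> expandO2(const std::string& speedLevel = "-O2") {
    return {
        {"/Og", "no effect"},
        {"/Oi", "use intrinsics"},
        {"/Ot", "optimize for speed, mapped to " + speedLevel},
        {"/Oy", "omit frame pointer on x86"},
        {"/Ob2", "-finline-functions"},
        {"/GF", "string pooling"},
        {"/Gy", "like -ffunction-sections"},
    };
}
```

Calling expandO2("-O3") changes only the /Ot entry, which is why the behaviour of /O2 as a whole hinges on that one mapping.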
So the question boils down to whether we should map /Ot to -O3 instead of -O2. We probably should. I’ll draft a patch.