Clang-cl optimization option

On Windows, I encountered a benchmark where, compiled with clang-cl, passing -Xclang -O3 produces code about 2.5x as fast as /O2 alone. Viewing the invocation with -v shows that /O2 only enables -O2. Does anyone know why clang-cl maps /O2 to -O2 instead of -O3? Is it by design?

clang version 18.1.7
Target: x86_64-pc-windows-msvc

See benchmark code:

(Note: Performance results are specific to clang-cl with the target triple specified above and therefore cannot be reproduced on quick-bench. The 2.5x speedup with -Xclang -O3 applies to bmPushBack.)

CC @hansw2000

I think this may have been an oversight (but maybe I’m wrong). MSVC has no /O3, but documents /O2 as optimizing for maximum speed, and /O1 as optimizing for minimum code size.

Based on that, I would expect /O1 to map to -Oz and /O2 (and /Ot) to map to -O3.

The /O flags are complicated. We fiddled with them a lot originally, but they’ve been pretty stable since the commit "[clang-cl] Handle -O correctly" (llvm/llvm-project@015ce0f).

As mentioned in the docs, /O2 corresponds to: /Og (no effect), /Oi (use intrinsics), /Ot (optimize for speed; we map that to -O2), /Oy (omit frame pointer on x86), /Ob2 (-finline-functions), /GF (string pooling), and /Gy (like -ffunction-sections). The main logic is in TranslateOptArg.
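
To make the expansion concrete, here is a minimal C++ sketch of the /O2 breakdown listed above. It is a toy model, not the driver's actual TranslateOptArg code: the names SubFlag, expandO2, effectOf, and the mapOtToO3 switch are all hypothetical, and the effect strings are copied from the list above.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the /O2 expansion described in this thread.
// This is NOT the real clang driver logic, only an illustration.
struct SubFlag {
    std::string msvcFlag;  // MSVC-style sub-flag implied by /O2
    std::string effect;    // effect in clang-cl, as described above
};

// Returns the sub-flags implied by /O2. The mapOtToO3 parameter is a
// made-up switch that models the proposed change of mapping /Ot to -O3.
std::vector<SubFlag> expandO2(bool mapOtToO3) {
    return {
        {"/Og", "no effect"},
        {"/Oi", "use intrinsics"},
        {"/Ot", mapOtToO3 ? "-O3" : "-O2"},
        {"/Oy", "omit frame pointer on x86"},
        {"/Ob2", "-finline-functions"},
        {"/GF", "string pooling"},
        {"/Gy", "like -ffunction-sections"},
    };
}

// Looks up the effect of one sub-flag; returns an empty string if the
// flag is not part of the expansion.
std::string effectOf(const std::vector<SubFlag>& flags,
                     const std::string& name) {
    for (const SubFlag& f : flags) {
        if (f.msvcFlag == name) {
            return f.effect;
        }
    }
    return "";
}
```

In this model, effectOf(expandO2(false), "/Ot") yields "-O2", which matches the behavior reported in the original post, and only the /Ot entry differs when mapOtToO3 is true.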

So the question boils down to whether we should map /Ot to -O3 instead of -O2. We probably should. I’ll draft a patch.
