We’re desperate to reduce the size of our executable. We’re currently using -Os when compiling for size. We tried -Oz, which produced a significantly smaller executable (~15% smaller), but the perf loss was unacceptable. I was hoping to find a happy medium between the two, so I’ve been looking for information on how they differ and whether it’s possible to use -Os together with some of the additional flags -Oz enables, but I’m not finding much good documentation on -Oz. Any help would be appreciated.
Hopefully someone will correct me if I’m wrong, but I think the only difference between -Os and -Oz is that -Os also implies -O2: it still applies that level of performance optimization in places it judges more important for performance than for size, whereas -Oz applies size optimizations everywhere.
You can run Clang with -mllvm -debug-pass=Structure or -mllvm -debug-pass=Arguments to see which optimization passes are run for each -O level. You can also look at BackendUtil.cpp to see how the performance and size levels map to passes.
Thanks for the info. That helps tremendously.
The -Oz pipeline has some known issues that could be contributing to this, such as “Trivial memset optimization not applied to loops under -Oz (LoopIdiomRecognize)” (issue #50308 in llvm/llvm-project on GitHub).
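For illustration, here is the kind of loop that issue is about: a hand-written zero-fill that LoopIdiomRecognize would normally replace with a single memset call. This is a minimal sketch; zero_fill is a hypothetical name, and whether a given Clang version rewrites it at -Os but not at -Oz is something to confirm by comparing the assembly from clang -Os -S and clang -Oz -S rather than assume.

```c
#include <stddef.h>

/* Hypothetical helper: a plain zero-fill loop, the "trivial memset"
 * pattern LoopIdiomRecognize can turn into one call to memset.
 * Per the title of issue #50308, that rewrite is reportedly not
 * applied under -Oz, so there this may remain an explicit loop. */
void zero_fill(unsigned char *buf, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        buf[i] = 0;
    }
}
```

If a hot loop like this matters, calling memset directly sidesteps the question of whether the idiom gets recognized at a given -O level.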
The undocumented, unspoken rule of thumb is:
- -Oz: size at all costs, no perf guarantees
- -Os: prioritize size, but don’t absolutely steamroll my perf
Could you split the perf-sensitive code out into its own files, compile those with -Os, and compile the rest with -Oz? Not the greatest solution, but it’s an option.
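If splitting files is too disruptive to the build, a per-function variant of the same idea may work, sketched here under some assumptions. As far as we know, Clang attaches LLVM’s minsize attribute to every function under -Oz and exposes the same attribute in source as __attribute__((minsize)), so a file compiled with -Os can opt individual functions into size-first treatment. The SIZE_OVER_SPEED macro and the functions checksum and parse_config_flag are hypothetical; the macro expands to nothing on compilers without minsize (GCC, for example).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: Clang's minsize attribute where supported,
 * nothing elsewhere. The nested #if keeps compilers that lack
 * __has_attribute from ever evaluating it. */
#if defined(__has_attribute)
#  if __has_attribute(minsize)
#    define SIZE_OVER_SPEED __attribute__((minsize))
#  endif
#endif
#ifndef SIZE_OVER_SPEED
#  define SIZE_OVER_SPEED
#endif

/* Hypothetical perf-sensitive routine: no attribute, so it keeps
 * whatever optimization level the file is compiled with (-Os). */
uint32_t checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; ++i) {
        sum = sum * 31u + data[i];
    }
    return sum;
}

/* Hypothetical rarely run routine: tagged minsize, asking the
 * optimizer for size-first choices for this function only. */
SIZE_OVER_SPEED
int parse_config_flag(const char *s) {
    if (s == NULL) return -1;
    if (s[0] == '1' && s[1] == '\0') return 1;
    if (s[0] == '0' && s[1] == '\0') return 0;
    return -1;
}
```

The build stays at -Os; only functions tagged with the macro should get the more aggressive size treatment. Measuring the binary with and without the tags is the only reliable way to see how much it actually saves.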
You could also try hot/cold splitting. IIRC it automatically marks rarely run (“cold”) code as -Oz, while frequently run (“hot”) code is left alone. However, this only helps if your most perf-sensitive functions aren’t cold.
We’re looking into that, but it’s a difficult change to our pipeline.
The theory makes sense to me, but I’m unsure how to implement this. How would I go about it?
Here is a talk about hot/cold splitting in LLVM; it might still have some relevant info: 2020 LLVM Developers’ Meeting: A. Kumar, “Code Size Compiler Optimizations and Techniques” (YouTube).
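A minimal sketch of code shaped for hot/cold splitting, assuming a Clang/LLVM build where the pass can be switched on with the hidden option -mllvm -hot-cold-split=true (as far as we know it is off by default, and hidden options can change between releases, so check your version). Without profile data the pass relies on static hints, and as we understand it a call to a function marked __attribute__((cold)) is one of them. The names report_bad_index and get_or_default are hypothetical.

```c
#include <stdio.h>

/* Hypothetical error reporter. The cold attribute (GCC and Clang)
 * says calls to it are unlikely; noinline keeps the call visible
 * so the calling block stays recognizably cold. */
__attribute__((cold, noinline))
static void report_bad_index(size_t idx, size_t len) {
    fprintf(stderr, "index %zu out of range (len %zu)\n", idx, len);
}

/* Hot path: the bounds check almost always passes. The block that
 * calls report_bad_index is the kind of cold region the pass may
 * outline into a separate, size-optimized function. */
int get_or_default(const int *arr, size_t len, size_t idx, int fallback) {
    if (idx >= len) {
        report_bad_index(idx, len);
        return fallback;
    }
    return arr[idx];
}
```

Profile data (PGO) gives the pass much better information than static hints, and whether it actually outlines a region this small depends on its cost model, so compare object sizes with and without the option before relying on it.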