Compiling for size

We’re desperate to reduce the size of our executable. We currently compile with -Os. We tried -Oz, which produced a significantly smaller executable (~15% smaller), but the perf loss was unacceptable. I was hoping to find a happy medium between the two, so I’ve been looking for information on how they differ and whether it’s possible to use -Os along with some of the additional flags that -Oz enables, but I’m not finding much good documentation on -Oz. Any help would be appreciated.

Hopefully someone will correct me if I’m wrong, but I think both -Os and -Oz are based on -O2. The difference is that -Os will still conduct that level of performance optimization in areas deemed more important for performance than for size, while -Oz applies size optimizations everywhere.

You can run Clang with -mllvm -debug-pass=Structure or -mllvm -debug-pass=Arguments to see which optimization passes are run for each -O(whatever) value. You can also take a look in BackendUtil.cpp to see how the performance and size levels map to passes.

Thanks for the info. That helps tremendously.

The -Oz pipeline has some known issues that could be contributing to this, such as “Trivial memset optimization not applied to loops under -Oz (LoopIdiomRecognize)” (Issue #50308, llvm/llvm-project on GitHub).
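To make that issue concrete, here is a minimal sketch of the kind of loop it is about, next to an explicit memset call that does not rely on the idiom being recognized. The function names are made up for illustration, and whether your compiler version still shows the difference is something you would need to check in the generated assembly.

```c
#include <stddef.h>
#include <string.h>

/* Plain zeroing loop. LoopIdiomRecognize is the pass that can replace a loop
   like this with a memset call; the linked issue reports that this was not
   happening under -Oz. (clear_loop is a hypothetical name.) */
void clear_loop(unsigned char *buf, size_t n) {
    for (size_t i = 0; i < n; ++i)
        buf[i] = 0;
}

/* Explicit call: same result, but it does not depend on the optimizer
   recognizing the loop pattern. (clear_explicit is a hypothetical name.) */
void clear_explicit(unsigned char *buf, size_t n) {
    memset(buf, 0, n);
}
```

Compiling both with -Os and -Oz and comparing the output of -S would show whether the loop version is affected in your toolchain.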

The undocumented and unspoken rule-of-thumb is:

  • -Oz: size at all costs, no perf guarantees
  • -Os: prioritize size, but don’t absolutely steamroll my perf

Could you split out perf-sensitive stuff into its own file and then compile that with -Os maybe? Then the rest with -Oz? Not the greatest solution, but it’s an option.
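If splitting files is awkward, a finer-grained variant of the same idea might be Clang's minsize function attribute, which asks for the smallest code on a single function while the rest of the file keeps its normal level. This is only a sketch: the function names, the SMALL_CODE macro, and the file names in the comments are hypothetical, and you would want to measure both size and perf before relying on it.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-file approach from the post above (file names are hypothetical):
 *   clang -Os -c hot_paths.c
 *   clang -Oz -c everything_else.c
 * Per-function alternative: compile the file with -Os and mark the
 * size-critical, non-hot functions individually. */

#ifndef __has_attribute
#define __has_attribute(x) 0   /* compilers without __has_attribute */
#endif

#if __has_attribute(minsize)
#define SMALL_CODE __attribute__((minsize))
#else
#define SMALL_CODE             /* no-op where minsize is unsupported */
#endif

/* Assumed to be perf-sensitive: left at the file's optimization level. */
uint32_t checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; ++i)
        sum = sum * 31u + data[i];
    return sum;
}

/* Assumed to be rarely called: request the smallest possible code. */
SMALL_CODE int parse_digit(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}
```

The attribute route avoids reshuffling source files, at the cost of annotating functions by hand.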

You could also try using hot-cold splitting. IIRC that will mark functions which aren’t often run (“cold”) as -Oz automatically. The functions which are run often (“hot”) will not be marked as -Oz. However, this will only help if your most perf-sensitive functions are not cold.
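As a rough sketch of what this could look like in source: the cold attribute and __builtin_expect give the compiler hints about which code is rarely run, and the splitting pass itself is, as far as I know, enabled through a hidden LLVM option. The function names and macros below are hypothetical, and the flag in the comment is an assumption that may differ between LLVM versions, so check it against your toolchain.

```c
#include <stddef.h>

/* Assumed invocation (hidden LLVM option, may vary by version):
 *   clang -Os -mllvm -hot-cold-split=true -c process.c
 * Profile data (PGO) gives the pass better information than static hints. */

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

#if __has_attribute(cold)
#define COLD __attribute__((cold))
#else
#define COLD
#endif

#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Hypothetical error path: expected to run rarely. */
COLD int report_failure(int bad_value) {
    (void)bad_value;
    return -1;
}

/* Hypothetical hot loop: sums the values, bailing out on a negative one. */
int process(const int *values, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (UNLIKELY(values[i] < 0))
            return report_failure(values[i]);
        total += values[i];
    }
    return total;
}
```

The hints do not change behavior, only how the compiler weighs size against speed for each path.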

We’re looking into that but it’s a difficult change to our pipeline.

The theory makes sense to me but I’m unsure on how to implement this. How would I go about this?

Here is a talk about hot/cold splitting in LLVM; it might still have some relevant info: 2020 LLVM Developers’ Meeting, A. Kumar, “Code Size Compiler Optimizations and Techniques” (YouTube).