LLVM-17 optimization levels comparison

Hi LLVM! I recently ran some benchmarks with LLVM 17 on AArch64.
Here are some of the results, in case anyone is interested.

They show how code size and execution time vary across optimization levels, with and without LTO (-flto) and PGO (-fprofile-instr-generate / -fprofile-instr-use).

  • SPEC2017, C/C++ benchmarks only. The reference is -O2; lower is faster, further left is smaller.
  • Minimum execution time of 3 runs (on an FX700 AArch64 machine) on the train dataset. PGO was trained on the same dataset.
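
For anyone who wants to reproduce the normalization described above, here is a minimal sketch of how "min of 3 runs, relative to -O2" could be computed. The function name and the timings are hypothetical and only illustrate the arithmetic; they are not the measured data.

```python
from typing import Dict, List


def relative_to_ref(runs: Dict[str, List[float]], ref: str = "-O2") -> Dict[str, float]:
    """Take the minimum time per configuration and divide it by the reference minimum."""
    best = {cfg: min(times) for cfg, times in runs.items()}
    ref_time = best[ref]
    # A ratio below 1.0 means faster than the reference configuration.
    return {cfg: t / ref_time for cfg, t in best.items()}


# Hypothetical timings in seconds, for illustration only (not measured data).
example_runs = {
    "-O2": [100.0, 102.0, 101.0],
    "-O3": [98.0, 97.0, 99.0],
    "-Os": [110.0, 112.0, 111.0],
}
ratios = relative_to_ref(example_runs)
```

The same function works for code size if each list holds a single size measurement instead of run times.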

IMO, it is interesting to see once again the performance and size tradeoffs available with the different optimization levels.
At the -O2 and -O3 levels, LTO and PGO improve both code size (10/20%) and execution time (5/10%).
At the -Os and -Oz levels, PGO only improves performance and slightly increases size. This is probably not expected; I haven’t looked into it yet.

Any comments welcome ; )


I previously checked the performance effect of PGO on SPECrate 2017 with LLVM-14 and noticed the following.

  • PGO has a good effect on workloads that have a lot of branches and small functions: perlbench, gcc and xalancbmk. I think this is because the compiler can make better inlining decisions.
  • Compared to frontend PGO (-fprofile-instr-generate), IR-based PGO (-fprofile-generate) gives a more balanced performance improvement. When I tried it, frontend PGO had a negative effect on mcf’s performance, but IR PGO improved it.
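
To make the difference between the two PGO flavors concrete, here is a rough sketch of the build steps for each. It only builds the command lines and runs nothing. The helper name, the file names (bench.instr, bench.profdata, profraw_dir, ...) and the single-source build are assumptions for illustration; this is not the actual SPEC configuration.

```python
from typing import List


def pgo_commands(source: str, kind: str = "ir", opt: str = "-O2") -> List[List[str]]:
    """Return the four PGO build steps as argv lists (nothing is executed)."""
    if kind == "ir":
        # IR-level instrumentation; the argument is a directory for raw profiles.
        gen_flag = "-fprofile-generate=profraw_dir"
        raw = "profraw_dir"
        use_flag = "-fprofile-use=bench.profdata"
    elif kind == "frontend":
        # Frontend (source-level) instrumentation; the argument is a raw profile file.
        gen_flag = "-fprofile-instr-generate=bench.profraw"
        raw = "bench.profraw"
        use_flag = "-fprofile-instr-use=bench.profdata"
    else:
        raise ValueError(f"unknown PGO kind: {kind!r}")
    return [
        ["clang", opt, gen_flag, source, "-o", "bench.instr"],       # 1. instrumented build
        ["./bench.instr"],                                            # 2. training run
        ["llvm-profdata", "merge", "-output=bench.profdata", raw],    # 3. merge raw profiles
        ["clang", opt, use_flag, source, "-o", "bench.pgo"],          # 4. optimized build
    ]
```

For example, `pgo_commands("bench.c", kind="frontend")` gives the frontend flow, and adding `-flto` to both clang steps would be the natural way to combine it with LTO.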

It is good to know that PGO and LTO have a positive impact even on code size.
Thank you.