I’ll be giving a short presentation at the LLVM performance workshop soon, and I want to touch on the topic of future performance improvements. I decided to ask the community: what can drive performance improvements in a classic C++ LLVM CPU backend in the future? If I summarize all the thoughts and opinions, I think it will make for an interesting discussion.
There is already a body of research on the topic, including 1, which talks about superoptimizers, but perhaps somebody has some interesting new ideas.
In particular, I’m interested to hear thoughts on the following things:
- How big is the performance headroom in existing LLVM optimization passes?
- I think PGO can play a bigger role in the future; I see benefits in more optimizations being guided by profiling data. For example, there is potential for intelligent injection of memory prefetching hints based on HW telemetry data on modern Intel CPUs. This telemetry identifies memory accesses that miss in the caches and estimates the prefetch window (in cycles). Using this data, the compiler can determine where to place a prefetch hint. Obviously, there are lots of limitations, but it’s just a thought. BTW, the same could be done for PGO-driven branch-to-cmov conversion (to fight branch mispredictions).
- ML opportunities in compiler tooling. For example, code similarity analysis 2 opens up a wide range of opportunities, e.g. building a recommendation system that suggests a better-performing code sequence.
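To make the prefetch idea above concrete, here is a minimal sketch of the kind of code a PGO-driven prefetch-injection pass might effectively emit, assuming HW telemetry flagged the indirect load `data[idx[i]]` as a frequent cache miss. The distance `kPrefetchAhead` is a placeholder; a real pass would derive it from the measured prefetch window (miss latency divided by per-iteration cost).

```cpp
#include <cstddef>
#include <vector>

// Assumed prefetch distance (in loop iterations); in reality this would be
// computed from the telemetry-estimated prefetch window in cycles.
constexpr std::size_t kPrefetchAhead = 16;

long sum_indirect(const std::vector<long>& data,
                  const std::vector<std::size_t>& idx) {
    long total = 0;
    for (std::size_t i = 0; i < idx.size(); ++i) {
        // Compiler-inserted hint: prefetch the element we will touch
        // kPrefetchAhead iterations from now (read access, low temporal
        // locality), hiding the miss latency behind useful work.
        if (i + kPrefetchAhead < idx.size())
            __builtin_prefetch(&data[idx[i + kPrefetchAhead]], 0, 1);
        total += data[idx[i]];
    }
    return total;
}
```

The prefetch is semantically a no-op, so the transformation is always safe; the hard part the telemetry solves is knowing *which* loads miss and *how far* ahead to fetch.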
Please also share any thoughts you have that are not on this list.
If this topic has been discussed in the past, sorry, and please send links to those discussions.