Recently we came across a customer application that suffered a 100% compile-time degradation when built with LLVM 13 (with LTO) vs. LLVM 12. It turns out that the slowdown is caused by loop unrolling, and more specifically by the dominator tree updates performed by the loop unrolling pass.
The loop unrolling pass is invoked many times, but the instances that take a large amount of time occur with extremely large functions. I’m not sure about the relationship between the function size and the time Dom tree updating takes, but it appears to be non-linear.
I was wondering if anyone had any insights on how best to throttle loop unrolling in large functions, or whether that is even the right way to control compile time. Using #pragma clang loop unroll(disable) would be an obvious suggestion for the user, but it would be interesting to consider an option that prevents unrolling once a function exceeds a certain size threshold.
A warning at compile time would also help, so the user can more easily diagnose which function they should pay attention to.
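For reference, the per-loop pragma mentioned above is placed directly before the loop it applies to. The following is only a minimal sketch: the function name and signature are made up for illustration and do not come from the customer application. It shows the placement, not a recommendation of which loops to annotate.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical function standing in for one loop inside a very large function.
long sum_no_unroll(const int *data, std::size_t n) {
    assert(data != nullptr || n == 0);
    long total = 0;
    // Ask Clang not to unroll the loop that immediately follows.
#pragma clang loop unroll(disable)
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

The pragma only affects the single loop that follows it, so in a huge function with many loops each one would need its own annotation. That is part of why a function-size threshold or a diagnostic would be more convenient.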
At the very least this looks like a serious regression between versions. Can you check clang 14 to see if the issue remains?
I’m pinging some people that seem to have worked on the dominator tree analysis around 2020/2021 to see if anything rings a bell. By your description, this sounds like a change in complexity of some analysis, or perhaps just a side effect of a more aggressive unrolling making the already existing complex analysis go much slower.
A smaller reproducer would also help a lot in investigating the problem. If you can create an issue on GitHub with a small example of what triggers the slowdown, that’d help a lot.
Based on the time frame and the issue being domtree updates in particular, it’s possible that D103561 "[LoopUnroll] Reorder code to max dom tree update more obvious [nfc]" is related. We switched more parts of unrolling to use DTU (DomTreeUpdater). It’s possible that this hits the cutoff and ends up recomputing the DT for the whole function.
Though I’d still find it odd that DT updates by the unroller have a significant impact on end-to-end compile-time, unless we are unrolling many small loops inside a huge function and each one recomputes the DT, or something like that.
(Based on your description, I’m assuming this is not a case where unrolling itself produces a lot of code.)
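To make the "many small loops inside a huge function" scenario concrete, here is a toy cost model. It is not LLVM code. The struct, the function, and the assumption that a full recomputation costs roughly one unit of work per basic block are all illustrative assumptions. It only compares total work when every unroll updates the tree locally against total work when every unroll rebuilds the tree for the whole function.

```cpp
#include <cassert>
#include <cstdint>

// Toy cost model, NOT LLVM code. All names and cost assumptions are hypothetical.
struct ToyCost {
    std::uint64_t incremental; // total work if every update stays local to the loop
    std::uint64_t recompute;   // total work if every unroll rebuilds the whole tree
};

ToyCost estimate(std::uint64_t blocksInFunction,
                 std::uint64_t loopsUnrolled,
                 std::uint64_t blocksTouchedPerUnroll) {
    assert(blocksTouchedPerUnroll <= blocksInFunction);
    ToyCost c;
    // Local update: work proportional to the blocks each unroll touches.
    c.incremental = loopsUnrolled * blocksTouchedPerUnroll;
    // Full rebuild: assumed proportional to the size of the whole function.
    c.recompute = loopsUnrolled * blocksInFunction;
    return c;
}
```

Under these assumptions, estimate(100000, 1000, 8) gives 8,000 units of incremental work against 100,000,000 units with full rebuilds. If the number of loops grows along with the function, doubling both quadruples the rebuild cost, which would be consistent with the non-linear behaviour described earlier in the thread.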