Recently we came across a customer application that suffered a 100% compile-time regression when built with LLVM 13 (with LTO) compared to LLVM 12. It turns out that the slowdown is caused by loop unrolling, and more specifically by the dominator tree updates performed by the loop unrolling pass.
The loop unrolling pass is invoked many times, but the invocations that take a large amount of time occur in extremely large functions. I’m not sure how the time spent updating the dominator tree scales with function size, but it appears to be non-linear.
I was wondering if anyone has insights on how best to throttle loop unrolling in large functions, or on whether that is even the best way to control compile time. Using #pragma clang loop unroll(disable) would be an obvious suggestion for the user, but it would be interesting to consider an option that prevents unrolling once a function exceeds a certain size threshold.
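For reference, here is a minimal sketch of what the pragma workaround looks like in user code. The function name and loop body are made up for illustration; the pragma itself is Clang's documented loop hint, and it applies only to the loop that immediately follows it. Other compilers will typically ignore it, possibly with an unknown-pragma warning.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example: imagine this loop lives inside a very large function
// where unrolling is suspected of inflating compile time.
std::uint64_t sum_bytes(const unsigned char *data, std::size_t n) {
  std::uint64_t total = 0;
  // Ask Clang not to unroll the loop that immediately follows this line.
#pragma clang loop unroll(disable)
  for (std::size_t i = 0; i < n; ++i)
    total += data[i];
  return total;
}
```

The drawback, as noted above, is that the user has to know which loops in which functions to annotate, and that is exactly what is hard to find out without a diagnostic.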
A warning at compile time would also help, so the user can more easily diagnose which function they should pay attention to.
At the very least this looks like a serious regression between versions. Can you check clang 14 to see if the issue remains?
I’m pinging some people who seem to have worked on the dominator tree analysis around 2020/2021 to see if anything rings a bell. From your description, this sounds like either a change in the complexity of some analysis or a side effect of more aggressive unrolling making an already expensive analysis run much slower.
Though I’d still find it odd that DT updates by the unroller have a significant impact on end-to-end compile-time, unless we are unrolling many small loops inside a huge function and each one recomputes the DT, or something like that.
(Based on your description, I’m assuming this is not a case where unrolling itself produces a lot of code.)