Rust NewPM blocker: Catastrophic inlining

As support for the LegacyPM is supposed to be removed after the LLVM 14 branch, I wanted to bring some wider attention to a blocking issue that the Rust project encountered with the new pass manager.

While NewPM improves compile time in the average case, there are also rare cases of catastrophic compile-time increases, where code that previously built in seconds now takes hours to compile. We have encountered multiple instances of this problem, and it has prevented the new pass manager from being enabled in stable Rust releases to date. Perhaps most problematically, the rustc toolchain itself is affected by this issue on the s390x target. (That target is special because it uses higher inlining thresholds.)

This compile-time problem arises from catastrophic inlining: the inliner takes a small function with ~100 lines of IR and expands it into one with ~1,000,000 lines of IR, which the rest of the optimization pipeline is not prepared to deal with (even if we disregard the very much unwanted increase in code size).
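To make the failure mode concrete, here is a hypothetical Rust sketch (the function names and shapes are invented for illustration, not taken from an actual rustc reproducer) of the mutual-recursion pattern at the heart of this: if the inliner inlines `b` into `a`, and then the enlarged `a` back into `b`, each round roughly doubles the body, so size grows exponentially with the number of rounds.

```rust
// Hypothetical illustration (not a real reproducer): two small,
// mutually recursive functions. Inlining `b` into `a` gives `a` a copy
// of `b`'s body; inlining the enlarged `a` back into `b` copies that
// again. Without a guard against mutual recursion, k such rounds can
// multiply function size by roughly 2^k, which is how ~100 lines of IR
// can become ~1,000,000.
fn a(n: u32) -> u32 {
    if n == 0 { 1 } else { b(n - 1) + 1 }
}

fn b(n: u32) -> u32 {
    if n == 0 { 2 } else { a(n - 1) * 2 }
}

fn main() {
    // The runtime semantics are trivial; the problem is purely about
    // how much IR the optimizer materializes while inlining this shape.
    println!("{}", a(10)); // prints 63
}
```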

One of the root causes of this issue has been fixed in ⚙ D115497 [Inline] Disable deferred inlining. The other root cause of this problem is also well-understood, and has had a patch available at ⚙ D98481 [Inliner] Do not inline mutual-recursive functions to avoid exponential size growth. for close to a year now.

Unfortunately, there was no consensus to merge that patch, due to concerns about possible performance impact. The SPEC2017 measurements on that patch suggest less inlining without performance impact — but inlining is such a sensitive area that a regression on some workload is pretty much guaranteed for any inlining heuristic change.

There was some discussion about a possible alternative, which is to limit caller size during inlining, but it doesn’t look like it went anywhere (and I’m skeptical that this is a fix rather than a workaround).

As this is a NewPM migration blocker for Rust, and the upcoming LegacyPM removal makes it a time-critical problem, I would like some advice on how to move forward here.


We’ve hit this at Apple too. I think we disabled the new PM the first time around, and this time I cherry-picked the proposed D98481 change onto the branch. That isn’t necessarily an endorsement; I still don’t think I fully understand the issues in this area.

Is ⚙ D98481 [Inliner] Do not inline mutual-recursive functions to avoid exponential size growth. robustly addressing the issues Rust is facing? If so, would enabling its behavior behind a flag (which would be on by default in the Rust case; I guess the frontend could flip it) help?

Why would limiting caller size during inlining not be a fix? Asking because naively it seems it would, if inlining decisions aren’t reversible: any policy would try to guesstimate if accumulating more IR in a caller may actually lead to a cascading IR reduction (Tetris comes to mind), but it could get unlucky.

Have we explored, though, having some checkpointing mechanism whereby we can bail out if a sequence of decisions doesn’t pan out? I can imagine lots of challenges, but it could be an option to discuss, maybe?

> Is ⚙ D98481 [Inliner] Do not inline mutual-recursive functions to avoid exponential size growth. robustly addressing the issues Rust is facing? If so, would enabling its behavior behind a flag (which is on by default in the rust case - I guess the frontend can flip it) help?

The patch addresses all the issues I’m currently aware of at least.

Landing the patch behind a flag would address the immediate problem on the Rust side, though it’s not a great long-term solution. I don’t think anything about this problem is really specific to Rust. As Tim indicated above, Apple has hit this problem as well, and the patch was submitted by Facebook, so I can only assume that they’re also affected. It’s not a common problem, but it seems to be something you inevitably hit if you just compile enough code.

> Why would limiting caller size during inlining not be a fix? Asking because naively it seems it would, if inlining decisions aren’t reversible: any policy would try to guesstimate if accumulating more IR in a caller may actually lead to a cascading IR reduction (Tetris comes to mind), but it could get unlucky.

I guess this depends on how it’s implemented. What I was imagining here is a naive limit of the kind “if the caller’s size would grow above N, don’t inline”. The problem with that is that N probably has to be chosen quite high to avoid regressions. So if, say, N = 10000, we would still get an unwanted code-size increase of up to that N: the inliner would still go crazy, we would just limit by how much. It gets worse when you consider that the same function may be called from many places, so we could see the same N = 10000 increase at each call site.
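The back-of-the-envelope arithmetic above can be spelled out in a short sketch; the numbers here (the cap N and the call-site count) are purely illustrative assumptions, not measurements:

```rust
// Illustrative arithmetic for the caller-size cap discussed above.
// A cap only bounds growth *per caller*: if a blown-up callee is
// reachable from many call sites, each caller can still grow by up
// to the cap.
fn total_growth(cap: u64, call_sites: u64) -> u64 {
    cap * call_sites
}

fn main() {
    let cap = 10_000; // hypothetical per-caller limit N
    let call_sites = 50; // hypothetical number of distinct callers
    // 50 callers each growing by 10,000 IR instructions is still half
    // a million extra instructions, despite the per-caller cap.
    println!("{}", total_growth(cap, call_sites)); // prints 500000
}
```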

> Have we explored, though, having some checkpointing mechanism whereby we can bail out of if a sequence of decisions doesn’t pan out? I can imagine lots of challenges, but it could be an option to discuss maybe?

A major constraint for the inliner is that it runs as part of a CGSCC pipeline, which means that it should, by design at least, only keep state relating to the SCC it currently operates on (in practice, we end up violating this in a minor way to avoid a different catastrophic-inlining instance). I don’t think it will be possible to make that kind of major change without giving up on the CGSCC approach entirely.