[RFC] Add a module inliner

For my GSoC 2021 project, my goal is to evaluate different call site visiting orders, an idea inspired by papers [1][2]. I’ve already enabled such exploration within the constraints of the current SCC inliner (https://reviews.llvm.org/D104028). However, more advanced call site orderings are not possible with the current SCC inliner: it runs on each SCC in a bottom-up traversal, so the inline order is limited to bottom-up.

To address this limitation, I would like to add a module inliner, which considers all call sites in a module at once rather than only those in one SCC at a time. This gives us flexibility in the order in which we process call sites.
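As a rough illustration of what a module-wide ordering buys, here is a minimal Python sketch of a worklist that holds every call site in a module and pops them by a pluggable priority. It is not LLVM code: `CallSite`, `visit_order`, the `hotness` field, and the function names in the toy module are all hypothetical, and the sketch leaves out the step where inlining exposes new call sites that would be pushed back onto the worklist.

```python
import heapq
from dataclasses import dataclass

@dataclass(frozen=True)
class CallSite:
    caller: str
    callee: str
    hotness: int  # hypothetical profile count, for illustration only

def visit_order(call_sites, priority):
    """Return call sites in the order a module-wide worklist would pop them.

    `priority` maps a call site to a number; higher values are visited first.
    The index `i` breaks ties so CallSite objects are never compared directly.
    """
    heap = [(-priority(cs), i, cs) for i, cs in enumerate(call_sites)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, cs = heapq.heappop(heap)
        order.append(cs)
    return order

# A toy module: every call site is visible to the worklist at once.
module = [
    CallSite("main", "parse", 10),
    CallSite("parse", "lex", 500),
    CallSite("main", "emit", 90),
]

hottest_first = visit_order(module, lambda cs: cs.hotness)
```

Swapping the `priority` function changes the visiting order without touching the traversal, which is the kind of flexibility a per-SCC bottom-up walk does not offer.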

The module inliner would be disabled by default, to minimize churn in the existing codebase and to make it easy to remove. Also, to avoid unnecessary abstractions that would complicate the existing codebase, some code is copied from the SCC inliner. Lastly, to foster collaboration, I propose landing it in trunk rather than in a branch.

Best wishes,

Liqiang Tao

----
[1] Aleksandar Prokopec, Gilles Duboscq, David Leopoldseder, and Thomas Würthinger. 2019. An optimization-driven incremental inline substitution algorithm for just-in-time compilers. In Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2019).
[2] Dhruva R. Chakrabarti and Shin-Ming Liu. 2006. Inline Analysis: Beyond Selection Heuristics. In Proceedings of the International Symposium on Code Generation and Optimization (CGO '06).

+a couple of folks who might have some thoughts/ideas here

This is a topic that has been discussed a few times in the past. Currently the CGSCC inliner doesn’t use a size/growth cap, so the value of call site prioritization is questionable, as we’re going to inline everything deemed beneficial anyway. But prioritization would be helpful for a top-down inliner that needs a size/growth cap.
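To make the point about caps concrete, here is a small Python sketch, with made-up numbers, of why visiting order only matters once there is a growth budget. The candidate names, the benefit and cost values, and the benefit-per-cost priority are assumptions chosen for illustration; they are not what any LLVM inliner computes.

```python
def inline_under_cap(candidates, cap):
    """Greedily inline candidates in the given order while the budget lasts.

    candidates: list of (name, benefit, cost) tuples, visited in list order.
    cap: total size growth allowed (hypothetical units).
    """
    spent, chosen = 0, []
    for name, benefit, cost in candidates:
        if spent + cost <= cap:
            spent += cost
            chosen.append(name)
    return chosen

# Hypothetical call sites: (name, estimated benefit, size cost).
candidates = [("cold_big", 1, 80), ("hot_small", 50, 30), ("warm", 10, 40)]

# Visit in whatever order the traversal happens to produce.
unordered = inline_under_cap(candidates, cap=100)

# Visit by benefit per unit of cost, highest first.
by_ratio = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
prioritized = inline_under_cap(by_ratio, cap=100)

# With a cap large enough for everything, both orders inline the same set.
no_cap_a = inline_under_cap(candidates, cap=1000)
no_cap_b = inline_under_cap(by_ratio, cap=1000)
```

With the cap of 100, the unordered visit spends most of the budget on `cold_big`, while the prioritized visit fits `hot_small` and `warm`. With the large cap both orders inline all three, which mirrors the observation that prioritization has little effect when nothing limits growth.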

Under Sample PGO, we already have a top-down inliner in the sample loader that handles most of the hot inlining (https://reviews.llvm.org/D70655, https://reviews.llvm.org/D82919). The sample loader is a module pass, so the inliner there is a module inliner too. And with the new CSSPGO (https://lists.llvm.org/pipermail/llvm-dev/2020-August/144101.html), we’ve changed to a call site prioritized top-down inliner for the sample loader, as the inlining there is no longer limited to inline replay (https://reviews.llvm.org/D94001). The sample loader inliner is probably closer to what you’re planning to do, but such top-down call site prioritization would be most effective when a context-sensitive profile is available, so the inliner can do proper specialization along different inline contexts.

+@Hongtao Yu @Lei Wang




I think the idea is worth experimenting with, as long as the experiment does not affect the production code very much.

We will most likely need a size cap. I think we have yet to see whether we can completely replace the profile-driven CGSCC inliner, or whether we should use the priority-based inliner for hot paths and leave low-priority call sites for the CGSCC inliner to process.

I am curious how many cases we can catch that are typical limitations of the CGSCC inliner. By that, I mean cases where, given A->B->C, inlining C into B prevents B from being inlined into A, but inlining the original B into A is more profitable. Hot code with a lukewarm slow path is pretty common; std::vector::push_back, which occasionally allocates memory, is one example.
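The A->B->C case can be worked through with a toy cost model. The Python sketch below assumes a single size threshold and invented function sizes; `THRESHOLD`, `bottom_up`, and `hot_first` are hypothetical names, and real inline cost analysis is far more involved than adding sizes.

```python
THRESHOLD = 100  # hypothetical limit on the size of a callee we will inline

def bottom_up(sizes):
    """Visit callees before callers: try C into B, then B into A."""
    sizes = dict(sizes)
    inlined = []
    if sizes["C"] <= THRESHOLD:
        sizes["B"] += sizes["C"]  # B grows by the body of C
        inlined.append("C into B")
    if sizes["B"] <= THRESHOLD:
        sizes["A"] += sizes["B"]
        inlined.append("B into A")
    return inlined

def hot_first(sizes):
    """Visit the hot edge A->B first, while B still has its original size."""
    sizes = dict(sizes)
    inlined = []
    if sizes["B"] <= THRESHOLD:
        sizes["A"] += sizes["B"]
        inlined.append("B into A")
    # The call to C now sits on A's slow path; this sketch leaves it as a call.
    return inlined

# Invented sizes: B is small on its own, but B plus C exceeds the threshold.
sizes = {"A": 50, "B": 60, "C": 70}

bottom_up_result = bottom_up(sizes)
hot_first_result = hot_first(sizes)
```

Under these assumptions the bottom-up order inlines C into B, growing B to 130 and blocking the hot A->B edge, whereas visiting A->B first inlines the original 60-unit B into A.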

Kazu Hirata