ThinLTO and profile-based module groups

Hi!

Recently I read a paper about LIPO (link). I was interested in the idea of forming LTO groups based on runtime profiles.

As far as I understand from the "RFC: ThinLTO Implementation Plan" discussion, the current ThinLTO approach is LIPO’s successor. One thing is not clear to me, though: does the current ThinLTO implementation use PGO profiles in some way to improve module grouping? According to the official ThinLTO docs, it seems that it doesn’t. However, these slides mention optional profile data for ThinLTO.

So what is the current status of runtime-profile-based approaches with ThinLTO in LLVM? Is this implemented, or planned for the future? If the idea was rejected, could you please describe in a bit more detail why?

Kindly ping @teresajohnson as the original ThinLTO author.

LIPO does module grouping at the end of the profile-collection run by building a dynamic call graph. It requires FDO. The grouping decision is made available at the start of the profile-use compilation phase, and module merging is performed by the C++ frontend. The approach does not depend on anything related to LTO (i.e., no need for IR serialization …).
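To make the grouping step concrete, here is a minimal sketch of profile-driven module grouping. It is not LIPO's actual algorithm: the function name `group_modules`, the `hot_threshold` cutoff, and the sample profile are all hypothetical, and the only idea taken from the description above is that dynamic call edges are aggregated per module to decide which other modules get merged in.

```python
from collections import defaultdict


def group_modules(call_counts, module_of, hot_threshold):
    """Return {primary module: sorted list of auxiliary modules}.

    call_counts:   {(caller_fn, callee_fn): dynamic call count}
    module_of:     {function name: module that defines it}
    hot_threshold: illustrative cutoff, not a real LIPO parameter
    """
    # Aggregate function-level dynamic call edges into module-level edges.
    module_edges = defaultdict(int)
    for (caller, callee), count in call_counts.items():
        src, dst = module_of[caller], module_of[callee]
        if src != dst:
            module_edges[(src, dst)] += count

    # A module pulls in every other module it calls into often enough.
    groups = defaultdict(set)
    for (src, dst), count in module_edges.items():
        if count >= hot_threshold:
            groups[src].add(dst)
    return {module: sorted(aux) for module, aux in groups.items()}


# Hypothetical profile: main.c calls into util.c often and into log.c rarely.
call_counts = {
    ("main", "parse"): 900,
    ("main", "hash"): 600,
    ("main", "log_error"): 3,
    ("parse", "hash"): 50,  # same module, so it creates no cross-module edge
}
module_of = {
    "main": "main.c",
    "parse": "util.c",
    "hash": "util.c",
    "log_error": "log.c",
}
groups = group_modules(call_counts, module_of, hot_threshold=100)
print(groups)
```

With this sample profile only util.c ends up grouped with main.c, because the calls into log.c stay far below the cutoff.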

ThinLTO does grouping too. It is done at the thin link stage, after the frontend compilation phase. The grouping decisions are passed to the backend compilations. ThinLTO does not depend on FDO, so it can use static profile information to make the import decisions.
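A rough sketch of what summary-based import decisions can look like is given here. It is a simplification under stated assumptions, not the importer in LLVM: the summary layout, the `compute_imports` name, and the `base_limit` and `hot_multiplier` values are made up for illustration. The hotness labels could come from an FDO profile when one is available, or from static estimates otherwise.

```python
def compute_imports(summaries, base_limit=100, hot_multiplier=10.0):
    """Return {module: sorted list of functions to import into it}.

    summaries: {module: {function: {"size": instruction count,
                                    "calls": [(callee, hotness), ...]}}}
    hotness is "hot", "cold", or "unknown". The limits are illustrative.
    """
    defined_in = {fn: mod for mod, fns in summaries.items() for fn in fns}
    imports = {}
    for mod, fns in summaries.items():
        chosen = set()
        # Start from every call made by a function defined in this module.
        worklist = [edge for info in fns.values() for edge in info["calls"]]
        while worklist:
            callee, hotness = worklist.pop()
            home = defined_in.get(callee)
            if home is None or home == mod or callee in chosen:
                continue  # external, already local, or already imported
            if hotness == "cold":
                continue  # never import along cold edges in this sketch
            limit = base_limit * (hot_multiplier if hotness == "hot" else 1.0)
            info = summaries[home][callee]
            if info["size"] > limit:
                continue  # too large for this call edge
            chosen.add(callee)
            # Imported bodies may expose further cross-module calls.
            worklist.extend(info["calls"])
        imports[mod] = sorted(chosen)
    return imports


# Hypothetical summaries for two modules.
summaries = {
    "a.o": {
        "main": {"size": 20, "calls": [("small", "unknown"), ("big", "unknown"),
                                       ("big_hot", "hot"), ("rare", "cold")]},
    },
    "b.o": {
        "small": {"size": 10, "calls": [("helper", "unknown")]},
        "big": {"size": 500, "calls": []},
        "big_hot": {"size": 500, "calls": []},
        "rare": {"size": 5, "calls": []},
        "helper": {"size": 8, "calls": []},
    },
}
imports = compute_imports(summaries)
print(imports)
```

Here `big` and `big_hot` have the same size, but only the one reached through a hot edge is imported, which is the kind of difference that profile information makes to the decision.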

As far as I understand, LIPO is not implemented in the current Clang or LLVM version, right?

What is the static profile information? I am wondering because from my understanding in theory grouping decisions based on FDO profiles should be more precise than grouping decisions based on static heuristics.

It is the set of branch probabilities estimated by static heuristics. The classic paper is "Static Branch Frequency and Program Profile Analysis" by Wu and Larus.
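As a small illustration of how guessed branch probabilities turn into a static profile, the sketch below propagates block frequencies over an acyclic CFG and scales a loop header by its back-edge probability. The probabilities (0.1, 0.9, 0.875) and block names are hypothetical, and this is only the basic arithmetic, not the full algorithm from the paper or LLVM's implementation.

```python
def block_frequencies(cfg, entry, order):
    """Propagate frequencies over an acyclic CFG.

    cfg:   {block: [(successor, guessed branch probability), ...]}
    order: all blocks in topological order, starting with the entry block
    """
    freq = {block: 0.0 for block in order}
    freq[entry] = 1.0
    for block in order:
        for succ, prob in cfg.get(block, []):
            freq[succ] += freq[block] * prob
    return freq


def loop_header_frequency(entry_freq, backedge_prob):
    """Frequency of a loop header whose back edge is taken with backedge_prob.

    This is the geometric series entry_freq * (1 + p + p^2 + ...).
    """
    return entry_freq / (1.0 - backedge_prob)


# Hypothetical heuristic: a branch guarding an error path is rarely taken.
cfg = {
    "entry": [("error", 0.1), ("work", 0.9)],
    "error": [("exit", 1.0)],
    "work": [("exit", 1.0)],
}
freq = block_frequencies(cfg, "entry", ["entry", "error", "work", "exit"])
header = loop_header_frequency(freq["work"], backedge_prob=0.875)
print(freq, header)
```

Without any runtime data, the "work" block is estimated to run nine times as often as the "error" block, and a loop inside it is estimated at a handful of iterations per entry.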

LLVM also has a pass that propagates function-level static profiles across function boundaries, but it is not turned on by default.

Thanks for the link! So what was the reason for not implementing FDO-based grouping decisions in ThinLTO? I guess that FDO-based grouping decisions should be more precise than the decisions based on static heuristics.

Could you please tell me the name of the pass that propagates static profiles across function boundaries, and the corresponding compiler switch for enabling it? It would be interesting to play with it locally.

@davidxl’s point is that ThinLTO doesn’t need FDO, not that it doesn’t use it. It does. But it can work without it, too, as he explained.

BTW, alternative ways of importing functions are also implemented; see WorkloadImportsManager in lib/Transforms/IPO/FunctionImport.cpp

SyntheticCountsPropagation.cpp defines the pass.
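For intuition only, here is a sketch of the general idea of pushing synthetic entry counts along call edges. It is not a description of what SyntheticCountsPropagation.cpp actually does: the initial count of 10, the relative call-site frequencies, and the restriction to an acyclic call graph visited callers-first are all assumptions made to keep the example short.

```python
def propagate_counts(initial, call_edges, order):
    """Propagate synthetic entry counts from callers to callees.

    initial:    {function: initial synthetic entry count}
    call_edges: {caller: [(callee, call-site frequency relative to the
                           caller's entry), ...]}
    order:      functions in topological order of an acyclic call graph,
                callers before callees
    """
    counts = dict(initial)
    for caller in order:
        for callee, rel_freq in call_edges.get(caller, []):
            # A callee inherits the caller's count, scaled by how often the
            # call site is estimated to run per entry into the caller.
            counts[callee] += counts[caller] * rel_freq
    return counts


# Hypothetical program: main calls loop_body from inside a loop and
# error_path from a rarely taken branch.
initial = {"main": 10.0, "loop_body": 10.0, "error_path": 10.0}
call_edges = {"main": [("loop_body", 8.0), ("error_path", 0.1)]}
counts = propagate_counts(initial, call_edges, ["main", "loop_body", "error_path"])
print(counts)
```

The callee reached from inside the loop ends up with a much larger count than the one on the error path, so an importer or inliner that consults these counts would prioritize it.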

Got it, thanks! Is there a comparison between “static grouping decisions” and “FDO-based grouping decisions” from the perspective of optimization opportunity and efficiency? I want to understand how valuable it is in practice to use FDO profiles for ThinLTO grouping compared to the static approach.

The non-FDO importing is subject to more tuning. I suspect it may end up with more imports and increased compile time.

We have found that using FDO for ThinLTO results in better performance than ThinLTO alone. Some publicly available data on this for SPEC CPU benchmarks is shown in Fig 2 of our 2017 CGO paper, "ThinLTO: Scalable and Incremental LTO" (IEEE Xplore).
