ThinLTO and profile-based module groups

zamazan4ik · February 21, 2024, 8:03am

Hi!

Recently I read a paper about LIPO (link). I was interested in the idea of forming LTO groups based on runtime profiles.

As far as I understand from the “RFC: ThinLTO Implementation Plan” discussion, the current ThinLTO approach is LIPO’s successor. One thing is not clear to me, though: does the current ThinLTO implementation somehow use PGO profiles to improve module grouping? According to the official ThinLTO docs, it seems it doesn’t. However, these slides mention optional profile data for ThinLTO.

So what is the current status of using runtime profile approaches with ThinLTO in LLVM? Is it implemented or planned to be implemented in the future? If the idea is rejected - could you please describe a bit more why it was rejected?

Kindly ping @teresajohnson as the original ThinLTO author.

davidxl · February 21, 2024, 5:02pm

LIPO does module grouping at the end of the profile-collection run by building a dynamic call graph. It requires FDO. The grouping decision is made available at the start of the profile-use compilation phase, and module merging is performed by the C++ frontend. The approach does not depend on anything related to LTO (i.e., no need for IR serialization …).

ThinLTO does grouping too. It is done at the thin link stage, after the frontend compilation phase, and the grouping decisions are passed to the backend compilations. ThinLTO does not depend on FDO, so it can use static profile information to make the import decisions.
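To make the idea of per-module import decisions concrete, here is a minimal toy sketch. It is not LLVM code: the `CallEdge` record, the size limit, and the hotness multipliers are all hypothetical values chosen for illustration. It only shows the shape of the decision: a link-time step looks at cross-module call edges and picks, per module, which callees to import, with the size limit scaled by how hot the call edge is believed to be.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallEdge:
    caller_module: str
    callee: str
    callee_module: str
    callee_size: int  # hypothetical instruction count of the callee
    hotness: str      # "hot", "unknown", or "cold" (from a profile or a static guess)


# Illustrative numbers only; not claimed to match any LLVM default.
BASE_LIMIT = 100
MULTIPLIER = {"hot": 10.0, "unknown": 1.0, "cold": 0.0}


def import_lists(edges):
    """Return {module: set of callee names that module should import}."""
    imports = {}
    for e in edges:
        if e.caller_module == e.callee_module:
            continue  # same module, nothing to import
        limit = BASE_LIMIT * MULTIPLIER[e.hotness]
        if e.callee_size <= limit:
            imports.setdefault(e.caller_module, set()).add(e.callee)
    return imports


edges = [
    CallEdge("a.o", "big_hot", "b.o", 400, "hot"),
    CallEdge("a.o", "small_cold", "b.o", 20, "cold"),
    CallEdge("a.o", "mid_unknown", "c.o", 150, "unknown"),
    CallEdge("a.o", "tiny", "c.o", 50, "unknown"),
]
print(import_lists(edges))
```

In this toy, the only thing that changes between a static guess and an FDO profile is where the `hotness` label comes from; the decision procedure stays the same.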

zamazan4ik · February 21, 2024, 5:14pm

As far as I understand, LIPO is not implemented in the current Clang or LLVM version, right?

ThinLTO does not depend on FDO, so it can use static profile information to make the import decisions.

What is the static profile information? I am asking because, as I understand it, grouping decisions based on FDO profiles should in theory be more precise than grouping decisions based on static heuristics.

davidxl · February 21, 2024, 5:28pm

These are branch probabilities guessed by static heuristics. The classic paper is “Static branch frequency and program profile analysis”.
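As a rough illustration of how guessed branch probabilities turn into a static profile, the sketch below propagates probabilities through a small acyclic control-flow graph to get relative block frequencies. This is a simplified toy, not the algorithm from the paper and not LLVM's implementation: it ignores loops entirely, and the block names and probabilities are made up.

```python
from graphlib import TopologicalSorter


def block_frequencies(succ_probs, entry):
    """succ_probs maps block -> {successor: guessed branch probability}.

    Returns the frequency of each block relative to one entry execution.
    Only valid for acyclic graphs (no loops).
    """
    # Build predecessor sets so blocks can be visited in topological order.
    preds = {b: set() for b in succ_probs}
    for block, succs in succ_probs.items():
        for s in succs:
            preds.setdefault(s, set()).add(block)

    freq = {b: 0.0 for b in preds}
    freq[entry] = 1.0
    for block in TopologicalSorter(preds).static_order():
        for succ, prob in succ_probs.get(block, {}).items():
            freq[succ] += freq[block] * prob
    return freq


# A heuristic might guess that an error-handling branch is rarely taken.
cfg = {
    "entry": {"error_path": 0.2, "normal_path": 0.8},
    "error_path": {"exit": 1.0},
    "normal_path": {"exit": 1.0},
    "exit": {},
}
print(block_frequencies(cfg, "entry"))
```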

LLVM also has a pass to propagate function-level static profile across function boundaries, but it is not turned on by default.

zamazan4ik · February 21, 2024, 5:33pm

Thanks for the link! So what was the reason for not implementing FDO-based grouping decisions in ThinLTO? I guess that FDO-based grouping decisions should be more precise than the decisions based on static heuristics.

LLVM also has a pass to propagate function-level static profile across function boundaries, but it is not turned on by default.

Could you please tell me the name of this pass and the corresponding compiler switch for enabling it? It would be interesting to play with it locally.

mtrofin · February 21, 2024, 5:58pm

@davidxl’s point is that ThinLTO doesn’t need FDO, not that it doesn’t use it. It does. But it can work without it, too, as he explained.

BTW, alternative ways of importing functions are already implemented; see WorkloadImportsManager in lib/Transforms/IPO/FunctionImport.cpp

davidxl · February 21, 2024, 6:14pm

SyntheticCountsPropagation.cpp defines the pass.
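For a rough picture of what propagating function-level counts across call edges can look like, here is a toy sketch. It is not taken from that file: the function names, initial counts, and per-edge scale factors are hypothetical, and it only handles call graphs without recursion.

```python
from graphlib import TopologicalSorter


def propagate_entry_counts(initial, call_edges):
    """initial: function -> starting (synthetic) entry count.
    call_edges: list of (caller, callee, calls per caller entry).

    Visits callers before callees and adds each caller's scaled count
    to its callees. Assumes the call graph has no cycles.
    """
    callers = {f: set() for f in initial}
    for caller, callee, _ in call_edges:
        callers.setdefault(caller, set())
        callers.setdefault(callee, set()).add(caller)

    counts = {f: float(initial.get(f, 0.0)) for f in callers}
    for func in TopologicalSorter(callers).static_order():
        for caller, callee, scale in call_edges:
            if caller == func:
                counts[callee] += counts[func] * scale
    return counts


initial = {"main": 10, "parse": 0, "helper": 0}
call_edges = [
    ("main", "parse", 2.0),    # main calls parse about twice per entry
    ("main", "helper", 0.5),   # call sits on a branch guessed to be 50% taken
    ("parse", "helper", 3.0),  # call sits in a loop guessed to run 3 times
]
print(propagate_entry_counts(initial, call_edges))
```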

zamazan4ik · February 21, 2024, 9:08pm

Got it, thanks! Is there a comparison of “static grouping decisions” vs “FDO-based grouping decisions” in terms of compiler optimization opportunity and efficiency? I want to understand how valuable it is in practice to use FDO profiles for ThinLTO grouping compared to the static approach.

davidxl · February 21, 2024, 10:11pm

The non-FDO importing is subject to more tuning. I suspect it may end up with more imports and increased compile time.

teresajohnson · February 26, 2024, 3:25pm

Got it, thanks! Is there a comparison of “static grouping decisions” vs “FDO-based grouping decisions” in terms of compiler optimization opportunity and efficiency? I want to understand how valuable it is in practice to use FDO profiles for ThinLTO grouping compared to the static approach.

We have found that using FDO for ThinLTO results in better performance than ThinLTO alone. Some publicly available data on this for SPEC CPU benchmarks is shown in Fig. 2 of our 2017 CGO paper (“ThinLTO: Scalable and incremental LTO”, IEEE Xplore).
