I recently read a paper about LIPO (link) and was interested in the idea of forming LTO module groups based on runtime profiles.
As far as I understand from the "RFC: ThinLTO Implementation Plan" discussion, the current ThinLTO approach is LIPO’s successor. One thing is not clear to me, though: does the current ThinLTO implementation use PGO profiles in some way to improve module grouping? According to the official ThinLTO docs, it seems that it doesn’t. However, these slides mention optional profile data for ThinLTO.
So what is the current status of using runtime-profile approaches with ThinLTO in LLVM? Is it implemented, or planned for the future? If the idea was rejected, could you please describe in a bit more detail why?
Kindly ping @teresajohnson as the original ThinLTO author.
LIPO does module grouping at the end of the profile-collection run by building a dynamic call graph. It requires FDO. The grouping decision is made available at the start of the profile-use compilation phase, and module merging is performed by the C++ frontend. The approach does not depend on anything related to LTO (i.e., no need for IR serialization …).
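To make the grouping step concrete, here is a minimal sketch of the general idea, not LIPO's actual algorithm. It assumes the dynamic call graph is available as (caller module, callee module, call count) tuples; the function name `group_modules`, the input format, the threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict


def group_modules(call_edges, hot_threshold):
    """Return {primary module: sorted list of auxiliary modules}.

    call_edges: iterable of (caller_module, callee_module, call_count)
    tuples taken from a dynamic call graph (hypothetical input format).
    """
    groups = defaultdict(set)
    for caller, callee, count in call_edges:
        # Only cross-module edges that are hot enough justify pulling
        # the callee's module into the caller's compilation.
        if caller != callee and count >= hot_threshold:
            groups[caller].add(callee)
    return {module: sorted(aux) for module, aux in groups.items()}


# Made-up profile data for illustration only.
edges = [
    ("a.cc", "b.cc", 1000),  # hot cross-module edge
    ("a.cc", "c.cc", 3),     # cold cross-module edge, ignored
    ("b.cc", "c.cc", 500),   # hot cross-module edge
    ("c.cc", "c.cc", 9000),  # intra-module edge, ignored
]
groups = group_modules(edges, hot_threshold=100)
print(groups)
```

In this sketch the group for `a.cc` contains only `b.cc`, because the edge to `c.cc` is below the threshold.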
ThinLTO does grouping too. It is done at the thin link stage, after the frontend compilation phase. The grouping decisions are passed to the backend compilations. ThinLTO does not depend on FDO, so it can use static profile information to make the import decisions.
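As a rough illustration of an import decision that works without FDO but can still benefit from it, the sketch below combines a size limit with an optional hotness hint on the call edge. This is not LLVM's actual heuristic: the function name `should_import`, the hotness labels, and every default number are assumptions chosen for the example.

```python
def should_import(callee_size, hotness="unknown", base_limit=100,
                  hot_multiplier=10.0, cold_multiplier=0.0):
    """Decide whether to import a callee into the caller's module.

    All names and default values are hypothetical. Without a profile,
    every edge is "unknown" and only the base size limit applies.
    """
    # Scale the size limit up for hot edges and down for cold ones.
    multiplier = {"hot": hot_multiplier, "cold": cold_multiplier}.get(hotness, 1.0)
    return callee_size <= base_limit * multiplier


# Static case: no profile, so the decision depends on size alone.
print(should_import(50))           # small callee
print(should_import(500))          # large callee

# Profile-guided case: the same callees with edge hotness known.
print(should_import(500, "hot"))   # large but hot
print(should_import(50, "cold"))   # small but cold
```

The point of the design is that profile data only adjusts the threshold, so the same decision procedure still runs when no profile is present.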
As far as I understand, LIPO is not implemented in the current Clang or LLVM version, right?
ThinLTO does not depend on FDO, so it can use static profile information to make the import decisions.
What is the static profile information? I am asking because, as I understand it, grouping decisions based on FDO profiles should in theory be more precise than grouping decisions based on static heuristics.
Thanks for the link! So what was the reason for not implementing FDO-based grouping decisions in ThinLTO? I guess that FDO-based grouping decisions should be more precise than the decisions based on static heuristics.
LLVM also has a pass to propagate function level static profile across function boundaries, but it is not turned on by default.
Could you please tell me the name of this pass and the corresponding compiler switch for enabling it? It will be interesting to play with it locally.
Got it, thanks! Is there a comparison of “static grouping decisions” vs “FDO-based grouping decisions” from the perspective of compiler optimization opportunity and efficiency? I want to understand how valuable it is in practice to use FDO profiles for ThinLTO grouping compared to the static approach.