[Inline][PGO] Barrier of hot functions is lower in SampleProfileLoaderPass than in InlinePass?

tianleliu · March 3, 2025, 2:07pm

From what I have observed, there are two main opportunities to inline functions based on sample profile data: one in SampleProfileLoaderPass and the other in InlinePass. The sample-based hotness check, which is one of the conditions for inlining, differs between the two passes: the bar for a hot function is lower in SampleProfileLoaderPass than in InlinePass. As a result, a function may be more likely to be inlined in SampleProfileLoaderPass than in InlinePass when only the sample profile is considered.
In SampleProfileLoaderPass, if a function was already inlined in the first build, it is inlined again in the second build when it is considered hot. The criterion for hot is that the function's TOTAL sample count exceeds HotCountThreshold, as shown below. TotalSample is the total number of samples in the function body.

github.com/llvm/llvm-project, llvm/lib/Transforms/IPO/SampleProfile.cpp @ 4e330faac:

          if (callsiteIsHot(FS, PSI, ProfAccForSymsInList))
            Hot = true;

github.com/llvm/llvm-project, llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp @ 4e330faac:

          uint64_t CallsiteTotalSamples = CallsiteFS->getTotalSamples();
          if (ProfAccForSymsInList)
            return !PSI->isColdCount(CallsiteTotalSamples);
          else
            return PSI->isHotCount(CallsiteTotalSamples);
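
To make the shape of that check easier to discuss, here is a minimal standalone sketch of it. `ToySummary` and `toyCallsiteIsHot` are hypothetical names, not LLVM APIs, and the `>=` / `<=` comparisons against the thresholds are an assumption of this model. The point it illustrates is only that the decision is driven by the inlinee's total samples, and that with `ProfAccForSymsInList` set the bar drops from "hot" to "not cold".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ProfileSummaryInfo: two fixed count thresholds.
// The comparison operators are assumptions made for this sketch.
struct ToySummary {
  uint64_t HotCountThreshold;
  uint64_t ColdCountThreshold;
  bool isHotCount(uint64_t C) const { return C >= HotCountThreshold; }
  bool isColdCount(uint64_t C) const { return C <= ColdCountThreshold; }
};

// Same branch structure as the excerpt above: the input is the TOTAL
// sample count of the inlined callsite profile, not its entry count.
bool toyCallsiteIsHot(uint64_t CallsiteTotalSamples, const ToySummary &PSI,
                      bool ProfAccForSymsInList) {
  assert(PSI.ColdCountThreshold <= PSI.HotCountThreshold);
  if (ProfAccForSymsInList)
    return !PSI.isColdCount(CallsiteTotalSamples); // "not cold" is enough
  return PSI.isHotCount(CallsiteTotalSamples);     // must reach "hot"
}
```

For example, with a hot threshold of 10 and a cold threshold of 2, a callsite profile with 5 total samples is not hot in the default mode but passes the check when `ProfAccForSymsInList` is true.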

In InlinePass the inlining strategy is much more complicated. Focusing only on the logic that decides whether a function is hot or not, there are two main places. One checks the “branch_weights” metadata of a call instruction to judge whether it is a hot callsite; the other checks the “function_entry_count” metadata on the callee's definition to judge whether it is a hot callee. Both metadata values estimate the function's ENTRY sample count (HeadSample), which is generally much smaller than TotalSample, and both are also compared against HotCountThreshold to decide hotness.
The “branch_weights” metadata check relates to:

github.com/llvm/llvm-project, llvm/lib/Analysis/InlineCost.cpp @ 4e330faac:

          auto HotCallSiteThreshold = getHotCallSiteThreshold(Call, CallerBFI);
          if (!Caller->hasOptSize() && HotCallSiteThreshold) {
            LLVM_DEBUG(dbgs() << "Hot callsite.\n");

github.com/llvm/llvm-project, llvm/lib/Analysis/ProfileSummaryInfo.cpp @ 4e330faac:

          bool ProfileSummaryInfo::isHotCallSite(const CallBase &CB,
                                                 BlockFrequencyInfo *BFI) const {
            auto C = getProfileCount(CB, BFI);
            return C && isHotCount(*C);
          }

The “function_entry_count” metadata check relates to:

github.com/llvm/llvm-project, llvm/lib/Analysis/InlineCost.cpp @ 4e330faac:

          if (PSI->isFunctionEntryHot(&Callee)) {
            LLVM_DEBUG(dbgs() << "Hot callee.\n");

github.com/llvm/llvm-project, llvm/include/llvm/Analysis/ProfileSummaryInfo.h @ 4e330faac:

          template <typename FuncT> bool isFunctionEntryHot(const FuncT *F) const {
            if (!F || !hasProfileSummary())
              return false;
            std::optional<Function::ProfileCount> FunctionCount = getEntryCount(F);
            // FIXME: The heuristic used below for determining hotness is based on
            // preliminary SPEC tuning for inliner. This will eventually be a
            // convenience method that calls isHotCount.
            return FunctionCount && isHotCount(FunctionCount->getCount());

So it may not be fair to compare TotalSample against HotCountThreshold to decide whether a function is hot (and should be inlined) in SampleProfileLoader, while comparing HeadSample against the same HotCountThreshold for the same decision in InlinePass. In real applications, TotalSample is typically orders of magnitude larger than HeadSample.
Suppose there are two functions: A() has a total sample count of 500 and an entry sample count of 2; B() has a total sample count of 100 and an entry sample count of 2. If B() was inlined in the first build but A() was not, and HotCountThreshold is 10, then B() is inlined again in the second build because its total sample count is 100 > 10, while A() is still not inlined because its entry sample count is 2 < 10.
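
The A()/B() example can be written down as a small standalone model. This is only a sketch of my reading of the two checks, not LLVM code: `ToyFunctionProfile`, `hotInSampleLoader`, `hotInInliner` and `checkPostExample` are hypothetical names, and treating "hot" as `count >= threshold` is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-function profile: samples over the whole body versus
// samples attributed to the function entry.
struct ToyFunctionProfile {
  uint64_t TotalSamples;
  uint64_t HeadSamples;
};

constexpr uint64_t HotCountThreshold = 10; // value taken from the example

// Model of the SampleProfileLoader check: compares TOTAL samples.
bool hotInSampleLoader(const ToyFunctionProfile &P) {
  return P.TotalSamples >= HotCountThreshold;
}

// Model of the InlinePass checks: compares the ENTRY (head) estimate.
bool hotInInliner(const ToyFunctionProfile &P) {
  return P.HeadSamples >= HotCountThreshold;
}

// The two functions from the example above.
void checkPostExample() {
  const ToyFunctionProfile A{500, 2};
  const ToyFunctionProfile B{100, 2};
  assert(hotInSampleLoader(B)); // B was inlined before, replayed as hot
  assert(!hotInInliner(A));     // A only reaches the inliner, judged not hot
  assert(hotInSampleLoader(A)); // yet A would pass the loader's check too
}
```

In this model A() has five times the total samples of B() but is the one left out, purely because the two passes feed different quantities into the same threshold.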

My question is: does it make sense for SampleProfileLoaderPass and InlinePass to compare numbers of different orders of magnitude against the same HotCountThreshold when identifying hot functions? Could this reduce the benefit of the inlining optimization?

I look forward to corrections or comments from inlining or PGO experts if my understanding is wrong. Thanks!
