From what I have observed, there are mainly two places where functions can be inlined based on sample profiling data. One is SampleProfileLoaderPass and the other is InlinerPass. The sample-based hot-function check, which is one of the conditions for inlining, seems to differ between these two passes: the bar for a hot function is lower in SampleProfileLoaderPass than in InlinerPass. As a result, if only sample profile information is considered, a function may have a better chance of being inlined in SampleProfileLoaderPass than in InlinerPass.
In SampleProfileLoaderPass, if a function was already inlined in the first build, it will be inlined again in the second build if it is counted as hot. The criterion for hot is that the TOTAL sample count of the function is greater than HotCountThreshold, as below. TotalSample means the total number of samples in the function body.
In InlinerPass, the inlining strategy is much more complicated. Focusing only on the logic that decides whether a function is hot or not, there are mainly two places. One checks the “branch_weights” metadata of a call instruction to judge whether it is a hot callsite; the other checks the “function_entry_count” metadata in the callee function’s definition to judge whether it is a hot callee. Both kinds of metadata estimate the function’s ENTRY sample count (HeadSample), which is generally much smaller than TotalSample, and both are also compared against HotCountThreshold to make the hot judgement.
“branch_weights” metadata relates to:
“function_entry_count” relates to:
So it might not be fair to compare TotalSample against HotCountThreshold to judge whether a function is hot and should be inlined in SampleProfileLoaderPass, while comparing HeadSample against the same HotCountThreshold to make the same judgement in InlinerPass. In real applications, TotalSample is commonly orders of magnitude larger than HeadSample.
Assume there are two functions: A()'s total sample count is 500 and its entry sample count is 2; B()'s total sample count is 100 and its entry sample count is 2. If B() was inlined in the first build but A() was not, and HotCountThreshold is 10, then B() will be inlined again in the second build because B's total sample count 100 > 10, but A() will still not be inlined because A's entry sample count 2 < 10.
My question is: does it make sense for SampleProfileLoaderPass and InlinerPass to compare numbers of different orders of magnitude against the same HotCountThreshold when identifying hot functions? Will this reduce the effectiveness of the inlining optimization?
I look forward to inlining or PGO experts correcting me if I am wrong, or commenting on my view. Thanks!
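To make the example concrete, here is a toy model of the two checks in Python. It is not LLVM's code: the names `FunctionProfile`, `hot_by_total_samples`, `hot_by_head_samples` and `inlined_in_second_build` are made up for illustration, and it assumes that the hot check is the only condition for inlining and that a function not inlined in the first build is judged only by its entry count.

```python
from dataclasses import dataclass

# Value taken from the example above, not LLVM's real default.
HOT_COUNT_THRESHOLD = 10


@dataclass
class FunctionProfile:
    """Hypothetical container for the per-function numbers in the example."""
    name: str
    total_samples: int           # samples in the whole function body
    head_samples: int            # samples at the function entry
    inlined_in_first_build: bool


def hot_by_total_samples(p: FunctionProfile) -> bool:
    # Models the check described for SampleProfileLoaderPass.
    return p.total_samples > HOT_COUNT_THRESHOLD


def hot_by_head_samples(p: FunctionProfile) -> bool:
    # Models the entry-count based check described for InlinerPass.
    return p.head_samples > HOT_COUNT_THRESHOLD


def inlined_in_second_build(p: FunctionProfile) -> bool:
    # Assumption: a function inlined in the first build is replayed by the
    # total-sample check; anything else only gets the entry-count check.
    if p.inlined_in_first_build:
        return hot_by_total_samples(p)
    return hot_by_head_samples(p)


a = FunctionProfile("A", total_samples=500, head_samples=2,
                    inlined_in_first_build=False)
b = FunctionProfile("B", total_samples=100, head_samples=2,
                    inlined_in_first_build=True)

for f in (a, b):
    print(f.name, "inlined in second build:", inlined_in_second_build(f))
```

Under these assumptions, B is inlined and A is not, even though A has five times as many total samples as B and both have the same entry count.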