I’m currently trying to use PGO and BOLT to optimize our services, and they have an additive effect when I follow these steps: code → org_bin → perf → pgo1 → perf → pgo_bolt1. That is, I perform PGO based on sampling data from the original binary, then sample pgo1 and apply BOLT on top of it.
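The pipeline above might look roughly like the following sketch. The compiler flags, the use of llvm-profgen for sample-based PGO, and all file names are my assumptions, not taken from this thread:

```shell
# Stage 1: sample the original binary and build the PGO binary (pgo1).
perf record -e cycles:u -j any,u -o perf.orig.data -- ./org_bin workload
llvm-profgen --binary=./org_bin --perfdata=perf.orig.data --output=pgo.prof
clang -O2 -fprofile-sample-use=pgo.prof -o pgo1 src.c

# Stage 2: sample pgo1 and apply BOLT on top of it (pgo_bolt1).
perf record -e cycles:u -j any,u -o perf.pgo1.data -- ./pgo1 workload
perf2bolt ./pgo1 -p perf.pgo1.data -o pgo1.fdata
llvm-bolt ./pgo1 -data pgo1.fdata -o pgo_bolt1 \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions
```

Note that each stage needs its own perf run, which is exactly the deployment overhead discussed below.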
However, this brings extra work to the deployment of the services, so I tried sampling pgo_bolt1 and using that same perf data for both PGO and BOLT. The goal is that I only need to deploy pgo_bolt binaries to iterate on the optimization.
I proceeded as follows:
1. perf pgo_bolt1 → pgo2; the effect of pgo2 is close to that of pgo1.
2. perf pgo_bolt1 and run BOLT on pgo2. At this point BOLT has no optimization effect, and it emits warning logs saying that 40% of functions have an invalid (possibly stale) profile.
I searched for the reason (it may not be correct, please correct me): since the perf data for pgo1 and pgo2 differ, the binaries they produce are not identical. As a result, the instruction offsets emitted by perf2bolt do not match pgo2.
Is it possible to solve this problem?
> However, this brings extra work to the deployment of the services, so I tried sampling pgo_bolt1 and using that same perf data for both PGO and BOLT. The goal is that I only need to deploy pgo_bolt binaries to iterate on the optimization.
There’s BAT mode, which allows sampling a BOLTed binary. BAT is enabled by the -enable-bat flag. You can then sample the BOLTed binary to collect a BOLT profile. You would need to pass the BOLTed binary to perf2bolt, and it automatically detects BAT.
You can also sample the BOLTed binary to collect a PGO profile if you update the debug information used to match the profile back to the source. Use -update-debug-sections.
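A sketch of how the two flags combine so that one deployed binary feeds both future iterations (binary and file names are placeholders of mine):

```shell
# Build the BOLTed binary with BAT plus updated debug info.
llvm-bolt ./pgo1 -data pgo1.fdata -o pgo_bolt1 \
    -enable-bat -update-debug-sections

# Sample the deployed BOLTed binary once.
perf record -e cycles:u -j any,u -o perf.bolt.data -- ./pgo_bolt1 workload

# BOLT profile: pass the BOLTed binary itself; perf2bolt detects BAT
# and translates the samples back to the pre-BOLT layout.
perf2bolt ./pgo_bolt1 -p perf.bolt.data -o bolt.fdata

# PGO profile: the updated debug info maps samples back to the source.
llvm-profgen --binary=./pgo_bolt1 --perfdata=perf.bolt.data --output=pgo.prof
```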
> This causes the instruction offsets output by perf2bolt to not match pgo2.
BOLT has a stale profile matching feature that can partially mitigate binary differences. Stale matching requires the use of a YAML profile (produced with `perf2bolt binary -p perf.data -o fdata -w yaml`), and is enabled by the -infer-stale-profile flag passed to BOLT at optimization time.
An existing fdata profile can also be converted to YAML:

```
llvm-bolt binary.orig -data fdata -w yaml -o /dev/null
```

and then you use the YAML profile with the new binary, adding -infer-stale-profile.
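Putting the steps together for the scenario in the question (a sketch; binary and file names are placeholders of mine):

```shell
# Collect the profile from the old binary directly in YAML form.
perf2bolt ./pgo1 -p perf.data -o profile.yaml -w yaml

# Optimize the new, slightly different binary; -infer-stale-profile lets
# BOLT match the stale YAML profile onto the new binary's CFG instead of
# discarding functions whose addresses no longer line up.
llvm-bolt ./pgo2 -data profile.yaml -o pgo_bolt2 -infer-stale-profile
```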
Is there a bolt2source/source2bolt tool to reduce the impact of this instruction address change?
BOLT doesn’t map the profile using source information. The profile-matching accuracy requirements are much higher for BOLT to be effective, which makes the use of source information impractical. BOLT uses either an address/offset-based profile (fdata) or a binary basic-block-based profile (YAML).
Thank you for your answer. Here is some additional information.
I am currently using BAT mode.
The -infer-stale-profile option sounds like the feature I want, and I will try it.
Regarding a bolt2source/source2bolt tool: it is a rough idea of mine that I have not fully investigated. Since the pgo2 and pgo1 binaries differ, I was hoping for a tool that converts the fdata so that its addresses match pgo2, for example by remapping each recorded address from the pgo1 layout to the corresponding pgo2 address.
Thanks, I tested it: -infer-stale-profile eliminates the stale-profile warnings caused by the two different PGO binaries, and I still get performance benefits. I want to briefly understand the principle behind -infer-stale-profile. Is there any documentation on it?
Regarding reference [9], “Profile inference revisited”, cited in the paper, I have a question.
> The assumption is that vertex weights are a result of profiling an actual binary, while branch probabilities are likely coming from a predictive model, which is arguably less trustworthy.
The paper does not explain how the predictive model makes its predictions. Isn’t this probability exactly what needs to be inferred? If you already know the actual counts at some points and the probabilities on the edges, don’t you just need to allocate the required flow to the matched points, and then distribute the remaining flow according to the probabilities?
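As a toy illustration of the premise in the question (entirely my own sketch, not the paper’s algorithm): when a block’s count is measured and its outgoing branch probabilities come from a model, the edge counts do follow by simple proportional allocation. The harder case, which motivates a proper inference formulation, is when measured counts at different blocks are mutually inconsistent.

```python
def allocate_edge_counts(block_count, branch_probs):
    """Split a measured block count across outgoing edges in
    proportion to (model-predicted) branch probabilities."""
    total = sum(branch_probs.values())
    return {succ: block_count * p / total for succ, p in branch_probs.items()}

# A block executed 1000 times with a predicted 90/10 branch split.
edges = allocate_edge_counts(1000, {"then": 0.9, "else": 0.1})
# edges == {"then": 900.0, "else": 100.0}

# The inconsistent case: if the "then" successor was itself measured at
# 950 executions, the profiled vertex weight (950) and the predicted edge
# count (900) disagree; reconciling such conflicts across the whole CFG is
# what the inference formulation in the paper has to resolve.
```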