Can base binaries be optimized using optimized binary perf data?

The scenario I have in mind is this: use BOLT to optimize the binary, then sample only the optimized binary, and use that sampled data to optimize a new binary.
I tried it and got an error about the binary already being optimized.

Hello,

Yes, this is a supported scenario. Add the -enable-bat option to your BOLT flags. You can then collect samples from the BOLTed binary and pass that binary to perf2bolt to produce a profile that can be used to optimize the original binary.
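
As a rough sketch of that round trip, the steps could look like the script below. The file names (exe.orig, exe.bolt, exe.bolt.new, old.fdata, perf.data, bat.fdata) and the perf event selection are assumptions for illustration, and your usual BOLT optimization flags are left out. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

bat_roundtrip() {
  # 1. Optimize the original binary and embed the BAT section.
  #    old.fdata is a hypothetical profile from an earlier run of exe.orig.
  run llvm-bolt exe.orig -o exe.bolt -data=old.fdata -enable-bat
  # 2. Sample the BOLTed binary (branch sampling, needs LBR-capable hardware).
  run perf record -e cycles:u -j any,u -o perf.data -- ./exe.bolt
  # 3. Convert the samples; the BOLTed binary is passed so that BAT can
  #    translate addresses back to offsets in exe.orig.
  run perf2bolt -p perf.data -o bat.fdata exe.bolt
  # 4. Re-optimize the original binary with the translated profile.
  run llvm-bolt exe.orig -o exe.bolt.new -data=bat.fdata -enable-bat
}

bat_roundtrip
```

Keeping -enable-bat in step 4 means the next round of sampling can again be done on the optimized binary.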

Note, however, that the profile can’t be used directly to optimize another (new) binary, as it would likely not match the offsets. To optimize the new binary, convert the resulting profile into YAML format using the original binary (perf2bolt exe.orig -data bat.fdata -w yaml -o /dev/null), then add the -infer-stale-profile flag when you optimize the new binary with the YAML profile.
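
Put together, the conversion and the stale-matching step might look like the following sketch. The first command is the one quoted above; my understanding is that -w takes an output file name, so it writes a file literally named yaml, which is then fed to llvm-bolt. The names new_exe.orig and new_exe.bolt are placeholders, and other optimization flags are omitted. As written, the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

stale_profile_flow() {
  # 1. Re-emit the BAT-translated profile as YAML, using the ORIGINAL binary
  #    that the profile's offsets refer to. The regular -o output is discarded.
  run perf2bolt exe.orig -data bat.fdata -w yaml -o /dev/null
  # 2. Optimize the new build with the YAML profile and let BOLT infer
  #    matches for functions whose code has changed.
  run llvm-bolt new_exe.orig -o new_exe.bolt -data=yaml -infer-stale-profile
}

stale_profile_flow
```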

BAT (BOLT Address Translation) is a non-allocatable section that is used to map samples back to the original offsets.
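
If you want to confirm that a binary actually carries the BAT section, a small check along these lines may help. It assumes the section name contains bolt_bat (I believe it is .note.bolt_bat, but please verify against your BOLT version) and that llvm-readelf is on your PATH; the READELF variable is just a hypothetical override for using a different readelf.

```shell
#!/bin/sh
# Tool used to list section headers; override READELF to use another readelf.
READELF="${READELF:-llvm-readelf}"

# has_bat FILE: succeeds if FILE's section headers mention a BAT section.
# Assumption: the section name contains "bolt_bat".
has_bat() {
  $READELF -S "$1" 2>/dev/null | grep -q 'bolt_bat'
}
```

For example, has_bat exe.bolt && echo "BAT present" prints the message only when the section is found.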


Thank you for your answer. Can I assume that, with some conversion steps, I can use the perf.data obtained by running perf on old_exe.bolt to optimize new_exe.orig? But the BOLT README does not describe this usage. Isn’t it common? For a service, if exe.orig has to be perf-sampled every time, extra deployment steps are required, which is very inconvenient.

Can I assume that, with some conversion steps, I can use the perf.data obtained by running perf on old_exe.bolt to optimize new_exe.orig?
Yes, using the steps I described above.

But the BOLT README does not describe this usage. Isn’t it common? For a service, if exe.orig has to be perf-sampled every time, extra deployment steps are required, which is very inconvenient.

Agreed, having to profile the same binary as the one being optimized complicates deployment. But that is the recommended mode of BOLT usage for reaching peak performance, and it is common with other PGO techniques as well.

Sampling a BOLTed binary has been uncommon because, until very recently, there was no stale profile matching. Without stale matching, a nontrivial share of the samples would have been discarded, potentially reducing or eliminating the performance wins.

Using a profile from a BOLTed binary with stale matching is new, and therefore not very well supported (it involves a conversion step that we may eliminate) and not yet covered in the README. Let us know if it works well for you.

A more conventional method with simplified deployment is to periodically deploy and profile the top-of-trunk state of your service, save the BOLT profile for subsequent use, along with the corresponding AutoFDO/PGO profile if one was used, and use that pair of AutoFDO/PGO and BOLT profiles for your release. The profile may still match imperfectly (staleness), but this is operationally easier because the release itself doesn’t have to be profiled. You can combine this with -infer-stale-profile for improved performance.