Can base binaries be optimized using optimized binary perf data?

zcfh · October 26, 2023, 7:11am

The scenario I envision is this: use BOLT to optimize the binary, then sample only the optimized binary, and use the sampled data to optimize the new binary.
I tried it and got a "binary optimized" error.

aaupov · October 29, 2023, 4:19am

Hello,

Yes, this is a supported scenario. Please add the -enable-bat option to your BOLT flags. You can then collect samples from the optimized binary and pass that binary to perf2bolt to produce a profile that can be used to optimize the original binary.

Note, however, that the profile can’t be used directly to optimize another (new) binary, as the offsets would likely not match. To optimize the new binary, convert the resulting profile into YAML format using the original binary (perf2bolt exe.orig -data bat.fdata -w yaml -o /dev/null), then add the -infer-stale-profile flag when you optimize the new binary with the YAML profile.

BAT (BOLT Address Translation) is a non-allocatable section that is used to map samples back to the original offsets.
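
As a rough sketch, the steps above could be chained as follows. The file names (initial.fdata, perf.data, bat.fdata, profile.yaml, exe.bolt, new_exe.bolt), the perf record event options, and the run helper are assumptions for illustration. It also assumes that -w takes the name of the YAML file to write. Only -enable-bat, the perf2bolt conversion step, and -infer-stale-profile come from the description above. The script prints the commands and executes them only when BOLT_SKETCH_RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: prints each command; set BOLT_SKETCH_RUN=1 to execute instead.
run() {
    if [ "${BOLT_SKETCH_RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

bat_workflow() {
    orig=$1       # original, un-BOLTed binary (e.g. exe.orig)
    new_orig=$2   # newer build to be optimized (e.g. new_exe.orig)

    # 1. Optimize the original binary with BAT enabled.
    #    initial.fdata is an assumed, already existing profile.
    run llvm-bolt "$orig" -o exe.bolt -data=initial.fdata -enable-bat

    # 2. Sample the BOLTed binary (event options are an assumption).
    run perf record -e cycles:u -j any,u -o perf.data -- ./exe.bolt

    # 3. Pass the BOLTed binary to perf2bolt; BAT maps samples back
    #    to the original offsets.
    run perf2bolt -p perf.data -o bat.fdata exe.bolt

    # 4. Convert the profile to YAML using the original binary.
    run perf2bolt "$orig" -data bat.fdata -w profile.yaml -o /dev/null

    # 5. Optimize the new binary with stale profile matching.
    run llvm-bolt "$new_orig" -o new_exe.bolt -data=profile.yaml -infer-stale-profile
}

bat_workflow exe.orig new_exe.orig
```

Steps 1 to 3 alone are enough if the goal is only to re-optimize the same original binary; steps 4 and 5 apply when the target is a different build.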

zcfh · November 3, 2023, 6:17am

Thank you for your answer. So, with some conversion steps, I can use the perf.data obtained by running perf on old_exe.bolt to optimize new_exe.orig? But the BOLT README does not describe this usage. Isn’t it common? For a service, if exe.orig has to be sampled with perf every time, additional deployment steps are required, which is very inconvenient.

aaupov · November 11, 2023, 5:06am

So, with some conversion steps, I can use the perf.data obtained by running perf on old_exe.bolt to optimize new_exe.orig?
Yes, using the steps I described above.

But the BOLT README does not describe this usage. Isn’t it common? For a service, if exe.orig has to be sampled with perf every time, additional deployment steps are required, which is very inconvenient.

I agree that profiling the same binary as the one to be optimized complicates deployment. But that’s the recommended mode of BOLT usage for reaching peak performance, and it is common with other PGO techniques.

Sampling a BOLTed binary has been uncommon because there was no stale profile matching until very recently. Without stale matching, a nontrivial share of the samples would have been discarded, potentially reducing or eliminating the performance wins.

Using the profile from a BOLTed binary with stale matching is new and therefore not very well supported (it involves a conversion step that we may eliminate), and it is not yet covered in the README. Let us know if it works well for you.

A more conventional method with simplified deployment is to periodically deploy and profile the top-of-trunk state of your service, save the BOLT profile for subsequent use along with the corresponding AutoFDO/PGO profile if one was used, and use that pair of AutoFDO/PGO and BOLT profiles for your release. The profiles may still match imperfectly (staleness), but this is operationally easier because the release doesn’t have to be profiled. You can combine this with -infer-stale-profile for improved performance.
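
A release step in that conventional setup might look like the sketch below. The file names (service.cpp, service.orig, service.bolt, saved.afdo, saved.yaml) and the compiler flags are hypothetical, and it assumes the saved BOLT profile is in YAML format, since -infer-stale-profile was described above with a YAML profile. The function only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the two release commands, it does not run them.
release_commands() {
    afdo=$1          # AutoFDO profile saved from the top-of-trunk run
    bolt_profile=$2  # BOLT profile (YAML) saved from the same run

    # Build the release with the saved AutoFDO profile; --emit-relocs
    # keeps relocations in the binary for BOLT.
    printf '%s\n' "clang++ -O2 -fprofile-sample-use=$afdo -Wl,--emit-relocs -o service.orig service.cpp"

    # Apply the saved BOLT profile, letting stale matching absorb the
    # differences between top-of-trunk and the release.
    printf '%s\n' "llvm-bolt service.orig -o service.bolt -data=$bolt_profile -infer-stale-profile"
}

release_commands saved.afdo saved.yaml
```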
