+1 on what @WenleiHe said about benchmarking and baselines.
In particular, the profile-guided cmov conversion pass is disabled by default. With it turned on, @apostolakis noted a 1% improvement on clang bootstrap on top of instrumentation PGO+ThinLTO. Reduced but still measurable improvements were noted when using sample-based profiles. Does the addition of unpredictable branch data improve beyond this baseline?
Regarding the events used for sampling, wouldn’t br_inst_retired.conditional and br_misp_retired.conditional be more accurate? Also, LBR has metadata bits which include mispredict information; using them means we wouldn’t need an additional profile collection step. Any reason why this was not considered?
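To make the LBR suggestion concrete, here is a rough sketch of how per-branch mispredict ratios might be aggregated from sampled LBR records. It assumes the records have already been dumped as text, with each entry laid out as `from/to/M|P/...` (the layout `perf script -F brstack` prints, where `M` marks a mispredicted branch and `P` a predicted one). The function name `mispredict_ratios` and the sample addresses are hypothetical, and this is only an illustration, not how any existing profile generation tool does it.

```python
from collections import defaultdict


def mispredict_ratios(brstack_lines):
    """Aggregate per-branch mispredict ratios from brstack-style text.

    Each line is assumed to hold whitespace-separated entries of the form
    from/to/M|P/... where the third field is the LBR prediction flag.
    """
    # Maps branch source address -> [mispredicted count, total count].
    counts = defaultdict(lambda: [0, 0])
    for line in brstack_lines:
        for entry in line.split():
            fields = entry.split("/")
            # Skip malformed entries and entries without prediction info.
            if len(fields) < 3 or fields[2] not in ("M", "P"):
                continue
            src = int(fields[0], 16)
            counts[src][1] += 1
            if fields[2] == "M":
                counts[src][0] += 1
    return {addr: missed / total for addr, (missed, total) in counts.items()}


# Hypothetical sample: two LBR stacks with made-up addresses.
sample = [
    "0x401000/0x401020/M/-/-/3 0x401000/0x401020/P/-/-/1",
    "0x402000/0x402010/P/-/-/2 0x401000/0x401020/P/-/-/1",
]
ratios = mispredict_ratios(sample)
```

One caveat with this approach: LBR only records taken branches, so the ratio here is over sampled taken instances of each branch, which is not quite the same quantity as the retired-event ratio.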
Regarding usability and managing profiles, we should consider extending the sample profile extbinary format to hold additional profile data instead of new profile files.
Overall, I’m excited about the prospect of new profile types and eager to see how we can improve beyond the state of the art.