Can Propeller work with PGO using a single profile?

zcfh · April 9, 2024, 11:34am

I’m currently learning about PGO+BOLT iteration and would like a build pipeline along these lines: apart from the first build, only pgo.bolt.bin is ever deployed. Subsequent optimization rounds sample only pgo.bolt.bin and use that single perf data file for both the PGO and the BOLT step. From a previous question I know that BOLT has the infer-stale-profile option to meet this need.

While learning BOLT, I found out that Propeller also performs post-link optimization, but does so during compilation and linking. I wanted to try it out to see whether it would fit the pipeline I have in mind.
The problem I’m having is that when I build the autofdo repository with the latest Propeller and run propeller_optimize_clang.sh, I don’t get any performance improvement. I saw the same problem reported in the issue "Propeller slows down clang ~20% · Issue #181 · google/autofdo · GitHub", so I was wondering whether something is wrong with Propeller itself.

snehasish · April 9, 2024, 4:17pm

cc: @tmsriram @shenhanc78

tmsriram · April 9, 2024, 4:37pm

Thanks for pointing us to this problem. We will fix the script ASAP and provide clear instructions on optimizing clang with Propeller. I am also working on a ninja config to optimize clang with Propeller, which I will share soon as well. Thanks!

lifengxiang1025 · April 10, 2024, 3:26am

Propeller does not currently have the match-and-infer behavior that BOLT has, so I think Propeller can’t work well with PGO using a single profile (the reason is explained in [RFC] Add cfg drift detect in propeller).

tmsriram · April 10, 2024, 4:44am

That is correct and I misread the original question. Propeller does not robustly support corrections for source drift at the moment. When used with instrumented FDO, Propeller detects CFG changes when the instrumented profile is applied (hash mismatch) and disables propeller optimizations on those functions.
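
As a rough illustration of the hash-mismatch behavior described above, here is a toy Python model. It is not Propeller's implementation: the `FunctionProfile` type, the hash values, and the block IDs are all hypothetical, and the only idea it captures is "drop the layout for any function whose recorded CFG hash no longer matches the current build".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionProfile:
    name: str
    cfg_hash: int        # hypothetical hash recorded when the profile was collected
    block_order: tuple   # hypothetical basic block layout requested by the profile


def usable_layouts(profiles, current_hashes):
    """Keep layouts only for functions whose CFG hash still matches the current build."""
    kept, dropped = {}, []
    for p in profiles:
        if current_hashes.get(p.name) == p.cfg_hash:
            kept[p.name] = p.block_order
        else:
            # CFG changed (or function missing): skip layout optimization for it.
            dropped.append(p.name)
    return kept, dropped


profiles = [
    FunctionProfile("foo", 0x1111, (0, 2, 1)),
    FunctionProfile("bar", 0x2222, (0, 1)),
]
current_hashes = {"foo": 0x1111, "bar": 0x9999}  # bar's CFG differs in this build

kept, dropped = usable_layouts(profiles, current_hashes)
print(kept)     # layouts that would still be applied
print(dropped)  # functions left unoptimized
```

In this model a mismatch costs only the optimization on the affected function, which is the conservative outcome the post describes; it does not attempt BOLT-style matching or inference.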

zcfh · April 10, 2024, 9:54am

Thank you for your answers. In my case the problem should be instruction offsets that differ between the two PGO builds; the source code has not changed.
Intuitively, though, differing instruction offsets should also result in a mismatch. As far as I know, the basic block ordering is computed in the create_llvm_prof phase, so if the basic blocks generated by the second PGO build differ from those of the first, the profile would seem to be unusable. I originally thought that, because Propeller optimizes during the compilation and linking stages, it might be less sensitive to instruction offsets than BOLT and could avoid the problem of profiling one binary and optimizing another. I’ll try to understand its inner workings.
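
To make the concern about instruction offsets concrete, here is a small Python sketch under simplifying assumptions: samples are keyed by raw address, each basic block is identified only by its start address, and all addresses and block names are made up. It shows how the same samples land on different blocks when the binary they are applied to is laid out differently from the binary that was profiled.

```python
import bisect


def block_at(layout, addr):
    """layout is a sorted list of (start_addr, block_name); return the block containing addr.

    Simplification: the last block is treated as extending to infinity.
    """
    starts = [start for start, _ in layout]
    i = bisect.bisect_right(starts, addr) - 1
    return layout[i][1] if i >= 0 else None


def attribute(layout, samples):
    """Sum sample counts per basic block for the given layout."""
    counts = {}
    for addr, n in samples.items():
        block = block_at(layout, addr)
        counts[block] = counts.get(block, 0) + n
    return counts


# Hypothetical layouts of the same function in two builds from identical source.
profiled = [(0x00, "entry"), (0x10, "loop"), (0x30, "exit")]
rebuilt = [(0x00, "entry"), (0x18, "loop"), (0x28, "exit")]

samples = {0x14: 90, 0x34: 10}  # addresses observed while running the profiled build

print(attribute(profiled, samples))  # samples resolved against the right binary
print(attribute(rebuilt, samples))   # same samples resolved against a different binary
```

Against the profiled layout the 90 samples fall in `loop`; against the rebuilt layout they fall in `entry`, so the hot loop would look cold even though the source never changed.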

zcfh · April 11, 2024, 6:13am

What are the consequences of using a sampling profile? Since the current Propeller is not usable yet, I cannot verify this myself. Even when the code does not change, a different sample profile may make the second PGO build differ from the first. Will the old basic block ordering then be ignored?

zcfh · April 11, 2024, 9:45am

Is this possible in sampling mode?
On the first iteration, the process below works without problems:

build ori.bin
perf ori.bin -o ori.perf_data
use ori.perf_data to build pgo1.bin
perf pgo1.bin -o pgo1.perf_data
use pgo1.perf_data to build pgo_propeller1.bin

If I later expect to deploy only pgo_propeller1.bin, the sampled data will also come from it.

perf pgo_propeller1.bin -o pgo_propeller1.perf_data
use pgo_propeller1.perf_data to build pgo2.bin
use pgo_propeller1.perf_data to build pgo_propeller2.bin

Since ori.perf_data and pgo_propeller1.perf_data are different, might the basic blocks of pgo1.bin and pgo2.bin differ? For example, pgo2.bin might contain an extra hot block that cannot be found in the input cluster.txt; would it then be placed in the cold section?
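
The scenario in this question can be sketched with a toy Python model. It assumes, as the question does, that any block not named in the cluster listing ends up in the cold section; the list-of-lists cluster representation and the block IDs are hypothetical and are not the real cluster.txt format.

```python
def place_blocks(function_blocks, clusters):
    """Split a function's blocks into hot clusters and a cold remainder.

    clusters is a list of lists of block IDs taken from a (hypothetical) cluster
    listing. Blocks of the function that appear in no cluster are treated as cold.
    """
    present = set(function_blocks)
    # Ignore cluster entries that refer to blocks this build does not have.
    hot = [[b for b in cluster if b in present] for cluster in clusters]
    hot = [cluster for cluster in hot if cluster]
    listed = {b for cluster in hot for b in cluster}
    cold = [b for b in function_blocks if b not in listed]
    return hot, cold


clusters = [[0, 3, 1]]          # layout derived from profiling the pgo1-based binary
pgo1_blocks = [0, 1, 2, 3]      # blocks of the function in pgo1.bin
pgo2_blocks = [0, 1, 2, 3, 4]   # pgo2.bin has an extra block 4, assumed hot at run time

print(place_blocks(pgo1_blocks, clusters))
print(place_blocks(pgo2_blocks, clusters))
```

In this model block 4 lands in the cold list purely because the cluster listing predates it, regardless of how hot it is when pgo_propeller2.bin runs.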

zcfh · April 11, 2024, 11:18am

Thanks for the answer; the pointer to [RFC] Add cfg drift detect in propeller was very helpful to me. It seems that, without match and infer, Propeller can apply incorrect optimizations when the PGO profile changes.
