Can Propeller work with PGO when only one profile is used?

I’m currently learning about PGO+BOLT iteration and would love to have a build pipeline like this: apart from the very first build, only pgo.bolt.bin is ever deployed. Subsequent optimization rounds sample only pgo.bolt.bin and use that single perf data file for both the PGO and the BOLT step. I know from a previous question that BOLT has the infer-stale-profile option to support this.

While learning BOLT, I found that Propeller also performs post-link optimization, but applies it only during compilation and linking. I wanted to try it out to see whether it would work for the pipeline I have in mind.
The problem I’m currently having is that when I build the autofdo repository and run the latest propeller_optimize_clang.sh, I don’t see any performance improvement. The same problem is reported in the issue "Propeller slows down clang ~20% · Issue #181 · google/autofdo · GitHub", so I was wondering whether something is wrong with Propeller itself.

cc: @tmsriram @shenhanc78

Thanks for pointing us to this problem. We will fix the script ASAP and provide clear instructions on optimizing clang with Propeller. I am also working on a ninja config to optimize clang with Propeller, which I will share soon as well. Thanks!

Propeller currently doesn’t have the match-and-infer behavior that BOLT has, so I think Propeller can’t work well with PGO using a single profile (the reason is explained in "[RFC] Add cfg drift detect in propeller").

That is correct and I misread the original question. Propeller does not robustly support corrections for source drift at the moment. When used with instrumented FDO, Propeller detects CFG changes when the instrumented profile is applied (hash mismatch) and disables propeller optimizations on those functions.
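The hash-mismatch behavior described above can be pictured with a toy model. This is only a sketch: the function names and hash values are hypothetical, and the dictionaries stand in for whatever per-function CFG hashes the instrumented profile and the current build actually carry.

```python
def split_by_cfg_hash(profile_hashes, binary_hashes):
    """Partition profiled functions by whether their CFG hash still matches.

    Both arguments map a function name to a CFG hash (hypothetical values).
    Functions whose hash differs, or which are missing from the binary,
    are reported as disabled for layout optimization.
    """
    matched, disabled = [], []
    for name, profiled_hash in profile_hashes.items():
        if binary_hashes.get(name) == profiled_hash:
            matched.append(name)
        else:
            disabled.append(name)
    return matched, disabled


# Hypothetical data: "bar" changed shape between profiling and this build.
profile_hashes = {"foo": 0x1A2B, "bar": 0x3C4D, "baz": 0x5E6F}
binary_hashes = {"foo": 0x1A2B, "bar": 0x9999, "baz": 0x5E6F}

matched, disabled = split_by_cfg_hash(profile_hashes, binary_hashes)
print("optimize:", matched)
print("skip:", disabled)
```

In this model a mismatched function is left alone rather than corrected, which is the difference from a match-and-infer approach.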

Thank you for your answers. In my case the mismatch should come from instruction offsets that differ between the two PGO builds; the source code has not changed.
However, intuitively, shifted instruction offsets should also result in a mismatch. As far as I know, the basic block ordering is computed in the create_llvm_prof phase, so if the basic blocks generated by the second PGO build differ from those of the first, the profile seems unusable. I originally thought that, because Propeller optimizes during the compilation and linking stages, it might be less sensitive to instruction offsets than BOLT and could avoid the problem of two different binaries. I’ll try to understand its inner workings.

What are the consequences of using a sampling profile? Since Propeller is not working for me at the moment, I cannot verify this myself. Even when the code does not change, the binary built by the second PGO round may differ from the first because the sample profiles differ. Will the old basic block ordering then be ignored?

Is this possible in sampling mode?
On the first iteration, the process below works without problems.

  1. build ori.bin
  2. perf ori.bin -o ori.perf_data
  3. use ori.perf_data to build pgo1.bin
  4. perf pgo1.bin -o pgo1.perf_data
  5. use pgo1.perf_data to build pgo_propeller1.bin

If I later deploy only pgo_propeller1.bin, the sampled data will also come from it:

  1. perf pgo_propeller1.bin -o pgo_propeller1.perf_data
  2. use pgo_propeller1.perf_data to build pgo2.bin
  3. use pgo_propeller1.perf_data to build pgo_propeller2.bin
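To make the difference between the two iterations explicit, here is a small provenance model of the steps above. It is a sketch under a stated assumption: a Propeller layout step only lines up cleanly when its profile was sampled on the same binary whose blocks it reorders. The `Profile` and `Step` types are hypothetical helpers, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    sampled_from: str  # binary the perf samples were collected on


@dataclass(frozen=True)
class Step:
    output: str       # binary produced by this step
    kind: str         # "pgo" or "propeller"
    profile: Profile  # profile consumed by this step
    base: str         # build whose basic blocks this step starts from


def address_mismatches(steps):
    """Outputs of Propeller steps whose profile came from a different binary."""
    return [s.output for s in steps
            if s.kind == "propeller" and s.profile.sampled_from != s.base]


ori = Profile("ori.perf_data", sampled_from="ori.bin")
pgo1 = Profile("pgo1.perf_data", sampled_from="pgo1.bin")
pp1 = Profile("pgo_propeller1.perf_data", sampled_from="pgo_propeller1.bin")

steps = [
    # Iteration 1: the Propeller profile is sampled on the binary it reorders.
    Step("pgo1.bin", "pgo", ori, base="ori.bin"),
    Step("pgo_propeller1.bin", "propeller", pgo1, base="pgo1.bin"),
    # Iteration 2: one profile feeds both steps, so the Propeller step
    # reorders pgo2.bin using samples taken on pgo_propeller1.bin.
    Step("pgo2.bin", "pgo", pp1, base="pgo_propeller1.bin"),
    Step("pgo_propeller2.bin", "propeller", pp1, base="pgo2.bin"),
]

mismatched = address_mismatches(steps)
print(mismatched)
```

Only the second iteration's Propeller step is flagged, because no profile is ever sampled on pgo2.bin in that flow.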

Since ori.perf_data and pgo_propeller1.perf_data are different, might the basic blocks of pgo1.bin and pgo2.bin differ? For example, pgo2.bin might contain an extra hot block that, because it cannot be found in the input cluster.txt, ends up in the cold section?
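The scenario in this question can be written down as a toy model. It assumes, as the question does, that any block not listed in the cluster file is grouped into the cold part; the block IDs and the plain-list representation are hypothetical and do not reflect the real cluster.txt format.

```python
def split_blocks(blocks_in_binary, cluster_order):
    """Split a function's blocks into (hot, cold) given a stale cluster list.

    blocks_in_binary: block IDs present in the binary being laid out.
    cluster_order:    block IDs listed for this function, in layout order.
    Listed blocks that no longer exist are dropped; unlisted blocks go cold.
    """
    present = set(blocks_in_binary)
    listed = set(cluster_order)
    hot = [b for b in cluster_order if b in present]
    cold = [b for b in blocks_in_binary if b not in listed]
    return hot, cold


# Cluster list derived from pgo1.bin: blocks 0, 2, 1 are hot, in that order.
cluster_order = [0, 2, 1]

# pgo2.bin (hypothetical) gained block 4, which is hot at run time but
# unknown to the old cluster list.
pgo2_blocks = [0, 1, 2, 3, 4]

hot, cold = split_blocks(pgo2_blocks, cluster_order)
print("hot:", hot)
print("cold:", cold)
```

Under that assumption, block 4 lands next to the genuinely cold block 3 even though it is hot.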

Thanks for the answer; the pointer to "[RFC] Add cfg drift detect in propeller" was very helpful to me. It seems that, without match and infer, Propeller can apply wrong optimizations when the PGO profile changes.