Getting profile guided section layout to work with full LTO

I’m playing around with the profile guided section layout feature implemented in LLVM and LLD (in D48105 "[llvm][Instrumentation] Add Call Graph Profile pass", D44965 "[llvm][Instrumentation/MC] Add Call Graph Profile pass and object file emission", and D36351 "[lld][ELF] Add profile guided section layout"). I’m running my experiments on Clang trunk (currently 374f5f0df432a2ebeccffa1ec972920d195ddcbe).

I created a dummy file order.c for testing:

#include <stdlib.h>
__attribute__((noinline)) int f0() { return rand(); }
__attribute__((noinline)) int f1() { return f0(); }
__attribute__((noinline)) int f2() { return f1(); }
__attribute__((noinline)) int f3() { return f2(); }
__attribute__((noinline)) int f4() { return f3(); }
__attribute__((noinline)) int f5() { return f4(); }
__attribute__((noinline)) int f6() { return f5(); }
__attribute__((noinline)) int f7() { return f6(); }
__attribute__((noinline)) int main() {
  for (int i = 0; i < 1000000; ++i)
    f7();
  return 0;
}

I can successfully use the profile guided section layout without LTO:

$ clang -fuse-ld=lld -O2 -ffunction-sections -fdata-sections -fprofile-generate=. order.c -o order_instr
$ ./order_instr
$ llvm-profdata merge *.profraw -o prof.prof
$ clang -fuse-ld=lld -O2 -ffunction-sections -fdata-sections -fprofile-use=prof.prof order.c -o order

I know the ordering worked because objdump -d order shows the functions laid out as main, f7, f6, and so on. Also, if I generate an object file at the -fprofile-use step, it contains the .llvm.call-graph-profile section.
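For intuition about why the profile yields the order main, f7, f6, and so on: every edge in this call chain has roughly the same weight, so a layout that places each callee directly after its hottest caller reconstructs the chain. Below is a minimal C sketch of a simplified greedy chain-merging heuristic built on that idea. It is not LLD's code; as far as I know, LLD's implementation is based on the C3 (call-chain clustering) heuristic and also weighs section sizes and cluster density. The CgEdge type, the integer function ids, and layout_functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_FUNCS 64

/* Hypothetical weighted caller -> callee edge; functions are integer ids. */
typedef struct {
  int caller;
  int callee;
  uint64_t weight;
} CgEdge;

/* qsort comparator: heavier edges sort first. */
static int by_weight_desc(const void *a, const void *b) {
  uint64_t wa = ((const CgEdge *)a)->weight;
  uint64_t wb = ((const CgEdge *)b)->weight;
  return (wa < wb) - (wa > wb);
}

/* Simplified greedy chain merging (hypothetical, not LLD's algorithm):
   visit edges from heaviest to lightest and append the callee's chain after
   the caller when the caller is the last function of its chain and the
   callee is the first function of a different chain. Sorts `edges` in place,
   writes the layout into `order`, and returns the number of ids written. */
size_t layout_functions(int nfuncs, CgEdge *edges, size_t nedges, int *order) {
  int head[MAX_FUNCS], tail[MAX_FUNCS], next[MAX_FUNCS];
  if (nfuncs <= 0 || nfuncs > MAX_FUNCS)
    return 0;
  for (int i = 0; i < nfuncs; ++i) {
    head[i] = i;  /* every function starts as its own one-element chain */
    tail[i] = i;  /* tail[] is only meaningful for chain heads */
    next[i] = -1; /* -1 marks the end of a chain */
  }
  qsort(edges, nedges, sizeof *edges, by_weight_desc);
  for (size_t e = 0; e < nedges; ++e) {
    int u = edges[e].caller, v = edges[e].callee;
    if (u < 0 || v < 0 || u >= nfuncs || v >= nfuncs)
      continue; /* unknown endpoint (e.g. -1 standing in for null): skip */
    int hu = head[u];
    if (hu == head[v] || tail[hu] != u || head[v] != v)
      continue; /* v cannot be placed directly after u */
    next[u] = v;
    tail[hu] = tail[v];
    for (int w = v; w != -1; w = next[w])
      head[w] = hu; /* relabel the absorbed chain */
  }
  size_t n = 0;
  for (int i = 0; i < nfuncs; ++i)
    if (head[i] == i) /* emit each chain; chains are visited in id order */
      for (int w = i; w != -1; w = next[w])
        order[n++] = w;
  return n;
}
```

With hypothetical ids f0..f7 = 0..7 and main = 8, feeding the edges main -> f7, f7 -> f6, ..., f1 -> f0 produces the order 8, 7, 6, ..., 0, i.e. main, f7, f6, ..., f0. Emitting chains in id order is a simplification; a real layout would also order the resulting clusters by hotness.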

I can also successfully use the profile guided section layout with ThinLTO, by adding -flto=thin to both the -fprofile-generate and -fprofile-use steps.

On the other hand, if I use full LTO (by adding -flto to the -fprofile-generate and -fprofile-use steps), the ordering stops working. If I pass -Wl,--save-temps and examine the results, the CG profile in the IR (for all steps) looks like this:

!34 = !{i32 5, !"CG Profile", !35}
!35 = !{!36, !37, !38, !39, !40, !41, !42, !43, !44}
!36 = distinct !{null, null, i64 1000000}
!37 = distinct !{null, null, i64 1000000}
!38 = distinct !{null, null, i64 1000000}
!39 = distinct !{null, null, i64 1000000}
!40 = distinct !{null, null, i64 1000000}
!41 = distinct !{null, null, i64 1000000}
!42 = distinct !{null, null, i64 1000000}
!43 = distinct !{null, null, i64 1000000}
!44 = distinct !{null, null, i64 1000225}

The caller and callee operands of every edge in the metadata are null. I imagine this is why there’s no .llvm.call-graph-profile section in the object file generated by LTO, and consequently no profile guided section layout.
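The hypothesis in the previous paragraph can be modelled in a few lines of C. The sketch assumes, as the question suggests, that the emitter drops any CG Profile entry whose caller or callee is null, because there is no symbol to refer to. Under that assumption, every entry in the metadata shown above would be dropped and nothing would be left to emit. CgProfileEntry and emittable_edges are hypothetical names, not LLVM APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one "CG Profile" entry:
   !{caller, callee, i64 count}. A NULL name stands for a `null` operand. */
typedef struct {
  const char *caller;
  const char *callee;
  uint64_t count;
} CgProfileEntry;

/* Copy only the entries that have both endpoints into `out` and return how
   many were kept. Under the stated assumption, only these could end up in
   the .llvm.call-graph-profile section. */
size_t emittable_edges(const CgProfileEntry *in, size_t n, CgProfileEntry *out) {
  size_t kept = 0;
  for (size_t i = 0; i < n; ++i) {
    if (in[i].caller == NULL || in[i].callee == NULL)
      continue; /* no symbol to refer to, so the edge is dropped */
    out[kept++] = in[i];
  }
  return kept;
}
```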

Am I setting something up incorrectly or missing some flags?

I’m facing the same problem.

I’d looked into this briefly at the time, and as far as I could tell, the pass was never run during LTO; we were just inheriting the dummy metadata from the compilations. With ThinLTO, by contrast, the pass ran during ThinLTO and not during compilation. I assume the fix would be to make LTO behave like ThinLTO, but I didn’t have the time to look into it further.
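For context on what running the pass produces: as I understand it, the Call Graph Profile pass walks each function's call sites, weights each (caller, callee) pair by the profile count of the block containing the call, and records the sums as the "CG Profile" module flag. A minimal C sketch of just that aggregation step follows. The CallSite and CgEdge types and the aggregate_call_graph_profile function are hypothetical, skipping zero counts is a choice made for this sketch, and the real pass operates on LLVM IR.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical direct call from `caller` to `callee`, located in a basic
   block that executed `block_count` times according to the profile. */
typedef struct {
  const char *caller;
  const char *callee;
  uint64_t block_count;
} CallSite;

/* Hypothetical aggregated edge, analogous to one CG Profile entry. */
typedef struct {
  const char *caller;
  const char *callee;
  uint64_t weight;
} CgEdge;

/* Sum block counts per distinct (caller, callee) pair. Writes at most
   `max_edges` edges into `edges` and returns how many were written.
   Call sites with a zero count are ignored in this sketch; a linear search
   keeps it short. */
size_t aggregate_call_graph_profile(const CallSite *sites, size_t nsites,
                                    CgEdge *edges, size_t max_edges) {
  size_t nedges = 0;
  for (size_t i = 0; i < nsites; ++i) {
    const CallSite *s = &sites[i];
    if (s->block_count == 0)
      continue;
    size_t j = 0;
    while (j < nedges && (strcmp(edges[j].caller, s->caller) != 0 ||
                          strcmp(edges[j].callee, s->callee) != 0))
      ++j;
    if (j == nedges) {
      if (nedges == max_edges)
        continue; /* out of space: drop the new edge in this sketch */
      edges[nedges++] = (CgEdge){s->caller, s->callee, 0};
    }
    edges[j].weight += s->block_count; /* accumulate into the pair's edge */
  }
  return nedges;
}
```

In order.c, each function's single call sits in code that runs about 1000000 times, which is consistent with the weight of 1000000 on most entries in the metadata from the question.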

Yeah, I’ve tried to make it work by replicating what ThinLTO does. No success.

I wonder if this is a side effect of how LLVM treats call graphs in general:

This file provides interfaces used to build and manipulate a call graph, which is a very useful tool for interprocedural optimization.

Every function in a module is represented as a node in the call graph. The callgraph node keeps track of which functions are called by the function corresponding to the node.

A call graph may contain nodes where the function that they correspond to is null. These ‘external’ nodes are used to represent control flow that is not represented (or analyzable) in the module. In particular, this analysis builds one external node such that:

  1. All functions in the module without internal linkage will have edges from this external node, indicating that they could be called by functions outside of the module.
  2. All functions whose address is used for something more than a direct call, for example being stored into a memory location will also have an edge from this external node. Since they may be called by an unknown caller later, they must be tracked as such.

There is a second external node added for calls that leave this module. Functions have a call edge to the external node iff:

  1. The function is external, reflecting the fact that they could call anything without internal linkage or that has its address taken.
  2. The function contains an indirect function call.

As an extension in the future, there may be multiple nodes with a null function. These will be used when we can prove (through pointer analysis) that an indirect call site can call only a specific set of functions.

Because of these properties, the CallGraph captures a conservative superset of all of the caller-callee relationships, which is useful for transformations.

https://llvm.org/doxygen/CallGraph_8h.html
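The rules for the two external nodes in the quoted documentation can be illustrated with a small adjacency matrix. This is a minimal C sketch that assumes hypothetical per-function flags (FuncInfo) in place of the information LLVM reads from the IR, and invented node ids CALLS_IN and CALLS_OUT for the two null-function nodes. LLVM's actual CallGraph is a C++ class and is not structured like this.

```c
#include <stdbool.h>

#define MAX_FUNCS 16

/* Hypothetical per-function facts that a real analysis would read from IR. */
typedef struct {
  bool internal_linkage;  /* e.g. a `static` function in C */
  bool address_taken;     /* used for more than a direct call */
  bool is_declaration;    /* body not in this module ("external") */
  bool has_indirect_call; /* contains a call through a pointer */
} FuncInfo;

/* Node ids: functions are 0..MAX_FUNCS-1, plus two null-function nodes. */
enum { CALLS_IN = MAX_FUNCS, CALLS_OUT = MAX_FUNCS + 1, NUM_NODES = MAX_FUNCS + 2 };

typedef struct {
  bool edge[NUM_NODES][NUM_NODES]; /* edge[a][b]: a may call b */
} CallGraph;

/* Add the conservative edges to and from the two external nodes, following
   the CallGraph documentation; direct-call edges between functions would be
   added separately. */
void add_external_edges(CallGraph *cg, const FuncInfo *f, int n) {
  for (int i = 0; i < n && i < MAX_FUNCS; ++i) {
    if (!f[i].internal_linkage || f[i].address_taken)
      cg->edge[CALLS_IN][i] = true;  /* could be called from outside the module */
    if (f[i].is_declaration || f[i].has_indirect_call)
      cg->edge[i][CALLS_OUT] = true; /* could call anything */
  }
}
```

In this model a plain non-static function such as main gets an edge from CALLS_IN, a static helper that is only called directly gets none, and a declaration such as rand gets edges both from CALLS_IN and to CALLS_OUT.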