Dynamic Spill and Instruction Counts via LLVM


How would people recommend using LLVM (if at all) to get the dynamic spill/reload count and total number of instructions for a program? Speed and efficiency are unimportant. I just want it to be correct.

I’m using a debug build of LLVM 12 (“clang version 12.0.0”) targeting x86_64-unknown-linux-gnu.

What I’ve tried:

I thought I’d use llvm’s pgo to get asm BB counts and then multiply by the number of static spills and reloads in each asm BB.

llc annotates the lines in the assembly file that result in spills and reloads, e.g.:
movq %rax, -24(%rsp) # 8-byte Spill
movq -16(%rsp), %rax # 8-byte Reload
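
A minimal sketch of tallying these annotations per asm BB is below. It assumes the usual x86 llc output shape: a function starts at a `name:   # @name` line, a block starts at either a `.LBB<fn>_<n>:` label or a `# %bb.<n>:` comment, and spill/reload lines carry an `N-byte Spill`, `N-byte Reload`, `N-byte Folded Spill`, or `N-byte Folded Reload` comment. The function name `count_spills` and the regexes are my own, not anything LLVM provides, so check them against your actual .s files.

```python
import re
from collections import defaultdict

# Assumed line shapes in llc's x86 assembly output (verify on your own .s file).
FUNC_RE = re.compile(r"^([A-Za-z_][\w.$]*):\s*#\s*@")
BLOCK_RE = re.compile(r"^\s*(?:(\.LBB\d+_\d+):|#\s*(%bb\.\d+):)")
SPILL_RE = re.compile(r"#.*\b\d+-byte (?:Folded )?(Spill|Reload)\b")


def count_spills(asm_text):
    """Return {(function, asm_block): {"Spill": n, "Reload": m}}."""
    counts = defaultdict(lambda: {"Spill": 0, "Reload": 0})
    func, block = None, None
    for line in asm_text.splitlines():
        m = FUNC_RE.match(line)
        if m:
            # New function: reset the current block.
            func, block = m.group(1), None
            continue
        m = BLOCK_RE.match(line)
        if m:
            # Either a ".LBBx_y" label or a "%bb.n" comment names the block.
            block = m.group(1) or m.group(2)
            continue
        m = SPILL_RE.search(line)
        if m and func is not None:
            counts[(func, block)][m.group(1)] += 1
    return dict(counts)
```

Folded spills and reloads are counted together with the plain ones here; split them out if you need to tell them apart.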

I couldn’t find a way to instrument asm BBs w/ pgo using LLC, so I’m doing it at the IR level like this:
clang -c -emit-llvm -gline-tables-only -O3 file1.c -o file1.bc
opt -pgo-instr-gen -instrprof file1.bc -o file1.prof.bc
clang -fprofile-instr-generate file1.prof.bc file2.prof.bc etc.
-run it-
llvm-profdata merge -output=pgo.profdata default.profraw
opt -pgo-instr-use -pgo-test-profile-file=pgo.profdata -load mypass file.bc

Where “mypass” iterates through all the IR BBs and prints out the label and block frequency info.

I’ve quickly realized that asm BBs don’t directly correspond to IR BBs b/c the CFG changes during code generation. Although I’ve managed to make a mostly correct mapping between IR BBs and asm BBs using their debug labels, it’s been kind of messy. For example:
There is the odd situation where I’ll find 2 asm BBs with the same debug label but with pgo counters at different memory addresses.

There are also other complications where the pgo CFG is different from the original CFG, so I’m not convinced this is the correct way to go about it.

What I thought I’d try next
I wanted to write a MachineFunctionPass that gets a BB’s static spill/reload count and then inserts a function call to keep track of its execution frequency. The pass would be run after all CFG changes have been made. There’s a comment in X86TargetMachine.cpp that says:
// The X86 Speculative Execution Pass must run after all control
// flow graph modifying passes.
This is in void X86PassConfig::addPreEmitPass2(), and I was going to add my pass here. Since the function call would clobber registers, I was going to either re-run register allocation or callee save all the registers the function would use.

Any help/feedback would be appreciated.

Thank you.


Armand Behroozi

Hi Armand,

You can print the frequency of the blocks at the machine level.
If that frequency comes from pgo, these numbers should be accurate.

Take a look at -view-machine-block-freq-propagation-dags=fraction for instance.

Then, you should be able to just use the assembly file annotation like you were doing.
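
As a rough sketch of that last step: once you have a per-block execution count and a per-block static spill/reload count, the dynamic total is just the sum of count times static occurrences. The dictionary shapes and the name `dynamic_totals` below are assumptions for illustration, and how you extract the execution counts from the machine-level frequency output is left to you.

```python
def dynamic_totals(static_counts, exec_counts):
    """Combine static per-block counts with per-block execution counts.

    static_counts: {block_key: {"Spill": n, "Reload": m}}  (hypothetical shape)
    exec_counts:   {block_key: times_executed}             (hypothetical shape)
    Returns (totals, missing), where missing lists blocks with no count.
    """
    totals = {"Spill": 0, "Reload": 0}
    missing = []
    for key, per_block in static_counts.items():
        if key not in exec_counts:
            # Do not silently treat an unknown block as executed zero times.
            missing.append(key)
            continue
        for kind in totals:
            totals[kind] += per_block[kind] * exec_counts[key]
    return totals, missing
```

Blocks without a count are reported rather than dropped, since a silent zero would make the total look correct when it is not.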


Dear Quentin,

Thank you for taking the time to respond. This was really helpful.

One final question: do you have any suggestions for how to map the counts viewed on the CFG after llc is run w/ pgo data back to the original CFG? Would using the BB debug labels (e.g. %while.body.preheader201) be good enough here?

When running LLC w/ pgo data, a function of interest has its BB count go to 149 as opposed to 128 when no pgo data is used. When looking at the CFG manually, labelled basic blocks seem to be in similar locations but I’m unsure if I can rely on this.

Thank you.




Using the labels will be a good proxy; relative location within the file, not so much. You could also use the debug line info, but that’s probably overkill.
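
If it helps, here is a small sketch of grouping asm BBs by the IR label that llc prints in the trailing comment of each block header (e.g. `.LBB0_3:   # %while.body`). It assumes that header shape, and `blocks_by_ir_label` is just an illustrative name. Any label that maps to more than one asm BB is a block that was duplicated or split during codegen, so its counts need to be summed or handled by hand.

```python
import re
from collections import defaultdict

# Assumed shapes of function and block header lines in llc's x86 output.
FUNC_RE = re.compile(r"^([A-Za-z_][\w.$]*):\s*#\s*@")
HEADER_RE = re.compile(
    r"^\s*(?:(\.LBB\d+_\d+):|#\s*(%bb\.\d+):)\s*(?:#\s*%(\S+))?"
)


def blocks_by_ir_label(asm_text):
    """Return {(function, ir_label): [asm_block, ...]}.

    ir_label is None for blocks whose header carries no IR name.
    """
    groups = defaultdict(list)
    func = None
    for line in asm_text.splitlines():
        m = FUNC_RE.match(line)
        if m:
            func = m.group(1)
            continue
        m = HEADER_RE.match(line)
        if m:
            asm_block = m.group(1) or m.group(2)
            groups[(func, m.group(3))].append(asm_block)
    return dict(groups)


def ambiguous_labels(groups):
    """IR labels (ignoring unnamed blocks) that map to several asm BBs."""
    return {k: v for k, v in groups.items() if k[1] is not None and len(v) > 1}
```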