How would people recommend using LLVM (if at all) to get the dynamic spill/reload count and total number of instructions for a program? Speed and efficiency are unimportant. I just want it to be correct.
I’m using a debug build of LLVM 12 (“clang version 12.0.0”) targeting x86_64-unknown-linux-gnu.
What I’ve tried:
I thought I’d use LLVM’s PGO instrumentation to get an execution count for each asm BB and then multiply that count by the number of static spills and reloads in the same asm BB.
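To make the weighting concrete, here is a minimal sketch of that multiplication. The function name and the two dictionary layouts are made up for illustration; they are not produced by any LLVM tool. It also assumes that a block with no profile entry was never executed.

```python
def dynamic_spill_reload_counts(exec_counts, static_counts):
    """Weight static per-block spill/reload counts by block execution counts.

    exec_counts:   {block_name: number of times the block executed}
    static_counts: {block_name: (static_spills, static_reloads)}
    Both layouts are hypothetical, chosen only for this sketch.
    """
    spills = reloads = 0
    for block, (static_spills, static_reloads) in static_counts.items():
        # Assumption: a block missing from the profile never ran.
        n = exec_counts.get(block, 0)
        spills += n * static_spills
        reloads += n * static_reloads
    return spills, reloads


if __name__ == "__main__":
    exec_counts = {"entry": 1, "loop": 100}
    static_counts = {"entry": (2, 0), "loop": (1, 3), "cold": (4, 4)}
    print(dynamic_spill_reload_counts(exec_counts, static_counts))
```

The same weighting gives a dynamic instruction count if the per-block value is the number of instructions in the block instead of the number of spills.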
llc annotates the lines in the assembly file that result in spills and reloads, e.g.:
movq %rax, -24(%rsp) # 8-byte Spill
movq -16(%rsp), %rax # 8-byte Reload
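A rough sketch of how these annotations could be tallied per asm BB with a small script. It assumes verbose x86 assembly output in which blocks are introduced either by a label such as `.LBB0_1:` or by a comment such as `# %bb.0:`, and it also counts the "Folded Spill" / "Folded Reload" variants of the comment. The function name and the returned layout are made up for illustration.

```python
import re
from collections import defaultdict

# Comments such as "# 8-byte Spill" or "# 16-byte Folded Reload".
SPILL_RELOAD = re.compile(r"#.*?\b\d+-byte (?:Folded )?(Spill|Reload)\b")
# Block markers: a real label (".LBB0_1:") or a comment-only marker ("# %bb.0:").
BLOCK_LABEL = re.compile(r"^\s*(?:(\.LBB\d+_\d+):|#\s*(%bb\.\d+):)")
# Function entry label at column 0, e.g. "foo:   # @foo".
FUNC_LABEL = re.compile(r"^([A-Za-z_][\w.$]*):")


def count_static_spills(asm_text):
    """Return {(function, block): {"Spill": n, "Reload": m}} for annotated lines."""
    counts = defaultdict(lambda: {"Spill": 0, "Reload": 0})
    func, block = None, None
    for line in asm_text.splitlines():
        m = FUNC_LABEL.match(line)
        if m:
            func, block = m.group(1), None
            continue
        m = BLOCK_LABEL.match(line)
        if m:
            block = m.group(1) or m.group(2)
            continue
        m = SPILL_RELOAD.search(line)
        if m and func is not None:
            counts[(func, block)][m.group(1)] += 1
    return dict(counts)


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python count_spills.py file1.s
    with open(sys.argv[1]) as f:
        for key, value in count_static_spills(f.read()).items():
            print(key, value)
```

Only blocks that contain at least one annotated instruction appear in the result, and the script relies entirely on the comment text, so it will miss anything the assembly printer does not annotate.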
I couldn’t find a way to instrument asm BBs w/ pgo using LLC, so I’m doing it at the IR level like this:
clang -emit-llvm -c -gline-tables-only -O3 file1.c -o file1.bc
opt -pgo-instr-gen -instrprof file1.bc -o file1.prof.bc
clang -fprofile-instr-generate file1.prof.bc file2.prof.bc etc.
./a.out    # run the instrumented binary, which writes default.profraw
llvm-profdata merge -output=pgo.profdata default.profraw
opt -pgo-instr-use -pgo-test-profile-file=pgo.profdata -load mypass file.bc
Where “mypass” iterates through all the IR BBs and prints out the label and block frequency info.
I’ve quickly realized that asm BBs don’t directly correspond to IR BBs b/c the CFG changes during code generation. Although I’ve managed to make a mostly correct mapping between IR BBs and asm BBs using their debug labels, it’s been kind of messy. For example:
There is the odd situation where I’ll find 2 asm BBs with the same debug label whose PGO counters update different memory addresses.
There are also other complications where the CFG of the PGO-instrumented build is different from the original CFG, so I’m not convinced this is the correct way to go about it.
What I thought I’d try next:
I wanted to write a MachineFunctionPass that gets a BB’s static spill/reload count and then inserts a function call to keep track of its execution frequency. The pass would be run after all CFG changes have been made. There’s a comment in X86TargetMachine.cpp that says:
// The X86 Speculative Execution Pass must run after all control
// flow graph modifying passes.
This is in void X86PassConfig::addPreEmitPass2(), and I was going to add my pass here. Since the inserted function call would clobber registers, I was going to either re-run register allocation or save and restore all the registers the called function would use.
Any help/feedback would be appreciated.