Here’s a faithful XRay trace (in Chrome trace format) of
./bin/llvm-exegesis --mode=uops --benchmark-phase=assemble-measured-code --num-repetitions=100000 --opcode-name=VADDPSrr
exegesis-xray-ctf-trace.json.xz.txt (813.7 KB)
(rename it to exegesis-xray-ctf-trace.json.xz, un-xz it, and e.g. upload it to https://ui.perfetto.dev/)
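If you would rather script the rename and un-xz steps, here is a minimal Python sketch using only the standard library. The helper name unpack_trace is made up for illustration, and it assumes the attachment was saved under its posted name, ending in .json.xz.txt.

```python
import lzma
import shutil
from pathlib import Path


def unpack_trace(attachment: Path) -> Path:
    """Decompress '<name>.json.xz.txt' into '<name>.json' next to it.

    Hypothetical helper: assumes the file is an xz archive that merely
    carries an extra '.txt' suffix (added for the forum upload).
    """
    if attachment.suffixes[-2:] != [".xz", ".txt"]:
        raise ValueError(f"expected a name ending in .xz.txt, got {attachment.name}")
    # Drop '.txt', then '.xz', leaving the '.json' name.
    json_path = attachment.with_suffix("").with_suffix("")
    with lzma.open(attachment, "rb") as src, open(json_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return json_path
```

Calling unpack_trace(Path("exegesis-xray-ctf-trace.json.xz.txt")) should leave exegesis-xray-ctf-trace.json in the same directory, ready to be opened in the Perfetto UI.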
As we can see, we spend all the time in llvm::exegesis::assembleToStream() (1.2 s in the example), and there are 3 main places where it spends time:
- DuplicateSnippetRepetitor::repeat() (0.2 s)
- MachineVerifier (0.2 s)
- llvm::AsmPrinter::emitFunctionBody() (0.8 s)
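A breakdown like the one above can also be pulled out of the decompressed JSON without the Perfetto UI. The sketch below is a rough approximation, not a replacement for the viewer. It assumes the trace uses Chrome trace "complete" events (ph set to "X", with dur in microseconds) and sums inclusive time per function name, so nested or recursive calls are counted more than once. The function name total_seconds_by_name and the sample events are made up for illustration.

```python
import json
from collections import defaultdict


def total_seconds_by_name(trace):
    """Sum inclusive durations (in seconds) per event name.

    Assumption: events are Chrome trace 'complete' events ("ph": "X")
    whose "dur" field is in microseconds. Other event kinds are ignored.
    """
    # The format allows either a bare event list or {"traceEvents": [...]}.
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for ev in events:
        if ev.get("ph") == "X" and "dur" in ev:
            totals[ev.get("name", "?")] += ev["dur"] / 1e6
    return dict(totals)


# Synthetic sample, not taken from the real trace.
sample = json.loads("""
{"traceEvents": [
  {"name": "outer()", "ph": "X", "ts": 0,      "dur": 500000},
  {"name": "inner()", "ph": "X", "ts": 100000, "dur": 150000},
  {"name": "inner()", "ph": "X", "ts": 300000, "dur": 150000}
]}
""")

# Print the heaviest names first.
for name, secs in sorted(total_seconds_by_name(sample).items(),
                         key=lambda kv: kv[1], reverse=True):
    print(f"{secs:8.3f} s  {name}")
```

For the real file, replace the sample with json.load() on the decompressed trace. If the trace uses begin/end event pairs instead of complete events, this sketch will report nothing, and the pairs would have to be matched first.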