Is it possible to automatically insert LLVM-MCA-BEGIN and LLVM-MCA-END markers at the start and end of each function in the assembly, so that each function becomes its own code region in llvm-mca?
Not automatically, but it’s a great idea! See GitHub issue #59731 in llvm/llvm-project: “[llvm-mca] Instrument assembly code to make multiple code regions”.
As a workaround, I’ve inserted the marker comments at the start and end of AsmPrinter::EmitFunctionBody(), which makes each function a separate code region. (This could be extended so that a command-line option controls it.)
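void AsmPrinter::EmitFunctionBody() {
  // F is assumed to refer to the IR function being emitted.
  // Open an llvm-mca code region named after the function.
  OutStreamer->GetCommentOS()
      << "LLVM-MCA-BEGIN " << GlobalValue::dropLLVMManglingEscape(F.getName())
      << '\n';
  // …
  // Close the region once the function body has been emitted.
  OutStreamer->GetCommentOS()
      << "LLVM-MCA-END " << GlobalValue::dropLLVMManglingEscape(F.getName())
      << '\n';
}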
In the bigger picture, I’m trying to get more accurate cycle counts for a function that contains calls. The current version of llvm-mca doesn’t model call instructions correctly and assumes a latency of 100 cycles for each of them. The idea is to get the cycle count for each function separately and add the counts up using a call graph, assuming of course that all the functions in the call graph are available to llvm-mca. Does this make sense, or am I attempting something that is not possible?
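The bookkeeping part of this idea, summing per-function estimates over a call graph, could look like the sketch below. It is only an illustration under strong assumptions: every name in it (`FunctionInfo`, `CallGraph`, `totalCycles`, `estimateCycles`) is hypothetical, the per-function cycle counts are assumed to have been collected from llvm-mca reports by some separate step, each call site is assumed to execute exactly once, and recursion is rejected. It says nothing about whether the per-function numbers themselves are meaningful.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record for one function: a cycle estimate for its own body
// (callees excluded) and one callee name per call site.
struct FunctionInfo {
  std::uint64_t selfCycles;
  std::vector<std::string> callees;
};

using CallGraph = std::map<std::string, FunctionInfo>;

// Returns selfCycles of `name` plus the totals of everything it calls.
// Throws if a callee has no estimate or if the call graph is recursive.
std::uint64_t totalCycles(const CallGraph &graph, const std::string &name,
                          std::map<std::string, std::uint64_t> &memo,
                          std::set<std::string> &active) {
  std::map<std::string, std::uint64_t>::const_iterator cached = memo.find(name);
  if (cached != memo.end())
    return cached->second;

  CallGraph::const_iterator fn = graph.find(name);
  if (fn == graph.end())
    throw std::runtime_error("no cycle estimate for " + name);
  if (!active.insert(name).second)
    throw std::runtime_error("recursive call chain through " + name);

  std::uint64_t total = fn->second.selfCycles;
  for (const std::string &callee : fn->second.callees)
    total += totalCycles(graph, callee, memo, active);

  active.erase(name);
  memo[name] = total;
  return total;
}

// Convenience wrapper that hides the memo table and the recursion guard.
std::uint64_t estimateCycles(const CallGraph &graph, const std::string &root) {
  std::map<std::string, std::uint64_t> memo;
  std::set<std::string> active;
  return totalCycles(graph, root, memo, active);
}
```

For a graph in which `main` (10 cycles) calls `foo` (5 cycles) and `bar` (3 cycles), and `foo` also calls `bar`, `estimateCycles(graph, "main")` returns 10 + (5 + 3) + 3 = 21. Counting a callee once per call site is a deliberate simplification; loops and branches around a call would need execution counts that this sketch does not have.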
Hi,
I left a comment on GitHub issue #59731.
Essentially, I don’t think that you can just mark functions that way.
MCA doesn’t understand control flow, and therefore there is no way to speculate on which branches would be taken or not. Instructions of a code sequence are not expected to modify control flow (except maybe for the terminator).
You should be able to safely mark individual basic blocks. However, keep in mind that for memory-intensive basic blocks the analysis will often be inaccurate, because scheduling models in LLVM often use optimistic latency values for memory load operations. Some of these limitations are also mentioned in the official docs. These are separate issues though.
-Andrea
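For marking a single straight-line block from C or C++ source, the llvm-mca documentation describes inline assembly comments as region markers. A minimal sketch, assuming an x86 target (where `#` starts an assembly comment) and a GCC-compatible compiler such as clang; the function `foo` and the region name are only illustrative:

```cpp
// Compile this to assembly (for example with clang's -S option) and pass the
// result to llvm-mca; only the code between the markers is analyzed.
int foo(int a, int b) {
  // Opens a code region named "foo" in the emitted assembly.
  __asm volatile("# LLVM-MCA-BEGIN foo" ::: "memory");
  a += 42;
  // Closes the open region.
  __asm volatile("# LLVM-MCA-END" ::: "memory");
  a *= b;
  return a;
}
```

The same documentation warns that such markers can interfere with optimizations, because the compiler treats the `__asm` statements as real code with side effects, so the marked code may differ from what would otherwise be generated.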
Thanks for the clarification. Do you know of any tools that would achieve my goal?