Instrument assembly code to make multiple code regions

Is it possible to instrument assembly code with LLVM-MCA-BEGIN and LLVM-MCA-END inside each function at the start and end to make each function a code region in llvm-mca?

Not automatically, but it’s a great idea! See: [llvm-mca] Instrument assembly code to make multiple code regions · Issue #59731 · llvm/llvm-project · GitHub

As a workaround, I’ve inserted the comments at the start and end of AsmPrinter::EmitFunctionBody(), which makes each function a separate code region. (This could be enhanced so that a command-line option controls it.)

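void AsmPrinter::EmitFunctionBody() {
  // Assumes MF is the MachineFunction currently being printed.
  const Function &F = MF->getFunction();
  // Open a region named after the function.
  // Note: text written via GetCommentOS() only shows up in verbose assembly output.
  OutStreamer->GetCommentOS()
      << "LLVM-MCA-BEGIN " << GlobalValue::dropLLVMManglingEscape(F.getName())
      << '\n';
  // …
  // Close the region once the body has been emitted.
  OutStreamer->GetCommentOS()
      << "LLVM-MCA-END " << GlobalValue::dropLLVMManglingEscape(F.getName())
      << '\n';
}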

In the bigger picture, I’m trying to get more accurate cycle counts for a function that contains calls. The current version of llvm-mca doesn’t model call instructions correctly and assumes a latency of 100 cycles for them. The idea is to get the cycle count for each function and add the counts up using a call graph, assuming of course that all the functions in the call graph are available to llvm-mca. Does this make sense, or am I attempting something that is not possible?
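To make the idea concrete, here is a minimal sketch of the summation step only. It assumes the per-function cycle counts have already been obtained somehow (for example, read off by hand from llvm-mca's per-region output) and that the call graph is acyclic. The names FunctionInfo, CallGraph and totalCycles are hypothetical and are not part of LLVM. Each call site is counted exactly once, so loops and branches around calls are ignored.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical per-function record: the cycle count reported for the
// function's own code region, plus one entry per call site in its body.
struct FunctionInfo {
  double ownCycles;
  std::vector<std::string> callees;
};

using CallGraph = std::map<std::string, FunctionInfo>;

// Returns ownCycles of `name` plus the totals of everything it calls.
// Assumes the graph has no recursion; a cycle would recurse forever.
// Callees that are missing from the graph contribute 0 cycles.
double totalCycles(const CallGraph &graph, const std::string &name,
                   std::map<std::string, double> &memo) {
  auto cached = memo.find(name);
  if (cached != memo.end())
    return cached->second;

  auto it = graph.find(name);
  if (it == graph.end())
    return 0.0;

  double total = it->second.ownCycles;
  for (const std::string &callee : it->second.callees)
    total += totalCycles(graph, callee, memo);

  memo[name] = total;
  return total;
}
```

The memo map just avoids recomputing a callee that is reached from several callers. Whether the per-function inputs are meaningful in the first place is a separate question from the bookkeeping shown here.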

Hi,

I left a comment on GitHub issue #59731.

Essentially, I don’t think that you can just mark functions that way.
MCA doesn’t understand control flow, and therefore there is no way to speculate on which branches would be taken or not. Instructions of a code sequence are not expected to modify control flow (except maybe for the terminator).

You should be able to safely mark individual basic blocks. However, keep in mind that for memory-intensive basic blocks the analysis would often be inaccurate. That is because scheduling models in LLVM often use optimistic latency values for memory load operations. Some of these limitations are also mentioned in the official docs. These are separate issues though.
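As a rough illustration of marking blocks rather than whole functions, the sketch below post-processes assembly text and opens a new region at every label. It is an approximation under stated assumptions: comments start with `#` (as in x86 assembly), every block of interest begins at a label, and fall-through blocks that have no label of their own are folded into the preceding region. The helper names parseLabel and markBlocks are hypothetical.

```cpp
#include <string>
#include <vector>

// If `line` is a label such as ".LBB0_1:" (optionally followed by a
// '#' comment), stores the label text without the colon in `name`.
static bool parseLabel(std::string line, std::string &name) {
  line = line.substr(0, line.find('#'));  // drop a trailing comment, if any
  auto first = line.find_first_not_of(" \t");
  if (first == std::string::npos)
    return false;  // blank or comment-only line
  auto last = line.find_last_not_of(" \t");
  if (line[last] != ':' || last == first)
    return false;  // not a label
  name = line.substr(first, last - first);
  return true;
}

// Returns a copy of `lines` with an LLVM-MCA-BEGIN comment after each
// label and a matching LLVM-MCA-END before the next label or at the end.
std::vector<std::string> markBlocks(const std::vector<std::string> &lines) {
  std::vector<std::string> out;
  std::string open, name;
  for (const std::string &line : lines) {
    if (parseLabel(line, name)) {
      if (!open.empty())
        out.push_back("# LLVM-MCA-END " + open);
      out.push_back(line);
      out.push_back("# LLVM-MCA-BEGIN " + name);
      open = name;
    } else {
      out.push_back(line);
    }
  }
  if (!open.empty())
    out.push_back("# LLVM-MCA-END " + open);
  return out;
}
```

Two labels in a row produce a region with no instructions in it, and assembler directives end up inside regions; a more careful tool would work from the compiler's own notion of basic blocks instead of from text.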

-Andrea

Thanks for the clarification. Do you know of any tools that would achieve my goal?