Thanks for doing this! I am very interested in using it for the AMDGPU
target. Have you given any thought to targets with
MicroOpBufferSize=1? I understand that these are also "in order". I
found that I could get some tests running with these changes: https://reviews.llvm.org/differential/diff/329308/
But I am really shooting in the dark here. I don't have a good
understanding of the difference between MicroOpBufferSize=0 and 1, and
I am not even sure which setting is really best for AMDGPU.
* The llvm-mca static performance analysis tool now supports in-order CPUs such
as the Arm Cortex-A55. [d791695](rGd791695cb517).
Thanks for doing this! I am very interested in using it for the AMDGPU
target.
So far the feature has only been tested on Arm in-order CPUs, so it would be
great if you could try it on the AMDGPU target!
Have you given any thought to targets with MicroOpBufferSize=1?
I understand that these are also "in order". I found that I could get
some tests running with these changes: https://reviews.llvm.org/differential/diff/329308/
But I am really shooting in the dark here. I don't have a good
understanding of the difference between MicroOpBufferSize=0 and 1, and
I am not even sure which setting is really best for AMDGPU.
Frankly, I don't know what the difference between MicroOpBufferSize=0
and 1 is. We should probably treat them the same for MCA, so your changes
look good.
It only affects what instructions the scheduler puts in the ready queue. In VLIW-mode, the scheduler only considers instructions that can be scheduled in the current group. In InOrder mode, the scheduler can weigh the potential latency stall against other heuristics. I don’t think it’s relevant for MCA.
-Andy
// "0" means operations that are not ready in this cycle are not considered
// for scheduling (they go in the pending queue). Latency is paramount. This
// may be more efficient if many instructions are pending in a schedule.
//
// "1" means all instructions are considered for scheduling regardless of
// whether they are ready in this cycle. Latency still causes issue stalls,
// but we balance those stalls against other heuristics.
//
// "> 1" means the processor is out-of-order. This is a machine independent
// estimate of highly machine specific characteristics such as the register
// renaming pool and reorder buffer.
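
To make the three cases in the comment above concrete, here is a minimal standalone sketch of how the value could select which instructions the scheduler weighs in a given cycle. It is not LLVM code: `IssueModel`, `Inst`, `classify`, `candidates` and `selfCheck` are hypothetical names made up for illustration, and treating every nonzero value the same way for candidate selection is an assumption based only on the comment text.

```cpp
#include <cassert>
#include <vector>

// Hypothetical names throughout; none of this is LLVM's scheduler API.
enum class IssueModel { VLIWLike, InOrder, OutOfOrder };

// Map a MicroOpBufferSize value to the three cases named in the comment.
constexpr IssueModel classify(unsigned MicroOpBufferSize) {
  return MicroOpBufferSize == 0   ? IssueModel::VLIWLike
         : MicroOpBufferSize == 1 ? IssueModel::InOrder
                                  : IssueModel::OutOfOrder;
}

// A toy instruction: an id and the first cycle in which its operands are ready.
struct Inst {
  int Id;
  unsigned ReadyCycle;
};

// Return the ids the scheduler would consider at CurCycle.
// With size 0, instructions that are not ready are left out (they would sit
// in the pending queue). With any nonzero size (an assumption for > 1), all
// instructions are candidates and latency is weighed by other heuristics.
std::vector<int> candidates(const std::vector<Inst> &Insts, unsigned CurCycle,
                            unsigned MicroOpBufferSize) {
  std::vector<int> Out;
  for (const Inst &I : Insts) {
    const bool Ready = I.ReadyCycle <= CurCycle;
    if (Ready || MicroOpBufferSize != 0)
      Out.push_back(I.Id);
  }
  return Out;
}

// Small usage example: instruction 1 is not ready until cycle 3.
inline void selfCheck() {
  const std::vector<Inst> Insts = {{0, 0u}, {1, 3u}};
  assert(candidates(Insts, 0, 0).size() == 1); // only instruction 0
  assert(candidates(Insts, 0, 1).size() == 2); // both are considered
}
```

In this toy model the only difference between 0 and 1 is the candidate set at each cycle, which matches the point that the distinction matters to the scheduler's ready queue rather than to MCA.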
Thanks. I found there is already an MCSchedModel::isOutOfOrder which
makes it slightly less cryptic. I've put a patch up at https://reviews.llvm.org/D98356 to try to support MicroOpBufferSize=1
in llvm-mca as simply as possible.