llvm-mca for in-order CPUs (was Re: LLVM Weekly - #375, March 8th 2021)

Thanks for doing this! I am very interested in using it for the AMDGPU
target. Have you given any thought to targets with
MicroOpBufferSize=1? I understand that these are also "in order". I
found that I could get some tests running with these changes:
https://reviews.llvm.org/differential/diff/329308/

But I am really shooting in the dark here. I don't have a good
understanding of the difference between MicroOpBufferSize=0 and 1, and
I am not even sure which setting is really best for AMDGPU.

Thanks,
Jay.

Hi Jay,

Jay Foad writes:

* The llvm-mca static performance analysis tool now supports in-order CPUs such
  as the Arm Cortex-A55. [d791695](rGd791695cb517).

Thanks for doing this! I am very interested in using it for the AMDGPU
target.

So far the feature has only been tested on ARM in-order CPUs, so it would be
great if you could try it on the AMDGPU target!

Have you given any thought to targets with MicroOpBufferSize=1?
I understand that these are also "in order". I found that I could get
some tests running with these changes:
https://reviews.llvm.org/differential/diff/329308/

But I am really shooting in the dark here. I don't have a good
understanding of the difference between MicroOpBufferSize=0 and 1, and
I am not even sure which setting is really best for AMDGPU.

Frankly, I don't know what the difference is between MicroOpBufferSize=0
and 1. We should probably treat them the same for MCA, so your changes
look good.

We should really have some alias for MicroOpBufferSize=0/1. It’s too cryptic.

InOrder => MicroOpBufferSize=1
VLIW => MicroOpBufferSize=0

It only affects what instructions the scheduler puts in the ready queue. In VLIW-mode, the scheduler only considers instructions that can be scheduled in the current group. In InOrder mode, the scheduler can weigh the potential latency stall against other heuristics. I don’t think it’s relevant for MCA.

-Andy
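
To make the distinction above concrete, here is a minimal standalone sketch. It is not LLVM code: the names VLIWBufferSize, InOrderBufferSize, Instr and candidates are hypothetical, and readiness is reduced to a single "ready cycle" per instruction, which is an assumption made only for illustration.

```cpp
#include <vector>

// Hypothetical aliases for the two in-order settings; these are not LLVM names.
constexpr unsigned VLIWBufferSize = 0;
constexpr unsigned InOrderBufferSize = 1;

// Simplified instruction: only the cycle in which its operands become ready.
struct Instr {
  const char *Name;
  unsigned ReadyCycle;
};

// Returns the instructions a scheduler would consider in CurrCycle.
// With buffer size 0, instructions that are not ready yet are left out
// (they stay pending). With any other size, every instruction is a
// candidate, and the cost of a stall is left to other heuristics.
std::vector<Instr> candidates(const std::vector<Instr> &Pending,
                              unsigned MicroOpBufferSize, unsigned CurrCycle) {
  std::vector<Instr> Out;
  for (const Instr &I : Pending) {
    if (MicroOpBufferSize != VLIWBufferSize || I.ReadyCycle <= CurrCycle)
      Out.push_back(I);
  }
  return Out;
}
```

With three pending instructions that become ready in cycles 0, 2 and 1, calling candidates at cycle 1 keeps two of them under VLIWBufferSize and all three under InOrderBufferSize. The difference is only in which instructions are offered to the scheduler, which fits the point that it may not matter for MCA.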

// "0" means operations that are not ready in this cycle are not considered
// for scheduling (they go in the pending queue). Latency is paramount. This
// may be more efficient if many instructions are pending in a schedule.
//
// "1" means all instructions are considered for scheduling regardless of
// whether they are ready in this cycle. Latency still causes issue stalls,
// but we balance those stalls against other heuristics.
//
// "> 1" means the processor is out-of-order. This is a machine independent
// estimate of highly machine specific characteristics such as the register
// renaming pool and reorder buffer.
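
The three cases in the comment can be written down as a small classification. This is a sketch under assumptions, not part of LLVM: IssueModel, classify and treatedAsInOrderByMCA are hypothetical names, and the last helper only encodes the suggestion made earlier in this thread that MCA could treat 0 and 1 the same.

```cpp
// Hypothetical names for the three cases described in the comment.
enum class IssueModel { VLIW, InOrder, OutOfOrder };

// Maps a MicroOpBufferSize value to one of the three cases.
IssueModel classify(unsigned MicroOpBufferSize) {
  if (MicroOpBufferSize == 0)
    return IssueModel::VLIW;       // not-ready operations are not considered
  if (MicroOpBufferSize == 1)
    return IssueModel::InOrder;    // all considered, stalls weighed by heuristics
  return IssueModel::OutOfOrder;   // "> 1": out-of-order processor
}

// Assumption from this thread: for MCA purposes, 0 and 1 are handled alike.
bool treatedAsInOrderByMCA(unsigned MicroOpBufferSize) {
  return classify(MicroOpBufferSize) != IssueModel::OutOfOrder;
}
```

Under this reading, a model with MicroOpBufferSize=0 and one with MicroOpBufferSize=1 would both take the in-order path, and any larger value would take the out-of-order path.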

Thanks. I found there is already an MCSchedModel::isOutOfOrder which
makes it slightly less cryptic. I've put a patch up at
https://reviews.llvm.org/D98356 to try to support MicroOpBufferSize=1
in llvm-mca as simply as possible.

Jay.