per-operand scheduling model help

I’m trying to use the per-operand scheduling model (i.e. setting SchedRW = […] in my instruction definitions) to model an in-order CPU, and I’m finding it a bit awkward. Is there anything I’m missing, or a better way to do this?

This is for various subtargets of the AMDGPU backend. They are in-order and issue one instruction per cycle, but instructions can execute on different pipelines and may take multiple cycles to execute. For example ADD might execute in 1 cycle on the normal ALU pipe, FMA might take 2 cycles on a dedicated FMA pipe, and SQRT might take 4 cycles and use both pipes. Typically all instructions have an additional few cycles of latency before the result can be used by another instruction.

For instructions with exactly one def operand it’s fairly easy. I can use a single WriteRes that defines ProcResources and ResourceCycles and Latency.
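
To make the one-def case concrete, here is a minimal sketch of the kind of definition I mean. All names (MyModel, MyFMAPipe, WriteFMA) are hypothetical and not taken from the AMDGPU backend, and the latency of 5 is only an illustrative assumption (2 cycles on the pipe plus a few cycles before the result is usable).

```tablegen
// Sketch only: hypothetical names, illustrative numbers.
include "llvm/Target/Target.td"

def MyModel : SchedMachineModel {
  let IssueWidth = 1;         // issue one instruction per cycle
  let MicroOpBufferSize = 0;  // in-order
  let CompleteModel = 0;      // sketch does not cover every opcode
}

// One SchedWrite type for the single def operand.
def WriteFMA : SchedWrite;

let SchedModel = MyModel in {
  // A dedicated FMA pipe with one unit.
  def MyFMAPipe : ProcResource<1>;

  // Resources, occupancy and latency all hang off the one WriteRes.
  def : WriteRes<WriteFMA, [MyFMAPipe]> {
    let ResourceCycles = [2];  // occupies the FMA pipe for 2 cycles
    let Latency = 5;           // assumed: result usable after 5 cycles
  }
}
```

An instruction definition would then refer to it with something like `let SchedRW = [WriteFMA];`.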

For instructions with two defs, like an ADD that also produces a carry-out in a condition register, it gets awkward. I can use two WriteRes’s: the first defines the resources needed to execute the instruction, and the second defines only the latency of the second result. But by default every WriteRes has NumMicroOps = 1, so now the scheduler thinks my instruction has two micro-ops which will take two cycles to issue, which is not what I wanted at all. I can set NumMicroOps = 0 in the second WriteRes, but it feels like I’m abusing the per-operand model to define properties (like ProcResources and ResourceCycles) that are really per-opcode, not per-operand. Doing it per-operand also means that I end up defining two orthogonal sets of SchedWrite classes, one to use for the first def operand and one to use for the second and subsequent def operands.
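
As a sketch of the two-def workaround described above: the names (MyModel, MyALUPipe, WriteALU, WriteCarryLat) are hypothetical, the latency of 4 is an illustrative assumption, and MyModel is assumed to be a SchedMachineModel defined elsewhere.

```tablegen
// Sketch only: hypothetical names, illustrative numbers.
// Assumes "def MyModel : SchedMachineModel { ... }" exists elsewhere.

def WriteALU      : SchedWrite;  // first def: carries the resources
def WriteCarryLat : SchedWrite;  // second def: latency only

let SchedModel = MyModel in {
  def MyALUPipe : ProcResource<1>;

  // First def operand: describes how the whole instruction executes.
  def : WriteRes<WriteALU, [MyALUPipe]> {
    let ResourceCycles = [1];  // 1 cycle on the ALU pipe
    let Latency = 4;           // assumed latency of the sum
  }

  // Second def operand (carry-out): no resources, just a latency.
  def : WriteRes<WriteCarryLat, []> {
    let Latency = 4;           // assumed latency of the carry-out
    let NumMicroOps = 0;       // default is 1, which would add a micro-op
  }
}
```

The instruction would then use `let SchedRW = [WriteALU, WriteCarryLat];`, which is where the split into "resource-carrying" and "latency-only" write classes comes from.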

Even worse, how do I model the resources needed to execute instructions like stores that have no defs?

Thanks for any help,