I wonder why there is an extra cycle in getOperandLatency; it doesn't seem intuitive:
UseCycle = DefCycle - UseCycle + 1;
When I read the comments in TargetItinerary.td, they say:
OperandCycles are optional "cycle counts". They specify the cycle after
instruction issue the values which correspond to specific operand indices
are defined or read.
I thought that if an instruction reads its operands in the first cycle
and produces its result in the second cycle, the InstrItinData should be
written something like this:
InstrItinData<IIC_iALUr ,[InstrStage<1, [FU_x]>], [2, 1, 1]>
Therefore the operand latency from an iALUr output to an iALUr input
should be 1. However, with the implementation of getOperandLatency, this
definition yields a latency of 2. That's not what I want.
After some digging around, I found that the expression
"DefCycle - UseCycle + 1" first appeared in r79425, committed by David
Goodwin, and seems to have been initially designed for the ARM Cortex-A8
(see also r79247 and r79436).
Then I checked the instruction timing tables in the "Cortex-A8 Technical
Reference Manual". There are tables for instructions, for example:
  Source1  Source2  Result1
  Rn:E2    Rm:E2    Rd:E2
That means Rn and Rm are read at the beginning of the E2 stage, Rd is
produced at the end of E2, and the result latency is 1 cycle.
And that is implemented in LLVM as:
InstrItinData<IIC_iALUr ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [2, 2, 2]>,
Does that mean OperandCycles and getOperandLatency were simply designed
this way so that the tables from the Cortex-A8 TRM are easier to use
directly?
So OperandCycles do not actually refer to "cycles": for an input operand
the value means the stage at whose beginning the operand is read, and
for an output operand the stage at whose end the result is written?
If so, are there any other reasons it should be designed this way? Why
not remove the +1 cycle and define the instruction like this?
InstrItinData<IIC_iALUr ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [3, 2, 2]>,