Extra cycle in getOperandLatency

Hi llvm-dev,

I'm wondering why getOperandLatency adds an extra cycle.
It doesn't seem intuitive:

  UseCycle = DefCycle - UseCycle + 1;

The comment in TargetItinerary.td says:

  OperandCycles are optional "cycle counts". They specify the cycle after
  instruction issue the values which correspond to specific operand indices
  are defined or read.

So I thought that if an instruction reads its operands in the first cycle
and produces its result in the second cycle, the InstrItinData should be
written something like this:

   InstrItinData<IIC_iALUr ,[InstrStage<1, [FU_x]>], [2, 1, 1]>

Therefore, the operand latency from an iALUr output to an iALUr input
should be 1. However, with the current implementation of getOperandLatency,
this definition yields a latency of 2 (2 - 1 + 1). That's not what I want.

After some digging, I found that the expression "DefCycle - UseCycle + 1"
first appeared in r79425, committed by David Goodwin, and it seems that
OperandCycles was initially designed for the ARM Cortex-A8 (see also r79247
and r79436).

Then I checked the "Cortex-A8 Technical Reference Manual - Instruction
Cycle Timing". It has tables for each class of instructions, for example:

   Data-processing instructions
   Source1  Source2  Result1
   Rn:E2    Rm:E2    Rd:E2

That means Rn and Rm are read at the beginning of the E2 stage,
Rd is produced at the end of E2, and there is a 1-cycle latency.

And that was implemented in LLVM as:

  InstrItinData<IIC_iALUr ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [2, 2, 2]>,

Does that mean OperandCycles and getOperandLatency were simply designed
this way to make it easier to use the tables from the Cortex-A8 TRM?
In other words, OperandCycles don't actually refer to a "cycle": for an
input operand the number means "at the beginning of stage N", and for an
output operand it means "at the end of stage N"?

If so, is there any other reason it should be designed this way?
Why not remove the +1 and define the instruction like this instead?

  InstrItinData<IIC_iALUr ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [3, 2, 2]>,

Thanks

Wei-cheng Wang

I think it’s done this way so that if both def and use cycles are unspecified we get a default of one cycle latency.

At any rate, the itineraries have been around a long time with many out-of-tree targets. I don’t think it’s a good idea to change that old API. New ports should try to use the new machine model instead.

-Andy