extra one cycle of getOperandLatency

Wei-cheng_Wang · December 20, 2013, 6:35am

Hi llvm-dev,

I wonder why getOperandLatency adds an extra cycle.
It doesn't seem intuitive.

UseCycle = DefCycle - UseCycle + 1;

The comments in TargetItinerary.td say:

  OperandCycles are optional "cycle counts". They specify the cycle after
  instruction issue the values which correspond to specific operand indices
  are defined or read.

I thought that if an instruction reads its operands in the first cycle
and produces its result in the second cycle, its InstrItinData should be
written something like this:

InstrItinData<IIC_iALUr ,[InstrStage<1, [FU_x]>], [2, 1, 1]>

Therefore, the operand latency from an iALUr output to an iALUr input
should be 1. However, with the current implementation of getOperandLatency,
this definition gives a latency of 2. That's not what I want.

After some digging around, I found that the expression "DefCycle - UseCycle + 1"
first appeared in r79425, committed by David Goodwin, and it seems that
OperandCycles was initially designed for the ARM Cortex-A8 (see also r79247
and r79436).

Then I checked the "Cortex-A8 Technical Reference Manual - Instruction
Cycle Timing". It has timing tables for instructions, for example:

   Data-processing instructions
   Source1  Source2  Result1
   Rn:E2    Rm:E2    Rd:E2

That means Rn and Rm are read at the beginning of the E2 stage,
Rd is produced at the end of E2, and there is a 1-cycle latency.

And that was implemented in LLVM as:

InstrItinData<IIC_iALUr ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [2, 2, 2]>,

Does that mean OperandCycles and getOperandLatency were simply designed
this way so that it is easier to use the tables from the Cortex-A8 TRM?
So OperandCycles do not actually refer to a "cycle": for an input operand
the number means "at the beginning of stage N", and for an output operand
it means "at the end of stage N"?

If so, are there any other reasons it should be designed this way?
Why not remove the +1 cycle and define the instruction like this?

InstrItinData<IIC_iALUr ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [3, 2, 2]>,

Thanks

Wei-cheng Wang

atrick · December 31, 2013, 8:22pm

I think it’s done this way so that if both def and use cycles are unspecified, we get a default of one cycle latency.

At any rate, the itineraries have been around a long time with many out-of-tree targets. I don’t think it’s a good idea to change that old API. New ports should try to use the new machine model instead.

-Andy
