Instruction Itineraries: question about operand latencies

In our architecture, loads from certain memory locations take a long time to complete (on the order of 150 clock cycles). Since we have no way to tell at compile time whether the address being loaded from lies in slow or fast memory, I've gone ahead and made the latencies of all load itineraries high, for example:

InstrItinData< II_LOAD1, [InstrStage<150, [AGU]>]>,

However, I see there is another field, which I haven't specified, that holds operand latencies. Here's an example from ARMScheduleA8.td:

InstrItinData<IIC_iALUi ,[InstrStage<1, [A8_Pipe0, A8_Pipe1]>], [2, 2]>,

Now I'm wondering whether, instead of what I had above, I should have specified:

InstrItinData< II_LOAD1, [InstrStage<150, [AGU]>],[150,1,1]>,

Is that right? And if so, is the first '150' (the InstrStage cycle count) redundant, since it is already given in the operand latency list ([150,1,1], whose first element is the latency of the output)?

To clarify, for values of ‘A’ and ‘B’ below:

InstrItinData< II_LOAD1, [InstrStage<A, [AGU]>], [B,1,1]>,

…what is the difference in meaning between 'A' and 'B'? Are they essentially the same value, since only one functional unit ([AGU]) is specified?

Phil

Hi Phil

There are some comments in “include/llvm/Target/TargetItinerary.td” where class InstrItinData is defined.

B is the number of cycles after issue at which the first operand of the instruction is defined. A is the number of cycles the instruction stays in that particular pipeline stage. So for simple cases like your example, one would expect A and B to have the same value. However, A and B are accessed through different APIs.

An example of accessing B in the source code can be found in PPCInstrInfo::getInstrLatency. You can also look at getStageLatency in include/llvm/MC/MCInstrItineraries.h, which is based on A. From these two you can probably find other relevant places.

Hope this helps

Ehsan

I overrode getInstrLatency and printed some information to see what is available there. It looks like the registers are still virtual when getInstrLatency is called; is that correct? We need to make some decisions based on the actual registers that have been assigned: some registers are reserved as address space pointers, and we could vary the latency depending on which address space pointer register is used. But at that point they appear to still be virtual.

Phil

There are two scheduling passes: one before register allocation and one after. You probably looked at the printouts during the first (pre-RA) scheduling pass. Start from TargetPassConfig::addMachinePasses for more details about the codegen passes.

I did some looking around and found this in Passes.cpp:
// Temporary option to allow experimenting with MachineScheduler as a post-RA
// scheduler. Targets can "properly" enable this with
// substitutePass(&PostRASchedulerID, &PostMachineSchedulerID); Ideally it
// wouldn't be part of the standard pass pipeline, and the target would just add
// a PostRA scheduling pass wherever it wants.
static cl::opt<bool> MISchedPostRA("misched-postra", cl::Hidden,
  cl::desc("Run MachineScheduler post regalloc (independent of preRA sched)"));

So I added this to our target's TargetPassConfig subclass:
class XSTGPassConfig : public TargetPassConfig {
public:
  XSTGPassConfig(XSTGTargetMachine *TM, PassManagerBase &PM)
      : TargetPassConfig(TM, PM) {
    // When optimizing, use the MachineScheduler-based post-RA scheduler
    // in place of the default PostRAScheduler.
    if (TM->getOptLevel() != CodeGenOpt::None)
      substitutePass(&PostRASchedulerID, &PostMachineSchedulerID);
  }
  // ...
};

I then built clang and ran it on some code. I had added some couts to getInstrLatency to display the UseInstr. This is an example of what I see in the output:

DefInstr: %vreg34 = LOADI32_RI %vreg3, 268; mem:LD4[%3] R32C:%vreg34 GPRC:%vreg3 dbg:pipeline_routing_be.c:223:3 @[ pipeline_routing_be.c:1120:1 @[ pipeline_routing_be.c:1120:1 ] ]
Latency: 142 for: DefIdx= 0 UseIdx= 1
UseInstr: %vreg34 = LOADI32_RI %vreg3, 268; mem:LD4[%3] R32C:%vreg34 GPRC:%vreg3 dbg:pipeline_routing_be.c:223:3 @[ pipeline_routing_be.c:1120:1 @[ pipeline_routing_be.c:1120:1 ] ]
%vreg35 = CVT_U32_TO_U64 %vreg34; GPRC:%vreg35 R32C:%vreg34 dbg:pipeline_routing_be.c:223:3 @[ pipeline_routing_be.c:1120:1 @[ pipeline_routing_be.c:1120:1 ] ]

Since I see vregs mentioned there, I'm assuming this didn't run post-RA as I expected.

(Our code is based on LLVM 3.6 if that’s relevant)

Phil

I didn't check 3.6, but in trunk the code that adds the post-RA scheduler to the pipeline is in TargetPassConfig::addMachinePasses, and it is guarded by a condition. I've copied the relevant piece of code below. What you have done does not turn on the post-RA scheduler if it is not already enabled.

// Second pass scheduler.
// Let Target optionally insert this pass by itself at some other
// point.
if (getOptLevel() != CodeGenOpt::None &&
    !TM->targetSchedulesPostRAScheduling()) {
  if (MISchedPostRA)
    addPass(&PostMachineSchedulerID);
  else
    addPass(&PostRASchedulerID);