Scheduling with RAW hazards

I have an instruction that takes no operands, and produces two results, in two consecutive cycles.

I tried adding each of the following to my Schedule.td file:

InstrItinData<IIMyInstr, [InstrStage<2, [FuncU]>], [1, 2]>,
InstrItinData<IIMyInstr, [InstrStage<1, [FuncU]>, InstrStage<1, [FuncU]>], [1, 2]>,

From what I can see in examples, these say that the first operand is ready the cycle after issue, and the second is ready 2 cycles after issue.

But when I issue an instruction that uses both results, it does not obey this hazard, and is issued the cycle immediately after. Are there any target hooks I need to implement to get this scheduled correctly?

I noticed that my target was using the default HazardRecognizer, which is effectively disabled, so I changed it to use the ScoreboardHazardRecognizer instead. I'm also still using the SelectionDAG scheduler, but will need to change to the MI scheduler at some point, to keep up with trunk. Should either of these help?

Thanks,
Fraser

Yes, they look equivalent.
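
One way to see why the two entries reserve the unit identically is to expand each stage list into the cycles, relative to issue, during which FuncU is held. This is only a minimal sketch: it assumes every stage is on the same unit and that each stage starts when the previous one ends, and reservedCycles is a hypothetical helper, not an LLVM API.

```cpp
#include <vector>

// Expand a list of stage lengths (all on one functional unit) into the
// cycles, relative to issue, during which that unit is reserved.
// Assumption: each stage begins when the previous stage ends.
std::vector<int> reservedCycles(const std::vector<int> &stageLengths) {
  std::vector<int> cycles;
  int start = 0;
  for (int len : stageLengths) {
    for (int c = 0; c < len; ++c)
      cycles.push_back(start + c);
    start += len;
  }
  return cycles;
}
```

Under those assumptions, a single two-cycle stage and two one-cycle stages both reserve cycles 0 and 1.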

Look at -debug-only=pre-RA-sched and confirm that the DAG’s edges have the correct latency.

It also prints the current cycle count each time it schedules an instruction:
DEBUG(dbgs() << "\n*** Scheduling [" << CurCycle << "]: ");

You should see a two-cycle difference between MyInstr and its second dependent. The scheduler won’t insert nops for you. You’d need to do that in a target-specific way.
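
To make the arithmetic concrete, here is a toy model of how operand latency turns into an issue cycle, and how many idle slots a target without interlocks would have to fill itself. It is a sketch only: Dep, earliestIssueCycle and nopsNeeded are hypothetical names, and it assumes a single-issue, in-order machine.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical model of one data dependence: when the producer issued, and
// how many cycles after issue the result the consumer reads becomes ready.
struct Dep {
  int producerIssueCycle;
  int latency;
};

// Earliest cycle at which a consumer can issue once every operand is ready.
int earliestIssueCycle(const std::vector<Dep> &deps) {
  int cycle = 0;
  for (const Dep &d : deps)
    cycle = std::max(cycle, d.producerIssueCycle + d.latency);
  return cycle;
}

// Idle slots between the previously issued instruction and a consumer that
// cannot issue before readyCycle (assumes one instruction per cycle).
int nopsNeeded(int prevIssueCycle, int readyCycle) {
  return std::max(0, readyCycle - (prevIssueCycle + 1));
}
```

With MyInstr issued at cycle 0 and result latencies of 1 and 2, a consumer of both results can issue at cycle 2 at the earliest, which leaves one slot to fill with a nop or an independent instruction.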

The hazard recognizer won’t help you. It only enforces pipeline hazards (other instructions that need FuncU). It’s the list scheduler itself that “enforces” operand latency.
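
The distinction can be illustrated with a toy scoreboard for a single functional unit. This is not LLVM's ScoreboardHazardRecognizer, just a hypothetical model (UnitScoreboard, hasHazard and issue are made-up names) of what a structural-hazard check looks at.

```cpp
// Hypothetical scoreboard for one functional unit. It tracks only when the
// unit is occupied; it knows nothing about when results become available.
struct UnitScoreboard {
  int busyUntil = 0; // first cycle at which the unit is free again

  // True if an instruction that needs the unit cannot issue at `cycle`.
  bool hasHazard(int cycle, bool needsUnit) const {
    return needsUnit && cycle < busyUntil;
  }

  // Reserve the unit for `stageCycles` cycles starting at `cycle`.
  void issue(int cycle, int stageCycles) { busyUntil = cycle + stageCycles; }
};
```

After MyInstr reserves FuncU for two cycles at cycle 0, another FuncU instruction is blocked at cycle 1, but a consumer that does not need FuncU is not, even though its second operand is not ready until cycle 2. That gap is what the operand latency on the DAG edge has to cover.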

MI scheduler allows you to use a new machine model that’s simpler for most people who don’t need the precision of Itineraries. Maybe not important in your case.

More importantly, SDScheduler is take-it-as-is and will go away entirely after 3.3, whereas MI scheduler can be fixed and improved. Now would be a good time to try switching over and start filing bugs. PPC is an example of using MI scheduler out of the box. Hexagon is an example of customizing it at a high level. You could start off like PPC with minimal customization, but eventually you may want something in between, which means providing a custom MachineSchedStrategy:

class MyScheduler : public MachineSchedStrategy {…}

namespace llvm {
ScheduleDAGInstrs *createMySched(MachineSchedContext *C) {
  ScheduleDAGMI *DAG = new ScheduleDAGMI(C, new MyScheduler());
  DAG->addMutation(new MyDAGMutation());
  return DAG;
}
} // namespace llvm

static MachineSchedRegistry
MySchedRegistry("mysched", "Custom My scheduler.", createMySched);

-Andy

Yes, I see the two-cycle difference between the two instructions. I enabled the post-RA scheduler, and noticed that it cared about the latencies, and started to rearrange the instructions accordingly. Is it necessary to use the post-RA scheduler to enforce such latencies?

Ah okay, thank you.

I’ve had a quick experiment with the MI Scheduler, and have a few further questions. From what I can see, if I pass -enable-misched to the compiler, it only works above O1, through addOptimizedRegAlloc(). Is O0 not supported without adding the pass myself in my PassConfig?

How does (or will) the MI Scheduler interact with the existing SD Scheduler? It seems as though they both run together at the moment.

Thanks,
Fraser

SD scheduler has several heuristics that can defeat each other. Without debugging, I can’t say what the problem is. PostRA scheduler was originally meant for targets where precise latency matters, so that’s probably a better fit. Hopefully we can make MachineScheduler work for you in the long run.

MachineScheduler is integrated with the regalloc pipeline because it uses and updates LiveIntervals. -O0 does not compute LiveIntervals.

You could enable MachineScheduler using PassConfig. It should just compute LiveIntervals (LIS) on demand in that case.

Good question. There’s no point in running SD Scheduler when MachineScheduler is enabled. But there’s no way to disable it, other than -pre-RA-sched=source. You can automatically get all the options you need using this hook:

bool MySubtargetInfo::enableMachineScheduler() const { return true; }

-Andy