Best way to globally schedule MachineInstrs

Hello,

It's probably not the first time this question has been asked, but I had no luck with Google searches or with my tours of the LLVM code base.

I would like to schedule MachineInstrs across basic blocks in the safest, most correct and most general way possible. The idea is that on an in-order-issue, out-of-order-retire machine (common among open-source FPGA-based microprocessors, which issue in order but have several execution units), it is important to consume the result of an instruction as late as possible. For instance, "1+(1/x)" should issue the division as early as possible, but wait as long as possible before adding 1 to the result of the division.
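To make the "1+(1/x)" argument concrete, here is a toy timing model of such a core, a sketch under stated assumptions rather than a model of any real FPGA design: one instruction issues per cycle in program order, an instruction stalls until its source registers are ready, and results complete independently on separate units. Structural hazards (e.g. a busy divider) are ignored, and the latencies used below (fdiv = 20, fadd = 4, simple ops = 1) are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy in-order-issue, out-of-order-completion core. Registers are plain
// indices; "latency" is the number of cycles until the result is ready.
struct Instr {
  std::string name;       // for readability only
  int dest;               // destination register index
  std::vector<int> srcs;  // source register indices
  int latency;            // cycles until the result is available
};

// Returns the cycle in which the last result becomes available.
int simulate(const std::vector<Instr> &prog, int numRegs) {
  std::vector<int> readyAt(numRegs, 0); // cycle each register becomes ready
  int cycle = 0;   // earliest cycle the next instruction may issue
  int finish = 0;  // latest completion cycle seen so far
  for (const Instr &I : prog) {
    assert(I.dest >= 0 && I.dest < numRegs);
    int issue = cycle;
    for (int s : I.srcs)
      issue = std::max(issue, readyAt[s]); // stall on a not-yet-ready source
    readyAt[I.dest] = issue + I.latency;
    finish = std::max(finish, readyAt[I.dest]);
    cycle = issue + 1; // in order: nothing issues before this instruction
  }
  return finish;
}
```

With x in r0, placing ten independent one-cycle operations between the fdiv and the fadd finishes at cycle 24, while putting the fadd right after the fdiv stalls issue until cycle 20 and finishes at cycle 31: the stall is hidden only when the consumer is placed late.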

All the scheduling approaches I've found in LLVM seem to work on basic blocks. More precisely, they work on scheduling regions, which are sub-portions of basic blocks. I have found the IR-level Sink pass, which moves instructions down into later basic blocks, and the MachineSink pass, which does the same for MachineInstrs. They address my problem as long as I mark fdiv as not sinkable in TargetInstrInfo::shouldSink. However, they seem to sink sinkable instructions as far as possible, without much reasoning about how far they should be sunk, what impact that would have on register pressure, etc. Am I correct? I also find it hard to maintain scheduling information that is split between .td files, TargetSubtargetInfo::adjustSchedDependency, and TargetInstrInfo::shouldSink.

So, is there something I missed that would let me describe, preferably in one place, how various long-latency instructions should be scheduled across basic blocks? Things like "fsin should be emitted as early as possible" and "consumers of fsin can wait until after <that whole loop> before being emitted"?
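As an illustration of the kind of single source of truth being asked about, here is a hypothetical sketch: one per-opcode table of hints that both a shouldSink-style query and a latency query would consult. Nothing here is an LLVM API; SchedHint, SchedPolicy and the opcode strings are invented for the example, and wiring it into the real hooks is left open.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-opcode scheduling hints kept in one place (NOT an LLVM
// API). The intent is that several target hooks read from this table
// instead of each encoding its own policy.
struct SchedHint {
  int latency;     // cycles until the result is available
  bool issueEarly; // keep the producer as early as possible
};

class SchedPolicy {
  std::map<std::string, SchedHint> hints;

public:
  void set(const std::string &opc, SchedHint h) {
    assert(h.latency > 0);
    hints[opc] = h;
  }

  // Could back a shouldSink-style query: "issue early" producers stay put.
  bool maySink(const std::string &opc) const {
    auto it = hints.find(opc);
    return it == hints.end() || !it->second.issueEarly;
  }

  // Could back latency queries; unknown opcodes default to one cycle.
  int latency(const std::string &opc) const {
    auto it = hints.find(opc);
    return it == hints.end() ? 1 : it->second.latency;
  }
};
```

For example, registering fdiv and fsin as issueEarly makes maySink return false for them while unlisted opcodes remain sinkable with a default latency of 1.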

Best regards,
Denis

Hi,

> All the scheduling approaches I've found in LLVM seem to work on basic blocks. More precisely, they work on scheduling regions, which are sub-portions of basic blocks. I have found the IR-level Sink pass, which moves instructions down into later basic blocks, and the MachineSink pass, which does the same for MachineInstrs. They address my problem as long as I mark fdiv as not sinkable in TargetInstrInfo::shouldSink. However, they seem to sink sinkable instructions as far as possible, without much reasoning about how far they should be sunk, what impact that would have on register pressure, etc. Am I correct?

Yes, scheduling currently only works within sub-portions of basic blocks, and it looks like the sinking passes try to sink any instruction they can into successor blocks, mostly independent of the impact on latency, resource usage and register pressure.

> I also find it hard to maintain scheduling information that is split between .td files, TargetSubtargetInfo::adjustSchedDependency, and TargetInstrInfo::shouldSink.

The .td files contain static information about the instructions available on a target (latency, resource usage). Additional hooks like adjustSchedDependency and shouldSink allow making decisions/adjustments based on a concrete MachineInstr/SUnit. There you have access to the containing block, the concrete operands and more, so you can base decisions on more information than can be expressed in the .td files.
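To illustrate the difference between static per-instruction data and a context-sensitive hook, here is a plain C++ sketch (no LLVM types). staticLatency() stands in for what a .td scheduling model records per instruction; adjustEdgeLatency() stands in for a hook like adjustSchedDependency that sees the concrete producer/consumer pair. The latencies and the assumption that the core reads a store's data operand two cycles after issue are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Plain C++ stand-ins, not LLVM classes.
enum class Opc { FDiv, FAdd, Store };

// Static, per-instruction latency, as a .td model would describe it.
// The numbers are made up for this example.
int staticLatency(Opc op) {
  switch (op) {
  case Opc::FDiv: return 20;
  case Opc::FAdd: return 4;
  case Opc::Store: return 1;
  }
  return 1;
}

// Context-sensitive latency of one dependence edge. Assumption: the
// hypothetical core reads a store's data operand storeDataReadDelay cycles
// after issue, so an edge into that operand can be shortened by that much,
// but never below one cycle.
int adjustEdgeLatency(Opc producer, Opc consumer, bool isStoreDataOperand,
                      int storeDataReadDelay = 2) {
  assert(storeDataReadDelay >= 0);
  int lat = staticLatency(producer);
  if (consumer == Opc::Store && isStoreDataOperand)
    lat = std::max(1, lat - storeDataReadDelay);
  return lat;
}
```

The static table alone cannot express that only the store's data operand is read late; the edge-level function can, because it is given the consumer and the operand role.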

> So, is there something I missed that would let me describe, preferably in one place, how various long-latency instructions should be scheduled across basic blocks? Things like "fsin should be emitted as early as possible" and "consumers of fsin can wait until after <that whole loop> before being emitted"?

Extending the scheduler to work across scheduling-region boundaries is probably a relatively big project. If you are mostly interested in adjusting the location of a few instructions, adding better cost modeling to the sinking passes might be a good first step.
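As a sketch of what "better cost modeling" could mean for the consumer of a long-latency producer, here is a hypothetical bounded-sinking heuristic in plain C++. It is not what MachineSink does today. It assumes the blocks along one trace are known in order, each with an estimated cycle count and peak register pressure (the kind of numbers MachineTraceMetrics and RegPressureTracker could provide), that the producer sits at the top of block 0, and that moving the consumer to block j keeps one extra value live through blocks 0..j-1.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-block estimates along one trace, in trace order.
struct TraceBlock {
  int cycles;   // estimated cycles to execute the block
  int pressure; // peak number of simultaneously live values in the block
};

// Returns the index of the block at whose top the consumer should be
// placed: the first block reached after the producer's latency has
// elapsed, but never past a block where keeping the producer's result
// live would exceed regLimit, and never past the end of the trace.
int chooseSinkTarget(const std::vector<TraceBlock> &trace, int latency,
                     int regLimit) {
  assert(!trace.empty());
  int elapsed = 0; // cycles between the producer and the top of `target`
  int target = 0;
  for (int j = 0; j + 1 < (int)trace.size(); ++j) {
    if (elapsed >= latency)
      break; // latency already hidden, no need to sink further
    if (trace[j].pressure + 1 > regLimit)
      break; // keeping the value live through block j would likely spill
    elapsed += trace[j].cycles;
    target = j + 1; // safe to move the consumer past block j
  }
  return target;
}
```

For a trace with (cycles, pressure) of (5, 10), (8, 12), (10, 20), (3, 8) and a 20-cycle producer, a register limit of 16 stops the consumer at block 2 because block 2 is already at high pressure, whereas a limit of 32 lets it move to block 3. The design choice is to stop at the first block that hides the latency, since sinking further only lengthens the live range.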

For that, MachineTraceMetrics might be helpful (https://llvm.org/doxygen/classllvm_1_1MachineTraceMetrics.html). It generates traces representing plausible sequences of executed basic blocks passing through a given block and computes resource usage and latencies along the trace. See the MachineCombiner for an example user. To estimate register pressure along the trace, RegPressureTracker (https://llvm.org/doxygen/classllvm_1_1RegPressureTracker.html) may be helpful.
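For intuition about the two quantities mentioned above, here is a toy computation over a straight-line trace (instructions from several blocks concatenated in trace order). It is a simplified illustration, not the algorithm used by MachineTraceMetrics or RegPressureTracker: "depth" is the earliest issue cycle given only data dependencies and unbounded resources, similar in spirit to the per-instruction depth MachineTraceMetrics reports, and pressure counts values defined at or before a point and still used after it. Operands are indices of earlier instructions in the trace; latencies are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One instruction of a straight-line trace. Operands refer to earlier
// instructions by index.
struct TInstr {
  std::vector<int> operands; // indices of earlier instructions used
  int latency;               // cycles until the result is available
};

// Earliest cycle each instruction could issue, considering only data
// dependencies (unbounded resources).
std::vector<int> computeDepths(const std::vector<TInstr> &trace) {
  std::vector<int> depth(trace.size(), 0);
  for (std::size_t i = 0; i < trace.size(); ++i)
    for (int op : trace[i].operands) {
      assert(op >= 0 && (std::size_t)op < i);
      depth[i] = std::max(depth[i], depth[op] + trace[op].latency);
    }
  return depth;
}

// Maximum number of values live right after any instruction: a value
// defined by instruction d is live after p if d <= p and it is used after p.
int maxPressure(const std::vector<TInstr> &trace) {
  std::vector<int> lastUse(trace.size(), -1);
  for (std::size_t i = 0; i < trace.size(); ++i)
    for (int op : trace[i].operands)
      lastUse[op] = (int)i;
  int best = 0;
  for (std::size_t p = 0; p < trace.size(); ++p) {
    int live = 0;
    for (std::size_t d = 0; d <= p; ++d)
      if (lastUse[d] > (int)p)
        ++live;
    best = std::max(best, live);
  }
  return best;
}
```

For a trace "load x (3 cycles); fdiv 1/x (20); independent op on x (1); fadd combining both (4)", the depths come out as 0, 3, 3 and 23, and at most two values are live at once.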

Cheers,
Florian