- Target pass configuration: DONE
- MachineScheduler pass framework: DONE
- MI Scheduling DAG: DONE
- AliasAnalysis aware DAG option: In review (Sergei)
- Bidirectional list scheduling: DONE
- LiveInterval Update: WIP (simple instruction reordering is
- Target-independent precise modeling of register pressure: DONE
- Register pressure reduction scheduler: WIP
- Support for existing HazardChecker plugin
Is support for the existing hazard detectors working now? [it does not
say DONE or WIP here, but your comment below implies, I think, that it
is at least partially working].
Glad you're interested. I can explain. We have several important tools in LLVM that most schedulers will need. That's what I was listing below (Configurable pass, DAG, LI update, RegPressure, Itinerary, HazardChecker--normally called a reservation table).
I really should have also mentioned the DFAPacketizer developed by the Hexagon team. It's being used by their VLIW scheduler, but not by the new "standard" scheduler that I'm working on.
Now that I mentioned that, I should mention MachineInstrBundles, which was a necessary IR feature to support the VLIW scheduler, but has other random uses--sometimes we want to glue machine instructions.
HazardChecker was already being used by the PostRA scheduler before I started working on infrastructure for a new scheduler. So it's there, and can be used by custom schedulers.
My first goal was to complete all of these pieces. They're in pretty good shape now but not well tested. The target independent model for register pressure derived from arbitrary register definitions was by far the most difficult aspect. Now I need to develop a standard scheduling algorithm that will work reasonably well for any target given the register description and optionally a scheduling itinerary.
The register pressure reduction heuristic was the first that I threw into the standard scheduler because it's potentially useful by itself. It's WIP.
I haven't plugged in the HazardChecker, but it's quite straightforward.
At that point, I'll have two competing scheduling constraints and will begin implementing a framework for balancing those constraints. I'll also add fuzzy constraints such as expected latency and other cpu resources. When I get to that point, I'll explain more, and I hope you and others will follow along and help with performance analysis and heuristics.
I will point out one important aspect of the design now. If scheduling is very important for your target's performance, and you are highly confident that you model your microarchitecture effectively and have just the right heuristics, then it might make sense to give the scheduler free reign to shuffle the instructions. The standard MachineScheduler will not make that assumption. It certainly can be tuned to be as aggressive as we like, but unless there is high level of confidence that reordering instructions will be beneficial, we don't want to do it. Rule #1 is not to follow blind heuristics that reschedule reasonable code into something pathologically bad. This notion of confidence is not something schedulers typically have, and is fundamental to the design.
For example, most schedulers have to deal with opposing constraints of register pressure and ILP. An aggressive way to deal with this is by running two separate scheduling passes. First top-down to find the optimal latency, then bottom-up to minimize resources needed to achieve that latency. Naturally, after the first pass, you've shuffled instructions beyond all recognition. Instead, we deal with this problem by scheduling in both directions simultaneously. At each point, we know which resources and constraints are likely to impact the cpu pipeline in both the top and bottom of the scheduling region. Doing this doesn't solve any fundamental problem, but it gives the scheduler great freedom at each point, including the freedom to do absolutely nothing, which is probably exactly what you want for a fair amount of X86 code.
- New target description feature: buffered resources
- Modeling min latency, expected latency, and resources constraints
Can you comment on how min and expected latency will be used in the
In the new scheduler's terminology, min latency is an interlocked resource, and expected latency is a buffered resource. Interlocked resources are used to form instruction groups (for performance only, not correctness). For out-of-order targets with register rename, we can use zero-cycle min latency so there is no interlock within an issue groups. Instead we know expected latency of the scheduled instructions relative to the critical path. We can balance the schedule so that neither the expected latency of the top nor bottom scheduled instructions exceed the overall critical path. This way, we will slice up two very long independent chains into neat chunks, instead of the random shuffling that we do today.
- Heuristics that balance interlocks, regpressure, latency and
For targets where scheduling is critical, I encourage developers who
stay in sync with trunk to write their own target-specific scheduler
based on the pieces that are already available. Hexagon developers
are doing this now. The LLVM toolkit for scheduling is all there--not
perfect, but ready for developers.
- Pluggable MachineScheduler pass
- Scheduling DAG
- LiveInterval Update
- RegisterPressure tracking
- InstructionItinerary and HazardChecker (to be extended)
If you would simply like improved X86 scheduling without rolling your
own, then providing feedback and test cases is useful so we can
incorporate improvements into the standard scheduler while it's being
Does this mean that we're going to see a new X86 scheduling paradigm,
or is the existing ILP heuristic, in large part, expected to stay?
It's a new paradigm but not a change in focus--we're not modeling the microarchitecture in any greater detail. Although other contributors are encouraged to do that.
Both schedulers will be supported for a time. In fact it will make sense to run both in the same compile, until MISched is good enough to take over. It will be easy to determine when one scheduler is doing better than the other. I'm relying on you to tell me when it's doing the wrong thing.