[RFC][Scheduling] Insert NOPs && dual issue MIs based on the new MIScheduler

Hi Yanjun,

*First*, I thought I could rely on the HazardRecognizer to help me decide
when I should insert the NOPs. But we want to use the new MachineSchedModel
instead of the legacy Itinerary-based model. That means we won't define an
Itinerary model in our backend. Does that mean I cannot use the
HazardRecognizer code in LLVM to help me decide when to insert NOPs? I.e.,
I need people to help me confirm whether it is possible to use the LLVM
HazardRecognizer to help decide when to insert NOPs without defining
Itinerary info.

I'm assuming you're referring to the Scoreboard-based implementation of the
recognizer (which is the only recognizer implementation available in the
official trunk of LLVM).

The brief answer is (as far as I know): no, you cannot use the
HazardRecognizer without an Itinerary-based scheduling model.

However, note that the HazardRecognizer (include/llvm/HazardRecognizer.h)
provides only an API, which is agnostic of the scheduling model employed;
it just happens that the backends, up to now, are populated only with
recognizers based on itineraries.

Hexagon does define *Itinerary* in their MachineSchedModel. This confuses
me: does this mean the new MachineSchedModel can work together with the old
Itinerary model?

Yes, the new MachineScheduler is able to manage scheduling driven by both
the Itinerary-based model and the Per-Operand model. Indeed, when
itineraries are provided, the (Post)GenericScheduler strategy is initialized
with a Scoreboard-based HazardRecognizer, skipping any internal scheduling

Is that the best way to do scheduling modeling? I was assuming the community
now recommends that people use the new MIScheduler + new MachineSchedModel
for new target development.

I think that the assumption is correct. Indeed, for instance, recently added
in-order targets (e.g., ARM R52 and ARM M4) employ solely a Per-Operand

there is also a stand-alone postRA hazard recognizer pass which runs the
hazard recognizer and emits noops when necessary. It gives targets a way to
run the hazard recognizer without running one of the schedulers. Should I
explore that one instead?

Actually, this pass creates a HazardRecognizer via:

(1) "CreateTargetPostRAHazardRecognizer(const MachineFunction &)"

which is target-dependent.

For instance, the ARM backend does not override (1), so no valid recognizer
is created and the postRA hazard recognizer pass exits early.

As said before, the HazardRecognizer is merely an interface that abstracts
over the specific scheduling model provided: if you want to use a hazard
recognizer that is not based on itineraries, you have to specialize that
interface yourself.
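
To make "specializing the interface" a bit more concrete, here is a minimal,
self-contained sketch. It does not use LLVM headers: SketchHazardRecognizer
only imitates the general shape of the recognizer API (query a hazard, emit
an instruction, advance the cycle), and every name in it, including
LatencyHazardRecognizer and noopsBeforeUse, is hypothetical. The hazard rule
(a register written with latency L is not readable until L cycles after
issue) is an assumption chosen for illustration, not what any in-tree target
does.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a hazard recognizer interface; NOT LLVM's class.
enum class SketchHazard { NoHazard, NoopHazard };

class SketchHazardRecognizer {
public:
  virtual ~SketchHazardRecognizer() = default;
  // Would reading UseReg in the current cycle be a hazard?
  virtual SketchHazard getHazardType(const std::string &UseReg) const = 0;
  // Record that an instruction defining DefReg issues in the current cycle.
  virtual void emitInstruction(const std::string &DefReg, unsigned Latency) = 0;
  virtual void advanceCycle() = 0;
};

// A specialization driven by a plain latency value instead of itineraries.
class LatencyHazardRecognizer : public SketchHazardRecognizer {
  unsigned Cycle = 0;
  std::map<std::string, unsigned> ReadyCycle; // register -> first readable cycle

public:
  SketchHazard getHazardType(const std::string &UseReg) const override {
    auto It = ReadyCycle.find(UseReg);
    if (It != ReadyCycle.end() && It->second > Cycle)
      return SketchHazard::NoopHazard;
    return SketchHazard::NoHazard;
  }

  void emitInstruction(const std::string &DefReg, unsigned Latency) override {
    assert(Latency > 0 && "a producer needs at least one cycle");
    ReadyCycle[DefReg] = Cycle + Latency;
  }

  void advanceCycle() override { ++Cycle; }
};

// Counts the noops that would have to be placed before a use of UseReg,
// advancing the recognizer one cycle per noop.
unsigned noopsBeforeUse(SketchHazardRecognizer &HR, const std::string &UseReg) {
  unsigned Noops = 0;
  while (HR.getHazardType(UseReg) == SketchHazard::NoopHazard) {
    HR.advanceCycle();
    ++Noops;
  }
  return Noops;
}
```

In a real backend the same idea would be expressed by deriving from LLVM's
own recognizer class and returning it from (1); the sketch only shows that
the hazard decision itself need not come from an itinerary.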

*Second*, about dual issue. How does MIScheduler handle dual issue vs
single issue? The only thing that I found was *IssueWidth*, which should be
set to 2 if dual issue, and 1 if single

The (Post)GenericScheduler strategy tracks five main pieces of information:

1. CurrCycle: the current cycle in which the scheduling decision is taken.

2. CurrMOps: the number of micro-ops issued in CurrCycle (usually,
instructions are modeled as 1 micro-op, but this is target-dependent).

3. Reservation of Resource Units: for how many cycles each unit is
unavailable (i.e., already processing an instruction).

4. PendingQueue: the nodes which cannot be issued due to some hazard.

5. AvailableQueue: the nodes ready to be issued.

The strategy checks whether it is possible to issue an instruction according
to (2), (3), (4) and (5). Concerning dual-issuing, this is possible only if
there is an instruction (MI) in (5) such that (2) + MOps(MI) <= IssueWidth,
and there is a Resource unit ready to consume MI.

As long as such conditions are satisfied, dual-issuing is possible.
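
The two conditions above can be written down as a small stand-alone check.
This is only a sketch of the rule as described in this mail, not the actual
GenericScheduler code: SketchInstr, SketchCycleState, canIssue and issue are
hypothetical names, and each instruction is assumed to need exactly one
resource unit for the current cycle.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified view of an instruction for this sketch.
struct SketchInstr {
  unsigned MicroOps; // micro-ops the instruction contributes to the cycle
  unsigned Resource; // index of the single resource unit it needs
};

// Hypothetical per-cycle scheduler state: items (2) and (3) of the list.
struct SketchCycleState {
  unsigned IssueWidth;            // 2 for dual issue, 1 for single issue
  unsigned CurrMOps;              // micro-ops already issued in this cycle
  std::vector<bool> ResourceBusy; // true if the unit is still reserved

  SketchCycleState(unsigned Width, unsigned NumUnits)
      : IssueWidth(Width), CurrMOps(0), ResourceBusy(NumUnits, false) {}
};

// MI can join the current cycle only if the micro-op budget is not exceeded
// and the resource unit it needs is free.
bool canIssue(const SketchCycleState &S, const SketchInstr &MI) {
  assert(MI.Resource < S.ResourceBusy.size() && "unknown resource unit");
  return S.CurrMOps + MI.MicroOps <= S.IssueWidth &&
         !S.ResourceBusy[MI.Resource];
}

// Account for MI in the current cycle and reserve its resource unit.
void issue(SketchCycleState &S, const SketchInstr &MI) {
  assert(canIssue(S, MI) && "instruction does not fit in this cycle");
  S.CurrMOps += MI.MicroOps;
  S.ResourceBusy[MI.Resource] = true;
}
```

With IssueWidth set to 2 and two free units, two one-micro-op instructions
on different units fit in the same cycle; with IssueWidth set to 1, or when
both need the same unit, the second one has to wait.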

what should I do to insert NOPs? I also looked at the ARM backend; it
doesn't override the insertNoop() function. Does that mean ARM hardware can
automatically stall when a NOP is needed?

I haven't worked with the Scoreboard-based hazard recognizer; hence, I
cannot tell you how noops are handled in that case. However, for the
scheduler tracking CurrCycle, item (1) in the list above, explicit noop
insertion is not required: when no instruction is issuable, the pipeline is
assumed to be stalled, and CurrCycle is bumped until the AvailableQueue,
item (5), is filled with an instruction, at which point scheduling resumes.
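
As a rough illustration of that "bump the cycle instead of emitting a noop"
behaviour, here is a stand-alone sketch. It is a deliberate simplification
and not the scheduler's real algorithm: instructions are issued strictly in
program order, each one is a single micro-op, and each is described only by
the cycle at which its operands become ready. scheduleInOrder and its
parameters are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Returns the cycle in which each instruction issues. ReadyCycle[i] is the
// first cycle in which instruction i has all of its operands available.
std::vector<unsigned> scheduleInOrder(const std::vector<unsigned> &ReadyCycle,
                                      unsigned IssueWidth) {
  assert(IssueWidth > 0 && "the pipeline must issue at least one micro-op");
  std::vector<unsigned> IssueCycle;
  unsigned CurrCycle = 0; // cycle in which the next decision is taken
  unsigned CurrMOps = 0;  // micro-ops already issued in CurrCycle

  for (unsigned Ready : ReadyCycle) {
    // Nothing is issuable: model a stall by bumping the cycle. No noop is
    // recorded anywhere; the hardware is assumed to interlock.
    while (Ready > CurrCycle || CurrMOps == IssueWidth) {
      ++CurrCycle;
      CurrMOps = 0;
    }
    IssueCycle.push_back(CurrCycle);
    ++CurrMOps;
  }
  return IssueCycle;
}
```

For ready cycles 0, 0, 3, 3 a dual-issue pipeline (IssueWidth 2) issues in
cycles 0, 0, 3, 3, while a single-issue pipeline issues in cycles 0, 1, 3, 4;
the idle cycles in between are stalls, not noops.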

Any comments and suggestions will be greatly appreciated, thanks!

The scheduling model for the ARM Cortex-R52, an in-order dual-issue target,
provides a really clean and complete per-operand model. I'd really suggest
you look at it :).


-- Lorenzo

Thank you so much Lorenzo, your detailed explanation is very helpful! Really appreciate it.