How to detect data hazards in pre-RA-sched

hi, LLVM,

I found a flag, DisableHazardRecognizer, in TargetInstrInfoImpl.cpp, but I still don’t understand how LLVM detects data hazards in pre-RA-sched. pre-RA-sched works on SDNodes, and all operands are vregs. Even if you can work out the operands of the SDNodes, data hazards on vregs are not the same as data hazards on physical registers. Is it useful for optimizing the processor pipeline?

thanks,
--lx

The hazard recognizer enforces the instruction itineraries that are defined for some subtargets. The itineraries specify resource usage at each pipeline stage and latency. The "hazards" being recognized are resource conflicts, like two independent instructions using the FP unit, or read after write latency. It does not deal with WAR physical register hazards.
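
To make the resource-conflict idea concrete, here is a minimal sketch of a scoreboard-style check driven by an itinerary. It is not LLVM's hazard recognizer: the types `Stage`, `Itinerary`, and `ToyHazardRecognizer` are hypothetical names invented for this illustration, and the sketch assumes the stages of an instruction run back to back starting at the issue cycle.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// One pipeline stage of an itinerary: which unit it occupies and for how long.
// (Hypothetical type for illustration; not LLVM's own data structure.)
struct Stage {
  unsigned unit;    // index of the functional unit used in this stage
  unsigned cycles;  // number of cycles the unit stays busy
};

using Itinerary = std::vector<Stage>;  // stages are assumed to run back to back

class ToyHazardRecognizer {
  // busy[cycle] is a bitmask of the units reserved in that cycle.
  std::map<unsigned, std::uint32_t> busy;

public:
  // True if issuing `it` at `cycle` collides with an earlier reservation.
  bool hasHazard(const Itinerary &it, unsigned cycle) const {
    unsigned c = cycle;
    for (const Stage &s : it) {
      assert(s.unit < 32 && "unit index must fit in the bitmask");
      for (unsigned i = 0; i < s.cycles; ++i, ++c) {
        auto found = busy.find(c);
        if (found != busy.end() && (found->second & (1u << s.unit)))
          return true;
      }
    }
    return false;
  }

  // Record the reservations made by issuing `it` at `cycle`.
  void issue(const Itinerary &it, unsigned cycle) {
    unsigned c = cycle;
    for (const Stage &s : it)
      for (unsigned i = 0; i < s.cycles; ++i, ++c)
        busy[c] |= (1u << s.unit);
  }
};
```

With an FP multiply described as `{{0, 1}, {1, 3}}` (one cycle on unit 0, then three on unit 1), a second multiply issued one cycle after the first is reported as a hazard because both would hold unit 1 in the same cycle, even though the two instructions share no registers.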

(Targets are migrating to a more flexible and efficient machine model now that does not use the hazard recognizer.)

-Andy

Hi, Andrew,

Thank you for answering my question.

What’s the status of misched? Is it experimental? I found it is disabled by default for all architectures (3.4svn). I also don’t understand the algorithm. Could you point me to papers or other material about your approach? It seems that you want to balance register pressure and ILP in misched.

It has been used in production for a year. It’s currently enabled on trunk for PPC, R600, and Hexagon. If there are no objections I’d like to move x86 and armv7 ASAP. Leaving it disabled is becoming more of a maintenance burden.

Please see my llvm-dev list messages to Ghassan yesterday. MI Scheduler is a pass that just provides a place to do scheduling and a large toolbox to do it with. ScheduleDAGMI is a list scheduler driver, and the GenericScheduler strategy attempts to balance register pressure with latency. In my opinion getting the right register pressure vs latency balance is easy to do at a given point in time for a small benchmark suite, but very, very hard to do in general with a design that works across microarchitectures and is resilient to changes to incoming IR. GenericScheduler doesn’t magically solve this problem, but it should never do anything too terrible either.
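
As a rough illustration of what "balancing register pressure with latency" can mean inside a list scheduler, the sketch below picks the next node from a ready list. It is a toy heuristic and not GenericScheduler's actual logic: `Candidate`, `pickNext`, and the single pressure limit are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A ready-list entry (hypothetical; not an LLVM type).
struct Candidate {
  unsigned id;        // node identifier
  int pressureDelta;  // change in live registers if this node is scheduled
  unsigned height;    // latency of the longest path from this node to the end
};

// Return the index of the candidate to schedule next.
// Rule 1: avoid candidates that would push pressure above `limit`.
// Rule 2: if both candidates exceed the limit, prefer the smaller increase.
// Rule 3: otherwise prefer the candidate on the longer latency path.
std::size_t pickNext(const std::vector<Candidate> &ready, int pressure,
                     int limit) {
  assert(!ready.empty() && "nothing to schedule");
  auto better = [&](const Candidate &a, const Candidate &b) {
    bool aOver = pressure + a.pressureDelta > limit;
    bool bOver = pressure + b.pressureDelta > limit;
    if (aOver != bOver)
      return !aOver;
    if (aOver && a.pressureDelta != b.pressureDelta)
      return a.pressureDelta < b.pressureDelta;
    return a.height > b.height;
  };
  std::size_t best = 0;
  for (std::size_t i = 1; i < ready.size(); ++i)
    if (better(ready[i], ready[best]))
      best = i;
  return best;
}
```

When pressure is far below the limit, the choice is driven purely by latency; as pressure approaches the limit, candidates that free registers win. Choosing the limit and the order of the rules is exactly the tuning problem described above.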

The old itineraries allow specifying which resources are used in each pipeline stage. It’s a full matrix.

In the new machine model, you only specify the resources and number of cycles. It can be implemented with simple counters. This works in practice because it’s almost always the case that different instructions begin using a given resource at the same time relative to when the instruction is executed. Even the VLIW implementation I’ve seen in trunk could have used the new model.

It’s efficient because the scheduler doesn’t need to manage a reservation table or build a state machine.

It’s more flexible because predicates allow instructions to be modeled differently based on opcode extensions or immediate values.
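
The "simple counters" idea can be sketched as follows. This is only an illustration under the assumption stated above, namely that every resource an instruction needs is claimed at the moment the instruction issues; `ResourceUse` and `CounterModel` are hypothetical names, not part of LLVM.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One entry of an instruction's resource list: which resource, for how long.
// (Hypothetical type for illustration.)
struct ResourceUse {
  unsigned resource;
  unsigned cycles;
};

class CounterModel {
  // nextFree[r] is the first cycle at which resource r is available again.
  std::vector<unsigned> nextFree;

public:
  explicit CounterModel(unsigned numResources) : nextFree(numResources, 0) {}

  // Earliest cycle >= readyCycle at which every needed resource is free.
  unsigned earliestIssue(const std::vector<ResourceUse> &uses,
                         unsigned readyCycle) const {
    unsigned cycle = readyCycle;
    for (const ResourceUse &u : uses) {
      assert(u.resource < nextFree.size() && "unknown resource");
      cycle = std::max(cycle, nextFree[u.resource]);
    }
    return cycle;
  }

  // Issue at `cycle`: each resource is claimed from the issue cycle onward.
  void issue(const std::vector<ResourceUse> &uses, unsigned cycle) {
    assert(earliestIssue(uses, cycle) == cycle && "resource still busy");
    for (const ResourceUse &u : uses)
      nextFree[u.resource] = cycle + u.cycles;
  }
};
```

There is one integer per resource and no per-cycle table, which is why no reservation table or state machine is needed.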

The postRA hazard that you’re talking about is the job of the dependence graph builder. That is the same for both post-RA and MI sched. When the DAG builder runs before regalloc, it also has to handle virtual registers; that’s the only difference.
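
For readers unfamiliar with what a dependence graph builder records, here is a deliberately naive sketch that derives RAW, WAR, and WAW edges from register defs and uses. It is not LLVM's DAG builder: `Instr`, `DepEdge`, and `buildDeps` are hypothetical, registers are plain numbers (so they can stand for virtual or physical registers alike), and the pairwise loop also emits redundant transitive edges that a real builder would avoid.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class DepKind { RAW, WAR, WAW };

// An instruction reduced to the registers it writes and reads.
struct Instr {
  std::vector<unsigned> defs;
  std::vector<unsigned> uses;
};

// A dependence from an earlier instruction to a later one.
struct DepEdge {
  std::size_t from;
  std::size_t to;
  DepKind kind;
  unsigned reg;
};

static bool mentions(const std::vector<unsigned> &regs, unsigned reg) {
  return std::find(regs.begin(), regs.end(), reg) != regs.end();
}

// Compare every earlier instruction i with every later instruction j.
std::vector<DepEdge> buildDeps(const std::vector<Instr> &block) {
  std::vector<DepEdge> edges;
  for (std::size_t i = 0; i < block.size(); ++i) {
    for (std::size_t j = i + 1; j < block.size(); ++j) {
      for (unsigned reg : block[i].defs) {
        if (mentions(block[j].uses, reg))  // j reads what i wrote
          edges.push_back({i, j, DepKind::RAW, reg});
        if (mentions(block[j].defs, reg))  // j overwrites what i wrote
          edges.push_back({i, j, DepKind::WAW, reg});
      }
      for (unsigned reg : block[i].uses)
        if (mentions(block[j].defs, reg))  // j overwrites what i read
          edges.push_back({i, j, DepKind::WAR, reg});
    }
  }
  return edges;
}
```

The routine itself does not care whether the register numbers are virtual or physical; only the set of edges it finds differs.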

The best way for me to explain how to define a machine model for an in-order processor would be to work with someone who is ready to migrate mips or a simple ppc, arm, or x86 (atom) implementation and improve the docs along the way.

We’re also lacking a model for AVX!

-Andy

Sorry, I made a false statement above. I tried x86/arm/mips and found no
misched in use. You mean misched is just a framework, and a backend can
configure TargetPassConfig to run misched both pre-RA and post-RA, right?

I don't understand this statement. What is the meaning of "more flexible and
efficient machine model"? I know Intel x86 processors feature aggressive
out-of-order execution, but ARM and MIPS don't. Server processors can have
it; embedded processors will not. Compiler writers still need to consider
the instruction pipeline and multiple issue.

Our processor still uses a MIPS-like multi-stage pipeline, almost the same
as what the textbook taught me. We suffer from pipeline stalls and manage to
improve the issue rate using instruction scheduling. For now, I use
post-RA-sched because it can build a graph whose edges are dependencies. The
dependencies are real, being based on physical registers and instruction
attributes. Because misched happens before register allocation, I don't
think I can make use of it to resolve data hazards. Am I right?

I got it. I do feel the scoreboard state tracker is not that useful except
for software pipelining. I have resolved my instruction pipeline stalls
using the existing post-RA-sched (TD). I will investigate the new misched
approach later. Ghassan mentioned an LLVM performance regression in a
previous thread. Do you measure the performance impact of compiler changes
using llvm-test-suite? As you said, there is no single best algorithm for
instruction scheduling, so I have to learn to measure my changes.

thanks,
--lx

It’s currently only set up to run pre-RA. I’d like to set it up for post-RA also. I don’t expect that to be much work.

Backends can configure MI scheduler differently depending on how much control they want. The easiest thing to do is define bool <My>SubTargetInfo::enableMachineScheduler() const { return true; }
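
Written out as a class, the override might look like the sketch below. To keep it compilable on its own, `SubtargetBase` is a stand-in invented here for the LLVM subtarget base class, and `MySubtarget` is a hypothetical target; the sketch assumes the hook is a virtual method with the signature quoted above.

```cpp
// Stand-in for the LLVM subtarget base class so the sketch compiles alone.
// (Assumption: the real hook is a virtual method with this signature.)
struct SubtargetBase {
  virtual ~SubtargetBase() = default;
  virtual bool enableMachineScheduler() const { return false; }
};

// Hypothetical target subtarget that opts in to the MI scheduler.
class MySubtarget : public SubtargetBase {
public:
  bool enableMachineScheduler() const override { return true; }
};
```

In a real backend the override goes into the target's existing subtarget class rather than a new one.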

Sorry, did you mention which target you’re developing?

Yes, I use the LLVM test-suite, but don’t tune for it at all. It’s just a way to find bugs. We have a few other benchmark suites of course. I really encourage people to run their own benchmarks on their own hardware. Of course, if you see a problem, it’s good to run -debug-only=misched to find out what happened. It could be a simple bug or configuration error.

-Andy

We are developing a backend for our in-house processor. Currently, we are
working on the LLVM 3.2 release. I am also evaluating ARM and MIPS for
comparison. I will tweak misched following your pointers.

Thank you for your insightful help!

--lx