HazardRecognizer and RegisterAllocation

Hi list,

in our LLVM-based project we are writing a backend for our processor. The
architecture is quite a straightforward RISC, but it does not have
hardware interlocks, i.e. data hazards involving memory access must be
resolved by the compiler, either by scheduling unrelated instructions or
by inserting NOOPs into the load delay slots:

----

For example, code which looks like this:

load 0x1234, reg1
noop
add reg1, 1
load 0x1236, reg2

can be safely transformed to:

load 0x1234, reg1
load 0x1236, reg2
noop
add reg1, 1

----

It pleased us quite a lot when we found the HazardRecognizer class.
Without much effort we could get LLVM to transform code as shown
above (with a simple (SDUse, delayCount) map).

Unfortunately we have now found out that the HazardRecognizer is used only
before register allocation, and the register allocator obviously may
reschedule instructions, but doesn't take hazard recognition into account.

The register allocator doesn't reschedule code. It can coalesce copies,
commute instructions, and do re-materialization, etc. It's not clear
what kind of problems you are running into.

Is there a switch/option we overlooked to bind the HazardRecognizer to the
register allocator?

And more generally: Is the HazardRecognizer the right and only way to
solve our NOOP-minimizing problem?

Perhaps you want to do this after register allocation is done. Dan is developing the post-allocation scheduler. You can try it out.

Evan

Hi Evan,

thanks for your response.

>> For example, code which looks like this:
>>
>> load 0x1234, reg1
>> noop
>> add reg1, 1
>> load 0x1236, reg2
>>
>> can be safely transformed to:
>>
>> load 0x1234, reg1
>> load 0x1236, reg2
>> noop
>> add reg1, 1

> The register allocator doesn't reschedule code. It can coalesce
> copies, commute instructions, and do re-materialization, etc.

Sorry for using the wrong vocabulary.

> It's not clear what kind of problems you are running into.

When we limited our target to have only 3 registers available the
code post-hazard-recognized looks like that:

load 0x1234, reg1024
load 0x1236, reg1025
load 0x1238, reg1026
load 0x1240, reg1027
add reg1024, 1
add reg1025, 2
add reg1026, 3
add reg1027, 4

after register allocation:

load 0x1234, reg1
load 0x1236, reg2
load 0x1238, reg3
add reg1, 1
add reg2, 2
add reg3, 3
store reg1, 0x1234
load 0x1240, reg1
add reg1, 4

Which won't work on our platform: it is missing two NOOPs after the
last load. The DelaySlotFiller could add the two NOOPs, but that would
be less optimal than doing the store/load before the adds of reg2 and
reg3 (no NOOPs are needed in that case).
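To spell out the variant Patrick describes (assuming two load delay slots; this ordering is hand-written for illustration, not output from an actual pass):

load 0x1234, reg1
load 0x1236, reg2
load 0x1238, reg3
add reg1, 1
store reg1, 0x1234
load 0x1240, reg1
add reg2, 2
add reg3, 3
add reg1, 4

Here the adds of reg2 and reg3 fill the delay slots of the reload of reg1, so no NOOP is needed.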

>> And more generally: Is the HazardRecognizer the right and only way to
>> solve our NOOP-minimizing problem?

> Perhaps you want to do this after register allocation is done. Dan is
> developing the post-allocation scheduler. You can try it out.

Interesting. Can it already be found in SVN? If not, I will search the
mail archive later.

regards,
Patrick.

Yes, it is in SVN. It's new, and so far it's only being used in limited
ways, and not anything like your specific problem. We don't currently
have any targets in-tree that require no-ops, so it may not address all
your needs out of the box, but patches are welcome :-).

Dan

Dan, how does the scheduler handle memory dependence? I'm working on
something that requires memory dependence information for
MachineInstructions.

                                               -Dave

At the moment, it knows simple things, like constant pool loads
don't have dependencies, and references to distinct stack slots are
independent, and so on.

I have a few ideas for how more precise memory dependencies might be
achieved.

We have MachineMemOperands, which can be used to make AliasAnalysis
or other LLVM-IR-level analysis queries. They need some work though;
the main issue is that there are some places in codegen that don't
preserve them.

Another possibility is to record dependence information from the
SelectionDAG in MachineInstrs somehow. We don't yet have precise
memory dependencies in the SelectionDAG, but it would be good to
fix that too :-). This would probably also involve AliasAnalysis
queries from codegen, possibly going through the
MemoryDependenceAnalysis interface.

Dan

> Dan, how does the scheduler handle memory dependence? I'm working on
> something that requires memory dependence information for
> MachineInstructions.

> At the moment, it knows simple things, like constant pool loads
> don't have dependencies, and references to distinct stack slots are
> independent, and so on.

Ok.

> I have a few ideas for how more precise memory dependencies might be
> achieved.
>
> We have MachineMemOperands, which can be used to make AliasAnalysis
> or other LLVM-IR-level analysis queries. They need some work though;
> the main issue is that there are some places in codegen that don't
> preserve them.

Where are those places? Can they be used in conjunction with
MemoryDependenceAnalysis? E.g. can we write a MachineInstructions-based
memory dependence analysis that uses MachineMemOperands?

> Another possibility is to record dependence information from the
> SelectionDAG in MachineInstrs somehow. We don't yet have precise
> memory dependencies in the SelectionDAG, but it would be good to
> fix that too :-).

Agreed.

> This would probably also involve AliasAnalysis queries from codegen,
> possibly going through the MemoryDependenceAnalysis interface.

Do you have a vision for how this might work? Wouldn't we need a new
MachineFunctionPass to essentially do the same thing as
MemoryDependenceAnalysis?

I don't think it's sufficient to just preserve the information we had from
Instructions. Codegen might introduce new memory operations after lowering
(spilling, for example). Some of these might be easily analyzable (spills)
but others might not be.

But maybe we don't need to worry about that right now. As I think about the
problem I'm working on, "merely" preserving dependence information from
Instructions would help. It seems like if we can preserve that information in
SelectionDAG we ought to be able to record it in MachineInstructions (or
MachineMemOperands?) when lowering.

Hmm...then again looking at the MachineMemOperand documentation, fixing the
places that invalidate those might work well too. I'm a little wary of having
the same information in two different places.

                                           -Dave

>> Dan, how does the scheduler handle memory dependence? I'm working on
>> something that requires memory dependence information for
>> MachineInstructions.
>>
>> At the moment, it knows simple things, like constant pool loads
>> don't have dependencies, and references to distinct stack slots are
>> independent, and so on.

> Ok.

>> I have a few ideas for how more precise memory dependencies might be
>> achieved.
>>
>> We have MachineMemOperands, which can be used to make AliasAnalysis
>> or other LLVM-IR-level analysis queries. They need some work though;
>> the main issue is that there are some places in codegen that don't
>> preserve them.

> Where are those places?

We don't have a definite list. It's largely testable though; you
could add an assert like this:

   !(MI->mayLoad() || MI->mayStore()) || !MI->memoperands_empty()

to catch most cases. There may not be many places left at
this point.

> Can they be used in conjunction with
> MemoryDependenceAnalysis? E.g. can we write a MachineInstructions-based
> memory dependence analysis that uses MachineMemOperands?

Right, the existing MemoryDependenceAnalysis works in terms of
LLVM-IR-level Instructions, but yes, it would be possible to
implement the same thing for MachineInstrs, using AliasAnalysis
queries from MachineMemOperands. As of this writing I'm not
prepared to guess whether this would be a good idea :-).

>> Another possibility is to record dependence information from the
>> SelectionDAG in MachineInstrs somehow. We don't yet have precise
>> memory dependencies in the SelectionDAG, but it would be good to
>> fix that too :-).

> Agreed.

>> This would probably also involve AliasAnalysis queries from codegen,
>> possibly going through the MemoryDependenceAnalysis interface.

> Do you have a vision for how this might work? Wouldn't we need a new
> MachineFunctionPass to essentially do the same thing as
> MemoryDependenceAnalysis?

A possible vision is that SelectionDAGBuild could use
MemoryDependenceAnalysis when building the initial SelectionDAG.
It's walking the IR-level Instructions, so it can use any
IR-level analysis. I haven't yet looked into this in detail.
A significant question is whether non-local dependencies are
important; some day we may extend SelectionDAGs to handle
more than one block at a time.

> I don't think it's sufficient to just preserve the information we had from
> Instructions. Codegen might introduce new memory operations after lowering
> (spilling, for example). Some of these might be easily analyzable (spills)
> but others might not be.

This is where PseudoSourceValues come in. There are pseudo-values
representing the stack, constants area, GOT, and other memory
locations that aren't represented at the LLVM-IR level.

> But maybe we don't need to worry about that right now. As I think about the
> problem I'm working on, "merely" preserving dependence information from
> Instructions would help. It seems like if we can preserve that information in
> SelectionDAG we ought to be able to record it in MachineInstructions (or
> MachineMemOperands?) when lowering.
>
> Hmm...then again looking at the MachineMemOperand documentation, fixing the
> places that invalidate those might work well too. I'm a little wary of having
> the same information in two different places.

I think the biggest challenge here is finding a design that
is reasonably efficient in terms of compile time and space while
still providing useful information.

Dan

Dan:

CellSPU could clearly benefit from the post-RA scheduler. In fact, we were thinking about writing a machine pass of our own.

One thing that does "disturb" me is that both the HazardRecognizer and the
post-RA scheduler assume there's only one kind of NOP. For Cell, there are
two, depending upon the pipeline being filled: Pipe 0 takes "ENOP" whereas
Pipe 1 takes "LNOP" (it depends on the low-order PC bits).

-scooter

I'm open to suggestions and patches for making the HazardRecognizer
interface more flexible. It may also be worthwhile to look at
whether the TargetInstrInfo::insertNoop hook could be extended to
be able to meet this need.

One potential problem to watch out for is that the post-RA scheduler
currently runs before BranchFolding, which may disturb things if
you're depending on lining up the code with even/odd PC addresses.
I would prefer to run the scheduler after BranchFolding, but
BranchFolding doesn't currently preserve liveness information on
which the scheduler currently depends.

Dan

> Can they be used in conjunction with
> MemoryDependenceAnalysis? E.g. can we write a MachineInstructions-based
> memory dependence analysis that uses MachineMemOperands?

> Right, the existing MemoryDependenceAnalysis works in terms of
> LLVM-IR-level Instructions, but yes, it would be possible to
> implement the same thing for MachineInstrs, using AliasAnalysis
> queries from MachineMemOperands. As of this writing I'm not
> prepared to guess whether this would be a good idea :-).

Sure. It seems fairly straightforward though. Heck, it might even be
possible to take MemoryDependenceAnalysis and turn it into generic code that
could operate on Instructions or MachineMemOperands.

>> This would probably also involve AliasAnalysis queries from codegen,
>> possibly going though the MemoryDependenceAnalysis interface.
>
> Do you have a vision for how this might work? Wouldn't we need a new
> MachineFunctionPass to essentially do the same thing as
> MemoryDependenceAnalysis?

> A possible vision is that SelectionDAGBuild could use
> MemoryDependenceAnalysis when building the initial SelectionDAG.
> It's walking the IR-level Instructions, so it can use any
> IR-level analysis. I haven't yet looked into this in detail.
> A significant question is whether non-local dependencies are
> important; some day we may extend SelectionDAGs to handle
> more than one block at a time.

I should think non-local dependencies would be important. We have a number of
machine code passes that work globally.

> I don't think it's sufficient to just preserve the information we had
> from Instructions. Codegen might introduce new memory operations after
> lowering (spilling, for example). Some of these might be easily
> analyzable (spills) but others might not be.

> This is where PseudoSourceValues come in. There are pseudo-values
> representing the stack, constants area, GOT, and other memory
> locations that aren't represented at the LLVM-IR level.

Ok, that's good. But what happens if some codegen pass deletes a memory
instruction (or a Value, or whatever) and recreates it elsewhere?
Presumably the dependence information would be lost. How would we
re-generate it?

> I think the biggest challenge here is finding a design that
> is reasonably efficient in terms of compile time and space while
> still providing useful information.

Isn't that the definition of compiler development? :-)

                                       -Dave

If a pass must re-generate an instruction, it's going to have to
be responsible for keeping track of the MachineMemOperand(s) that
the re-generated instruction will need.

Also, the PseudoSourceValue objects are singletons, so if the
code generating the memory reference knows what memory it's
talking about, it can easily look up the appropriate
PseudoSourceValue for it.

Dan