Pluggable Register Coalescers

Ok, I'm at a point now where I can implement plyggable register coalescers as
we originally wanted when I started the refactoring work.

I'm in the midst of finishing up an implementation following the model of how
register allocators are selected. Thgere may be some more refactoring/code
sharing work to be done with this, but I want to get it working first.

createRegisterAllocator() is called by addPassesToEmit.* to instantiate the
register allocator.

Coalescing is a bit different. Typically, the register allocator is the
client of the coalescer. My original thought was to have each register
allocator implementation call createRegisterCoalescer directly. I then
thought it might be better to have allocators just declare they want a
coalescer run and let someone else decide how to do it, similarly to
how the alias analysis group works.

But that's not quite right either, because it isn't possible in general to
state that coalescing should run "before" register allocation. Often you
want to run it in the middle of the algorithm, and maybe more than once.

So now I'm back to the original thought of just having the allocators
call createRegisterCoalescer directly. Does this sound like the right
solution? Are there other ideas?

Are there times one might run register coalescing completely outside of
register allocation? Again, in general, not all coalescers will be able to do
that (the classic case being a coalescer that works off an interference
graph).

                                            -Dave

Hi David,

Ok, I'm at a point now where I can implement plyggable register coalescers as
we originally wanted when I started the refactoring work.

I'm in the midst of finishing up an implementation following the model of how
register allocators are selected. Thgere may be some more refactoring/code
sharing work to be done with this, but I want to get it working first.

createRegisterAllocator() is called by addPassesToEmit.* to instantiate the
register allocator.

Coalescing is a bit different. Typically, the register allocator is the
client of the coalescer. My original thought was to have each register
allocator implementation call createRegisterCoalescer directly. I then
thought it might be better to have allocators just declare they want a
coalescer run and let someone else decide how to do it, similarly to
how the alias analysis group works.

But that's not quite right either, because it isn't possible in general to
state that coalescing should run "before" register allocation. Often you
want to run it in the middle of the algorithm, and maybe more than once.

So now I'm back to the original thought of just having the allocators
call createRegisterCoalescer directly. Does this sound like the right
solution? Are there other ideas?

The only thing that comes to mind is that creating and running the
coalescer are separate operations so you might want to do the creation
of it in alias analysis style. Then, the allocator can a) determine if a
coalescer was created, b) obtain the coalescer that was created, if any,
and c) run it at the right time for the allocator's algorithm, or even
not at all.

Ok, that sounds like a good plan. I'll brush up on how alias analysis
works and work from there.

                                            -Dave

I've been looking at this and it's become clear to me that we need some kind
of abstract coalescing interface similar to what the AliasAnalysis class
provides. I think we need the same thing for register allocators.

LLVM's existing coalescer is very simple. It just goes and replaces every
copy it can. In general, a coalescer will want to query the register
allocator about which copies it should remove. For example, Briggs'
conservative coalescer will not remove copies that could potentially cause
spills. Only the register allocator has the information to make that
decision.

A coalescing interface needs to provide hooks for the register allocator to
invoke it potentially multiple times with a potentially restricted scope. The
register allocator needs to provide an interface for the coalescer to ask
questions like, "is this copy here to split live ranges?" and, "will
coalescing this copy potentially cause a spill?"

AI think all this will require is new RegisterCoalescer and RegisterAllocator
interface classes that can be multiply-inherited (with MachineFunctionPass)
by the various implementations.

It's not totally clear to me whether coalescers should even inherit from
MachineFunctionPass as they are really invoked by the register allocator,
not PassManager. But then again, I can imagine some coalescers (like the
existing one) being independent enough to run on their own at any time.
Opinions on this are welcome.

Does this sound like a reasonable plan? I want to keep the infrastructure as
flexible as possible, to allow multiple implementations that can be chosen at
run-time. But the interfaces need to be specific enough to be useful for more
powerful algorithms.

Discuss away!

                                                  -Dave

Ok, that sounds like a good plan. I'll brush up on how alias analysis
works and work from there.

I've been looking at this and it's become clear to me that we need some kind
of abstract coalescing interface similar to what the AliasAnalysis class
provides. I think we need the same thing for register allocators.

LLVM's existing coalescer is very simple. It just goes and replaces every
copy it can. In general, a coalescer will want to query the register
allocator about which copies it should remove. For example, Briggs'
conservative coalescer will not remove copies that could potentially cause
spills. Only the register allocator has the information to make that
decision.
A coalescing interface needs to provide hooks for the register allocator to
invoke it potentially multiple times with a potentially restricted scope. The
register allocator needs to provide an interface for the coalescer to ask
questions like, "is this copy here to split live ranges?" and, "will
coalescing this copy potentially cause a spill?"

AI think all this will require is new RegisterCoalescer and RegisterAllocator
interface classes that can be multiply-inherited (with MachineFunctionPass)
by the various implementations.

It's not totally clear to me whether coalescers should even inherit from
MachineFunctionPass as they are really invoked by the register allocator,
not PassManager. But then again, I can imagine some coalescers (like the
existing one) being independent enough to run on their own at any time.
Opinions on this are welcome.

I think the coalescer should be flexible enough to be run independent of the register allocator. For example, you may want to expose the copies induced by transforming out of SSA to the scheduler. If the scheduler is run before register allocation, then you would want the coalescing to happen before as well.

-Tanya

Hi all,

Tanya and Reid's ideas are very good. I think we want to go towards pluggable register coalescers. That said, I don't think the longer term plan to stop short term progress. To me, it's totally acceptable for the allocator to directly call the coalescers at this point. I would love to see a smarter coalescer. That's something we can use today! :slight_smile:

Thanks,

Evan

I agree that we want to allow for this possibility, but there are some
coalescers that cannot run independent of the register allocator.
So how do we handle both? If I as a user ask for a regalloc-dependent
coalescer, what should it do if someone tries to run it as an independent
pass?

It could provide a fallback, perhaps invoking the existing coalescer,
but that would cause coalescing the user might not want.

It could do nothing and count on the register allocator to run it at some
point, but not all register allocators will necessarily run a coalescing pass.

I agree with Evan that we don't want to strangle short-term gain, but these
are the kinds of questions we should keep in mind as we design interfaces.
Interfaces can always change later, of course, but it's good to get them as
right as possible initially.

                                                  -Dave

Could it be possible for there to be a harness type interface that would allow coalescers that support both modes to be hooked into the pass registration, and those that depend on the allocator not be registered as passes?

I have a patch for this kind of thing attached. Please take a look and let
me know if it looks reasonable. It's not complete yet. I'm about to embark
on implementing a new coalescer that will not be managed by PassManager
and as I do that I'll figure out any additional interfaces that are needed.

If this looks good, I'll commit it and later on I'll commit any additions I
make to this basic interface.

                                                  -Dave

coalscing_patch.txt (15.4 KB)

No comments? I'm continuing to implement a new coalescer and trying to
figure out how to factor out the code to merge live intervals from the current
implementation. It's hard because that code is also doing legality checks.

Does the current implementation follow a published algorithm? If so, do we
have a pointer to it? The comments are rather sparse and it's difficult to
figure out what's going on at times.

                                                   -Dave

Hi David,

Sorry I should have replied earlier. I really don't like this dual interface approach. To me, this muddles things without offering any real useful new functionalities.

IMHO, if a register coalescer is tied to a particular allocator. Then either it should simply belong to that allocator or that we have to allow the allocator to act as a pass manager itself, i.e. it can control what passes to run as part the allocating process. I don't know if the pass manager allows that functionality. If not, then that's something we should add because there are other passes that can benefit from it.

Also, on this particular file below. Seems to me these queries do not belong to the register allocator. Perhaps a utility class that tracks interference information that can be accessed by both the allocator and the coalescer? Also, why is coalesceThisCopy() needed? Shouldn't the coalescer be responsible for making the decision?

Evan

Index: llvm/include/llvm/CodeGen/RegisterAllocator.h

Could it be possible for there to be a harness type interface that
would allow coalescers that support both modes to be hooked into the
pass registration, and those that depend on the allocator not be
registered as passes?

I have a patch for this kind of thing attached. Please take a look and let
me know if it looks reasonable.

No comments? I'm continuing to implement a new coalescer and trying to
figure out how to factor out the code to merge live intervals from the current
implementation. It's hard because that code is also doing legality checks.

Commented.

Does the current implementation follow a published algorithm? If so, do we
have a pointer to it? The comments are rather sparse and it's difficult to
figure out what's going on at times.

I don't know. I suspect not. I agree the current implementation is a bit of a challenge to understand and modify.

Evan

Its a property change. In Subversion every file and directory can have
properties associated with them. For example the svn:ignore property on
a directory tells svn to ignore certain non-versioned files. Simlarly
svn:execute tells svn that the file is executable and so should have
execute permissions on it when its checked out.

This particular one is saying that the end-of-line style for this file
should be
just LF rather than CR-LF. Our standard is the Unix standard, LF and
that is also the subversion default. This property set is superfluous
but harmless.

Reid.

Sorry I should have replied earlier. I really don't like this dual
interface approach. To me, this muddles things without offering any
real useful new functionalities.

Ok. See below for the rationale.

IMHO, if a register coalescer is tied to a particular allocator. Then
either it should simply belong to that allocator or that we have to
allow the allocator to act as a pass manager itself, i.e. it can
control what passes to run as part the allocating process. I don't
know if the pass manager allows that functionality. If not, then
that's something we should add because there are other passes that
can benefit from it.

I'm not sure I fully understand what you're proposing. The reason I
went with the AliasAnalysis-like construct is that I envisioned a
"-coalescer=<type>" user option, similar to how alias analysis and
register allocation works. Somehow we have to make that option
work with independent register coalescers (those derived from
MachineFunctionPass or equivalents that can run through the
PassManager) and those that are tied to some abstract register
allocation algorithm. A coalescer that runs in conjunction with an
allocator need not be tied to a particular implementation of register
allocation. For example, George and Appel's Iterated Register
Coalescing can work with any concrete implementation of graph
coloring (and there are lots of variations out there) and can probably
work with other algorithms as well. So if we can avoid it, I'd rather not
be limited to hard-coding these things.

These two requirements led to the abstract RegisterCoalescer
interface in the patch. This is the interface that register allocators
know about. Likewise, coalescers need an abstract interface to
register allocators to ask questions and do other things.

If we decide that we don't need this flexibility, that's fine with me. I was
working based on the needs expressed by Tanya and Reid and the
implementation suggested by Christopher Lamb. I'm not wedded to it.

To me, the most important thing is a simple user interface. We want one
command-line option that picks a coalescer and the user should not have
to worry about whether the coalescer is implemented as a separate pass
or works as part of the register allocator. The user should not have to worry
about whether the coalescer and register allocator he or she specifies
are compatable in the sense that the register allocator knows the guts of
the coalescer and vice-versa. That seems to drive the need for abstract
interfaces for both.

Also, on this particular file below. Seems to me these queries do not
belong to the register allocator. Perhaps a utility class that tracks
interference information that can be accessed by both the allocator
and the coalescer? Also, why is coalesceThisCopy() needed? Shouldn't
the coalescer be responsible for making the decision?

Register allocators know information about interference, for example,
the interference graph in a coloring allocator. The classic coalescing
algorithm examines the interference graph and looks for copies between
allocation units whose corresponding nodes do not have an edge between
them in the graph. That's why that interface is there.

A utility class is an interesting idea. In fact I've been thinking about how
one might go about separating the essentials of, say, a graph coloring
allocator and the particular heuristics it uses to make decisions. A graph
is a graph and could certainly be factored out into a separate utility class,
which could provide interfaces for queries. A policy-based design is one
possibility, where a generic graph coloring algorithm is written as a template
and the developer passes policy types that it uses to configure itself with
heuristics.

The larger question is whether a good abstract interface for the utility class
can be found that can support difference representations such as graphs,
the linear-scan stuff, etc. It seems hard to me. I'm not even sure a
coalescer that wants to work with an allocator can work with every register
allocation algorithm. For example, can the linear-scan algorithm satisfy
the queries asked by a coalescer that expects a graph coloring algorithm?
I don't know. It would be nice if it could, to ease the burden on the user as
explained above. I think it can if the queries are not too fine-grained. The
interface in the patch is pretty much the coarsest grain that I could think
of. It would be bad, for example, to add interfaces like "does this graph
node have neighbors of degree > N?" because not every allocator could
answer it.

The coalesceThisCopy interface is there for similar reasons. Some coalescing
algorithms (Briggs' conservative coalescing is a good example) examine the
interference graph and apply heuristics based on the degree of the neighbors
of nodes being coalesced. Your utility class could provide interfaces to make
more fine-grained queries, but as noted above, it seems a bit harder as
various register allocators may use very difference representations. So
coalesceThisCopy is certainly a compromise. It does move some of the
decision-making for coalescing into the register allocator. You raise a good
point and I'll think about how we can improve this.

Do you have suggestions for how to move forward based on the above? I have
some critical work to do here and it's going to require that we get pluggable
]coalescers working at some level. We can of course evolve the interface as
we learn.

                                                 -Dave

Yeah, I wondered about that too. I haven't ever seen that in any of the
commits I've done in the past. I agree it's harmless, but curious.

                                              -Dave

I actually managed to factor out the code for merging live intervals and
remove copy instructions. This code updates a bunch of dataflow information
so it's important that other register coalescers can use it.

During the course of this work, I noticed something odd. In the current
coalescer, we have this sequence:

00298 if (JoinIntervals(DstInt, SrcInt)) {
[...]
00322 } else {
[...]
00332 }

Ok, at this point intervals were joined and in the process, DstInt and SrcInt
might have been swapped by LiveInterval::join.

00334 bool Swapped = repSrcReg == DstInt.reg;
00335 if (Swapped)
00336 std::swap(repSrcReg, repDstReg);
00337 assert(MRegisterInfo::isVirtualRegister(repSrcReg) &&
00338 "LiveInterval::join didn't work right!");

Ok, this code says that if the intervals were swapped, swap the register
numbers we previously extracted from the live intervals before joining.
This makes things consistent so that repSrcReg is the register corresponding
to the new SrcInt and ditto with repDstReg.

[...]
00369 // If the intervals were swapped by Join, swap them back so that the
register
00370 // mapping (in the r2i map) is correct.
00371 if (Swapped) SrcInt.swap(DstInt);

Whoops! At this point repSrcReg is not consistent with SrcInt and the
same goes for repDstReg!

00372 li_->removeInterval(repSrcReg);
00373 r2rMap_[repSrcReg] = repDstReg;

Does this code get us into trouble due to the inconsistency created above?

Is this a bug? There's a lot of indirection going on here and it's hard to
keep track of it.

                                                        -Dave

Sorry I should have replied earlier. I really don't like this dual
interface approach. To me, this muddles things without offering any
real useful new functionalities.

Ok. See below for the rationale.

IMHO, if a register coalescer is tied to a particular allocator. Then
either it should simply belong to that allocator or that we have to
allow the allocator to act as a pass manager itself, i.e. it can
control what passes to run as part the allocating process. I don't
know if the pass manager allows that functionality. If not, then
that's something we should add because there are other passes that
can benefit from it.

I'm not sure I fully understand what you're proposing. The reason I
went with the AliasAnalysis-like construct is that I envisioned a
"-coalescer=<type>" user option, similar to how alias analysis and
register allocation works. Somehow we have to make that option
work with independent register coalescers (those derived from
MachineFunctionPass or equivalents that can run through the
PassManager) and those that are tied to some abstract register
allocation algorithm. A coalescer that runs in conjunction with an
allocator need not be tied to a particular implementation of register
allocation. For example, George and Appel's Iterated Register
Coalescing can work with any concrete implementation of graph
coloring (and there are lots of variations out there) and can probably
work with other algorithms as well. So if we can avoid it, I'd rather not
be limited to hard-coding these things.

Ok.

These two requirements led to the abstract RegisterCoalescer
interface in the patch. This is the interface that register allocators
know about. Likewise, coalescers need an abstract interface to
register allocators to ask questions and do other things.

If the two modules need to share information. I think you want a third module to encapsulate it.

If we decide that we don't need this flexibility, that's fine with me. I was
working based on the needs expressed by Tanya and Reid and the
implementation suggested by Christopher Lamb. I'm not wedded to it.

To me, the most important thing is a simple user interface. We want one
command-line option that picks a coalescer and the user should not have
to worry about whether the coalescer is implemented as a separate pass
or works as part of the register allocator. The user should not have to worry
about whether the coalescer and register allocator he or she specifies
are compatable in the sense that the register allocator knows the guts of
the coalescer and vice-versa. That seems to drive the need for abstract
interfaces for both.

I don't care for a MachineFunctionPass that can be directly called. I think it's a very good idea to keep the coalescers independent from the allocators. If that's desired, we should enhance passmanager so each allocator can run some sub-passes as part of the allocation pass. For now, I think we leave it as it is. If the user picked a coalescer that doesn't work with the register allocator of choice, we can output an error (or a warning and override the choice).

Also, on this particular file below. Seems to me these queries do not
belong to the register allocator. Perhaps a utility class that tracks
interference information that can be accessed by both the allocator
and the coalescer? Also, why is coalesceThisCopy() needed? Shouldn't
the coalescer be responsible for making the decision?

Register allocators know information about interference, for example,
the interference graph in a coloring allocator. The classic coalescing
algorithm examines the interference graph and looks for copies between
allocation units whose corresponding nodes do not have an edge between
them in the graph. That's why that interface is there.

Right, then perhaps the interference graph should not be part of allocator. Both the allocator and coalescer can be clients. It's weird to me for the coalescer to directly query the allocator especially since the goal is to keep the coalescer independent from the allocator.

A utility class is an interesting idea. In fact I've been thinking about how
one might go about separating the essentials of, say, a graph coloring
allocator and the particular heuristics it uses to make decisions. A graph
is a graph and could certainly be factored out into a separate utility class,
which could provide interfaces for queries. A policy-based design is one
possibility, where a generic graph coloring algorithm is written as a template
and the developer passes policy types that it uses to configure itself with
heuristics.

The larger question is whether a good abstract interface for the utility class
can be found that can support difference representations such as graphs,
the linear-scan stuff, etc. It seems hard to me. I'm not even sure a
coalescer that wants to work with an allocator can work with every register
allocation algorithm. For example, can the linear-scan algorithm satisfy
the queries asked by a coalescer that expects a graph coloring algorithm?
I don't know. It would be nice if it could, to ease the burden on the user as
explained above. I think it can if the queries are not too fine-grained. The
interface in the patch is pretty much the coarsest grain that I could think
of. It would be bad, for example, to add interfaces like "does this graph
node have neighbors of degree > N?" because not every allocator could
answer it.

I don't think we need that level of abstraction right now. :slight_smile:

The coalesceThisCopy interface is there for similar reasons. Some coalescing
algorithms (Briggs' conservative coalescing is a good example) examine the
interference graph and apply heuristics based on the degree of the neighbors
of nodes being coalesced. Your utility class could provide interfaces to make
more fine-grained queries, but as noted above, it seems a bit harder as
various register allocators may use very difference representations. So
coalesceThisCopy is certainly a compromise. It does move some of the
decision-making for coalescing into the register allocator. You raise a good
point and I'll think about how we can improve this.

Ok.

Do you have suggestions for how to move forward based on the above? I have
some critical work to do here and it's going to require that we get pluggable
]coalescers working at some level. We can of course evolve the interface as
we learn.

Assuming the interference graph is available, does the coalescer need any information from the allocator to make decisions? Are you designing the coalescer so it always run before allocation? If so, please keep it simple. :slight_smile: Make it operate on existing live interval information (perhaps enhanced if needed) and the new interference graph class (again, if needed). I think as a first step, the coalescer should be as aggressive as possible (i.e. no concerns for what the allocator would do).

Evan

> These two requirements led to the abstract RegisterCoalescer
> interface in the patch. This is the interface that register
> allocators
> know about. Likewise, coalescers need an abstract interface to
> register allocators to ask questions and do other things.

If the two modules need to share information. I think you want a
third module to encapsulate it.

Yes, that makes sense to me.

I don't care for a MachineFunctionPass that can be directly called. I
think it's a very good idea to keep the coalescers independent from
the allocators. If that's desired, we should enhance passmanager so
each allocator can run some sub-passes as part of the allocation
pass.

I agree that this is a good long-term goal. Unfortunately, I don't have the
time resources to implement it right now. If someone else can take up
this as a sub-project, that would be great! I agree that several other passes
could make use of it.

For now, I think we leave it as it is. If the user picked a
coalescer that doesn't work with the register allocator of choice, we
can output an error (or a warning and override the choice).

Ok.

> Register allocators know information about interference, for example,
> the interference graph in a coloring allocator. The classic
> coalescing
> algorithm examines the interference graph and looks for copies between
> allocation units whose corresponding nodes do not have an edge between
> them in the graph. That's why that interface is there.

Right, then perhaps the interference graph should not be part of
allocator. Both the allocator and coalescer can be clients. It's
weird to me for the coalescer to directly query the allocator
especially since the goal is to keep the coalescer independent from
the allocator.

Good point.

> The larger question is whether a good abstract interface for the
> utility class can be found that can support difference representations
> such as graphs, the linear-scan stuff, etc.

I don't think we need that level of abstraction right now. :slight_smile:

True enough. But I need some kind of abstraction, because for various reasons
I don't have an interference graph implementation I can check in. But yet I
want to check something in that others can use.

Assuming the interference graph is available, does the coalescer need
any information from the allocator to make decisions?

Not that I'm aware of. And in the future if something more is needed, then
we can do another round of factoring.

Are you designing the coalescer so it always run before allocation? If so,
please keep it simple. :slight_smile:

Believe me, I'm trying to keep it as simple as possible. The coalescer I'm
working on doesn't run "before" allocation. It runs in conjunction with it.
Decisions have to be made here as to whether it gets pushed out into the
public repository and because of the way it's designed, the general llvm
community might not be interested in it anyway. It's more complex due to
some additional requirements on this end that don't have any relevance
to the llvm community.

Make it operate on existing live interval information (perhaps enhanced if
needed)

Yep, it does. Hence the factoring effort on the current coalescer.

and the new interference graph class (again, if needed).

As stated above, I'd like to make this an abstract interface to an
interference graph, due to complexity of my implementation that the llvm
community doesn't care about.

I think as a first step, the coalescer should be as aggressive as possible
(i.e. no concerns for what the allocator would do).

Well, that's the current coalescer, and the whole reason I'm writing a new one
is that we don't want it to be so aggressive.

In the end, what I am able to contribute back at this point is the
infrastructure to make it easy to implement new coalescers. Due to
legal and other issues, I can't contribute any concrete implementations
right now, and it's not likely I'll be able to do so any time soon. But I do
want to make sure that the infrastructure works for everyone.

This is very good feedback. I'll take it and rework the patch.

                                                    -Dave

I've got something in the works that provides an interface for the coalescer
to query an interference object. So the RegisterAllocator class no longer
exists and coalescers don't directly depend on register allocators. As a
bonus, the interference interface doesn't make any assumptions about
the underlying representation but has the interfaces necessary to make
equivalent queries to what a coalescer might ask an interference graph.

As a proof of concept, I have an example of what the linear scan algorithm
might provide. It's very inefficient and in practice linear scan probably
won't ever interact with a coalescer that partners with the allocator but
it serves to show that the interface can be met by something that doesn't
use a graph.

I've also implemented a graph-based realization using what I currently have,
but as I said in another message, I'm not at liberty to release that publicly.
It's pretty straightforward code, though.

The RegisterCoalescer interface still exists because register allocators need
to be able to invoke the coalescer, sometimes repeatedly. PassManager can't
do that. The current SimpleRegisterCoalescer (which should really be named
AggressiveRegisterCoalescer -- another patch on the way) inherits from both
MachineFunctionPass and RegisterCoalescer. The former because it is run and
managed by PassManager and the latter because it needs to belong to the
RegisterAllocator AnalysisGroup, similar to how alias analysis works. We need
a way for register allocators to ask PassManager for the coalescer object
(even if the coalescer doesn't inherit from MachineFunctionPass) and this is
the only way I know how to do it until someone takes on revamping how
PassManager works and providing the capability of constructing hierarchies
of PassManagers.

I'm still working out getting things to build, etc. and will post a patch when
I've got through all that. But does this sound like something closer to what
we want?

                                                -Dave

Hi David,

Just as an idea, you could try to use a coloring register allocator
submitted by Bill Wendling some time ago on the mailing list?
See this thread for more information
http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-April/008489.html

This allocator is already implemented using LLVM and could be probably
useful for testing your new infrastructure and refactored interfaces.
At least you would be able to do it for the linear scan and graph
coloring allocators, which represent a major part of potentially
possible allocators. And you don't need to contribute your own register
allocation code in this case.

-Roman

--- "David A. Greene" <greened@obbligato.org> schrieb: