Is it possible to use the SimpleRegisterCoalescing pass in an iterative way?

Hi,

I'm implementing some variations of graph-coloring register allocators for LLVM.
Many of them perform their phases (e.g. coalescing, graph
simplification, spilling, color selection) in an iterative way. Since
LLVM provides an implementation of the coalescing in the
SimpleRegisterCoalescing class already, I would like to reuse it (even
though I could of course create my own coalescing implementation). But
this class is a MachineFunctionPass.

I'm wondering, if it is possible to use this pass from another pass
(i.e. regalloc) in an iterative way:
I need to invoke it multiple times for the same MachineFunction at
certain places in my regalloc pass implementation.
May be some sort of analysis update can be performed? Or is it
designed to be used just once per MachineFunction?

Also overall, it would be interesting to know how LLVM supports such
kind of iterative algorithms? What are the ways to formulate the
independent phases of those algorithms as reusable passes that could
be used in an iterative way?

Thanks,
-Roman

SimpleRegisterCoalescer is a super class of RegisterCoalescer. A RegisterCoalescer can be run both as a pass or directly. Perhaps the answer is as simple as implementing coalesceFunction.

Evan

Hi Roman,

I did this a while back when implementing iterative register coalescing. My
guess is you're trying to do the same with the Generaslized Graph Coloring
algorithm,.

It's not easy to accomplish within the LLVM framework. I've made a number of
structural changes to our code here and it's in a bit of a state of bitrot due
to upstream changes we haven't merged in yet.

I got some of this work merged upstream in the RegisterCoalescer and
RegallocQuery interfaces. RegallocQuery is supposed to be an opaque
communication conduit between register allocators and coalescers such that
the allocator can update the coalescer when it makes changes and vice versa.

When I did the full implementation I found I had to make more changes to
RegallocQuery. I have not pushed those upstream for various reasons,
including that at the time the register coalescing code was in a great state
of flux and it was difficult to track. It will be a bit of work to get it all
up-to-date for merging.

Evan is right that coalesceFunction is the primary interface to invoke the
coalescer. The iterative coalescer I wrote implements that function. The
iterative coalescer is also a FunctionPass which does nothing. It's simply
there to fulfill the requirement that the register allocator depends on a
coalescer. This way I could use the SimpleRegisterCoalescer or the new
iterative coalescer in a pluggable manner. The generalized graph coloring
allocator holds a reference to the coalescer. SimpleRegisterCoalescing does
nothing in its implementation of coalesceFunction.

This is all very far from ideal. What we really need is a PassManager that
knows about multually dependent, iterative algorithms. I thought about this a
bit back in the day and came up with the idea of a hierarchy of PassManagers.
A set of iterative passes could live in a PassManager under the top-level
PassManager and the child PassManager could specify dependencies for all of
the passes it manages. I didn't get very far with this so I don't know it
it's a viable solution.

The iterative coalescer does its own correctness and profitability analysis
but reuses the code from SimpleRegisterCoalescing to do the actual LLVM IR and
dataflow updates. That's non-trivial work. The code to do this in
SimpleRegisterCoalescing is a bit...well...convoluted. :slight_smile: I did a
significant amount of work to separate that code out from the dataflow and
profitability analysis that determines when to coalesce copies.

This I think is very valuable work and I'd like to push it back. It's always
good to separate analyses and updates as much as possible so that things can
be reused.

But I can't promise that I'll be able to merge all of the work I've done. We
have quite a bit going on here right now and I've been moved to other bits of
work. I'll see if I can get some things accomplished out-of-band.

                                                       -Dave

Hi Dave,

Thanks a lot for such an elaborative answer!

Hi,

I'm implementing some variations of graph-coloring register allocators for
LLVM. Many of them perform their phases (e.g. coalescing, graph
simplification, spilling, color selection) in an iterative way. Since
LLVM provides an implementation of the coalescing in the
SimpleRegisterCoalescing class already, I would like to reuse it (even
though I could of course create my own coalescing implementation). But
this class is a MachineFunctionPass.

I'm wondering, if it is possible to use this pass from another pass
(i.e. regalloc) in an iterative way:
I need to invoke it multiple times for the same MachineFunction at
certain places in my regalloc pass implementation.
May be some sort of analysis update can be performed? Or is it
designed to be used just once per MachineFunction?

Hi Roman,

I did this a while back when implementing iterative register coalescing. My
guess is you're trying to do the same with the Generaslized Graph Coloring
algorithm,.

You are absolutely right.

It's not easy to accomplish within the LLVM framework. I've made a number of
structural changes to our code here and it's in a bit of a state of bitrot due
to upstream changes we haven't merged in yet.

I got some of this work merged upstream in the RegisterCoalescer and
RegallocQuery interfaces. RegallocQuery is supposed to be an opaque
communication conduit between register allocators and coalescers such that
the allocator can update the coalescer when it makes changes and vice versa.

Yes. Initially I didn't notice RegallocQuery in the header files. But
after Even pointed out that it can be used, I had a closer look at it.

When I did the full implementation I found I had to make more changes to
RegallocQuery.

Yes. This was exactly my conclusion. Namely, it would require quite
some work to implement a coalescer this way using this interface.
Too many pieces are missing yet.

I have not pushed those upstream for various reasons,
including that at the time the register coalescing code was in a great state
of flux and it was difficult to track. It will be a bit of work to get it all
up-to-date for merging.

I see.

Evan is right that coalesceFunction is the primary interface to invoke the
coalescer. The iterative coalescer I wrote implements that function. The
iterative coalescer is also a FunctionPass which does nothing. It's simply
there to fulfill the requirement that the register allocator depends on a
coalescer. This way I could use the SimpleRegisterCoalescer or the new
iterative coalescer in a pluggable manner.

Yes. Using coalescers in a pluggable manner would be the ideal solution.

The generalized graph coloring
allocator holds a reference to the coalescer. SimpleRegisterCoalescing does
nothing in its implementation of coalesceFunction.

The idea with the empty pass is really nice, but looks a bit like a hack :wink:

This is all very far from ideal.

:wink:

What we really need is a PassManager that
knows about multually dependent, iterative algorithms. I thought about this a
bit back in the day and came up with the idea of a hierarchy of PassManagers.
A set of iterative passes could live in a PassManager under the top-level
PassManager and the child PassManager could specify dependencies for all of
the passes it manages. I didn't get very far with this so I don't know it
it's a viable solution.

I was also thinking about something of this kind. But I was too lazy
to come up with a practical solution for it.

The iterative coalescer does its own correctness and profitability analysis
but reuses the code from SimpleRegisterCoalescing to do the actual LLVM IR and
dataflow updates. That's non-trivial work.

Oh, yes.

The code to do this in SimpleRegisterCoalescing is a bit...well...convoluted. :slight_smile:

I'd say that the whole SimpleRegisterCoalescing is a
bit...well...overcomplicated :slight_smile:

It does a very, very good job doing aggressive coalescing (and taking
into account subtle subregs related things and updating the
LiveIntervals information), but IMHO it became so complex that it is
almost unmaintainable. I've seen different coalescers for graph
coloring (Chitin, Chaitin-Briggs, iterated register coalescing,
chordal graph based ones) and linear scan (Wimmer-based) register
allocators. But SimpleRegisterCoalescing in LLVM (I particulary like
"Simple" in its name :wink: is by far the most complex one I've seen so
far. I think, only Evan can completely understand it by now. And
there are still subtle bugs in it being found every week or two and
fixed in the mainline. It is not a very good sign...

Not that I have a constructive solution for it, but may at some point
it should be split into independent parts that are more
understandable? Or may be it should be even completely redesigned to
become more understandable?

I did a significant amount of work to separate that code out from the dataflow and
profitability analysis that determines when to coalesce copies.

This I think is very valuable work and I'd like to push it back. It's always
good to separate analyses and updates as much as possible so that things can
be reused.

Absolutely!

But I can't promise that I'll be able to merge all of the work I've done. We
have quite a bit going on here right now and I've been moved to other bits of
work. I'll see if I can get some things accomplished out-of-band.

It would be very good, if you could push this code back to LLVM and
make it available for others.
I hope very much that you'll find some time for it!

On my side, after looking at alternatives, I decided to use my own
custom coalescers for now. They are small, clean and do what I need.
And I have even a limited reuse among the three GC-based register
allocators I'm working on (Chitin, Chitin-Briggs, iterated register
coalescing). It is not an ideal solution; but for now I'm more
concerned about the quality and speed of my register allocators. Once
I'm happy with that, I'll probably have a closer look and refactor the
code to make it more reusable as a part of a framework.

-Roman

Hi Dave,

Coming back to your mail about the RegisterCoalescer and RegallocQuery
interfaces, I'd like to ask a few questions.

Hi,

I'm implementing some variations of graph-coloring register allocators for
LLVM. Many of them perform their phases (e.g. coalescing, graph
simplification, spilling, color selection) in an iterative way. Since
LLVM provides an implementation of the coalescing in the
SimpleRegisterCoalescing class already, I would like to reuse it (even
though I could of course create my own coalescing implementation). But
this class is a MachineFunctionPass.

I'm wondering, if it is possible to use this pass from another pass
(i.e. regalloc) in an iterative way:
I need to invoke it multiple times for the same MachineFunction at
certain places in my regalloc pass implementation.
May be some sort of analysis update can be performed? Or is it
designed to be used just once per MachineFunction?

Hi Roman,

I did this a while back when implementing iterative register coalescing. My
guess is you're trying to do the same with the Generaslized Graph Coloring
algorithm,.

It's not easy to accomplish within the LLVM framework. I've made a number of
structural changes to our code here and it's in a bit of a state of bitrot due
to upstream changes we haven't merged in yet.

I got some of this work merged upstream in the RegisterCoalescer and
RegallocQuery interfaces. RegallocQuery is supposed to be an opaque
communication conduit between register allocators and coalescers such that
the allocator can update the coalescer when it makes changes and vice versa.

When I did the full implementation I found I had to make more changes to
RegallocQuery. I have not pushed those upstream for various reasons,
including that at the time the register coalescing code was in a great state
of flux and it was difficult to track. It will be a bit of work to get it all
up-to-date for merging.

Evan is right that coalesceFunction is the primary interface to invoke the
coalescer. The iterative coalescer I wrote implements that function. The
iterative coalescer is also a FunctionPass which does nothing. It's simply
there to fulfill the requirement that the register allocator depends on a
coalescer. This way I could use the SimpleRegisterCoalescer or the new
iterative coalescer in a pluggable manner. The generalized graph coloring
allocator holds a reference to the coalescer. SimpleRegisterCoalescing does
nothing in its implementation of coalesceFunction.

I understand the overall approach. I even tried to define my own
coalescer by deriving it from the RegisterCoalescer.
This was OK. But then I got stuck, because I don't understand how I
can select this coalescer in the command-line of llc.
As far as I can tell, there is no registry for the coalescers (similar
to what exists for register allocators and which allow you to select a
register allocator by providing the --regalloc=register_allocator_name
command-line option ) and therefore there is no way to select a
specific register coalescer among all defined coalescers. Is my
understanding correct? If so, do we need to add such a registry for
this purpose?
Dave, how have you solved this in your source tree?

The iterative coalescer does its own correctness and profitability analysis
but reuses the code from SimpleRegisterCoalescing to do the actual LLVM IR and
dataflow updates. That's non-trivial work.
The code to do this in SimpleRegisterCoalescing is a bit...well...convoluted. :slight_smile:

I learned it by hart recently. My initial implementation uses my own
custom coalescer. But the problem is that it misses many opportunities
compared to the "simple" register coalescer. In particular, some
cases, where live intervals do not interfere even though their
live-ranges overlap. And this may happen e.g. for some live intervals
that were produced from phi-nodes. There are also some tricky cases
related to commuting operands and so on. Doing all that correctly
would probably end up with a code that is as "simple" as the
SimpleRegisterCoalescer :wink:

I did a significant amount of work to separate that code out from the dataflow and
profitability analysis that determines when to coalesce copies.

This is also the direction that I'm thinking more and more about. I
think that if it would be possible to separate/extract an API that
would provide the possibility to check for possibility of coalescing
of two given intervals (a,b) and to perform the coalescing of two
particular intervals (a,b) then it would become much more reusable and
can serve as a basis for other coalescers. Those coalescers/register
allocators would then decide on their own about the candidates for
coalescing based on their own profitability analysis and internal
logic and then ask this common code if it is possible to coalesce
given intervals and to perform this coalescing and do dataflow
updates...

I think it is rather possible to extract it from the current
implementation, which currently covers too many things at once. For
example, it tries to do coalescing for all move-related intervals and
it does it based on its own metrics and its own logic. And in many
cases the code assumes that only these metrics are used. So the main
task is to separate the code that performs coalescing for a pair of
intervals from the code that decides which pairs and in which order
should be coalesced.

Is it more or less inline with your thinking? Is it what you managed to do?

This I think is very valuable work and I'd like to push it back. It's always
good to separate analyses and updates as much as possible so that things can
be reused.

But I can't promise that I'll be able to merge all of the work I've done. We
have quite a bit going on here right now and I've been moved to other bits of
work. I'll see if I can get some things accomplished out-of-band.

Dave, BTW, I'd be very interested in having a look at it even of it is
not to be pushed back to LLVM by you in the near future.
May be I can work on it to make it more prepared for the inclusion
into mainline. Or may be it will provide me with a better insight
about how the "ideal" coalescer interface should look like in LLVM.
So, if you could provide me with the snippets of your coalescer
related code, especially the dissected SimpleRegisterCoalescer, I'd be
very grateful to you.

-Roman