Spillers

Can someone explain the theory behind the spillers in VirtRegMap.cpp?

It seems as though the spillers do triple duty:

- Insert load/store operations and/or fold instructions as necessary to carry
  out spills

- Rewrite the spilled virtual registers to use machine registers (mapping
  given by the caller in the VRM).

- Rewrite machine code to change virtual registers to physical registers
  (mapping given by the caller in the VRM, presumably the caller is regalloc).

My question concerns duty #2. There is code in the spiller that assumes
intervals created to spill a virtual register have been mapped to physical
registers. Why is this so? This puts a constraint on register allocation
that is potentially bad.

For example, the prototype graph coloring allocator posted by Bill W. some
time ago has this code in it (the **** comments are my own):

/// SpillLiveInterval - Assign a live interval to a stack slot.
///
void RegAlloc::SpillLiveInterval(LiveInterval* LI) {
[...]
  int Slot = VRM->assignVirt2StackSlot(LI->reg);
  DEBUG(std::cerr << "Spilling " << *LI << " into slot " << Slot << "\n");
  std::vector<LiveInterval*> Added =
    LIs->addIntervalsForSpills(*LI, *VRM, Slot);

  static unsigned J = 0;

  for (unsigned I = 0; I < Added.size(); ++I, ++J) {
    unsigned VReg = Added[I]->reg;
    const TargetRegisterClass* RC = MF->getSSARegMap()->getRegClass(VReg);
    TargetRegisterClass::const_iterator Iter =
RC->allocation_order_begin(*MF);
    if (Iter + J >= RC->allocation_order_end(*MF)) J = 0;
    unsigned PReg = *(Iter + J);

    // **** Assign the newly-created live interval to a physical register
    VRM->assignVirt2Phys(VReg, PReg);
    PRT->addRegUse(PReg);

    DEBUG(std::cerr << "\tAdded LI: " << *Added[I]
                    << " assigned Reg " << MRI->getName(PReg)
                    << "\n");

    for (LiveIntervals::const_iterator
           K = LIs->begin(); K != LIs->end(); ++K) {
      const LiveInterval& L = K->second;
      unsigned LIReg = L.reg;

      if (!MRegisterInfo::isVirtualRegister(LIReg) || LIReg == VReg) continue;

      // **** Check to see if the physreg we assigned conflicts with something
      // **** else. If so, that thing has to live in memory too.
      if (VRM->hasPhys(LIReg) && VRM->getPhys(LIReg) == PReg &&
          !VRM->hasStackSlot(LIReg) && Added[I]->overlaps(L))
        VRM->assignVirt2StackSlot(LIReg);
    }
  }

The two points I want to talk about are noted by the **** comments. The first
takes a newly-created live interval to represent the lifetime between a reload
and use of the reload (or def and store) and assigns it a physical register
*outside of the coloring algorithm proper.* This is due to the requirement
that the spiller see spill intervals mapped to physical registers.

The second is a consequence of the first. Because of this hackish "coloring"
of the spill interval, we may have to patch up code and move other things
into memory as well. This is bad.

Traditionally, a graph coloring algorithm spills some number of candidates and
then restarts the algorithm with the graph representing the newly-inserted
live intervals for the spills. Those intervals get colored along with
everything else and therefore the allocator can make a better decision based
on the global information available.

It seems as though the current spillers assume they are the last thing to run.
That's not true in the general case as allocators often want to iterate to get
the best mapping possible. In fact the priority-based coloring algorithm Bill
posted avoids this iteration precisely because it can't easily happen (as I'm
discovering as I implement my own algorithm). This leads to poorer register
allocation than is necessary.

Right now, I'm simply trying to understand the logic of the spillers and why
they expect spill intervals to be pre-colored. Can someone help?

Thanks.

                                                  -Dave

Well, the spiller in VirtRegMap seems to expect that every possible virtual register, including those created due to spilling, are mapped to a physical register. It is not that some are "precolored" and others are not. The spiller will be run after the register allocator has finished his job, and so, each virtual will have a physical location. Maybe Evan could correct me if it is not like that.
     In my understanding, your graph coloring algorithm would be iterative, as normally happens, and, once it finishes the last iteration, each virtual would be mapped to a physical register. At this moment the spiller would be called. Further spills should not happen during this phase.

best,

Fernando

Hi, David.

Spill intervals must be precolored because they can’t be spilled once more. They are the shortest intervals precisely over each def/use of the original interval. That is why they also have their weights set to #INF.

Imagine that on a second iteration allocation algorithm figures out that some spilled interval can’t be assigned a physical register. Allocator can’t spill it so some recoloring must be done to free at least one physical register to allocate. Precoloring #INF weighted intervals makes allocation simpler by eliminating such situations.

As for giving the best result. If assumed that each interval is spilled into the shortest spill intervals then precoloring won’t do any harm to the quality of allocation (as shown above). But in theory you can spill intervals differently. For example, interval can be split into two intervals - the shortest interval that cannot be colored right now and the rest part that can be colored. That say if you have [10, 50) intervals that conflicts with [40, 45) and can’t be colored it can be split into smth like [10, 40) and [40, 50). The former part should not be precolored as it has less conflicts (it doesn’t intersects with [40, 45) ) and can be colored entirely by one register on the next iteration.

Unfortunately, it seems that current implementation doesn’t support such “clever” spilling.

Anyway, this precoloring doesn’t force allocation algorithms to be non-iterative. In my implementation of optimistic register coloring I precolor spill intervals at the beginning of each iteration and thus don’t have to spill their neghbours specifically. It was done automatically by Select phase of algorithm.

As Fernando has mentioned while I was writing this after the last iteration of your algorithm before you call Spiller::runOnMachineFunction method every interval in VirtRegMap must be mapped to a physical register, both spill and others.

Hope that helps.

Anton.

Spill intervals must be precolored because they can't be spilled once more.
They are the shortest intervals precisely over each def/use of the original
interval. That is why they also have their weights set to #INF.

Yes, that's true. But I wonder if we shouldn't be smarter about which
register we pick to color it. In Bill W's implementation, it was essentially
random. What was your solution to this?

As for giving the best result. If assumed that each interval is spilled
into the shortest spill intervals then precoloring won't do any harm to the
quality of allocation (as shown above).

Wait, but aren't you implying then that it's impossible for an interval
corresponding to a spill to be uncolorable? I don't think that's true.
Therefore, precoloring most certainly can cause harm because a decision
is being made in local context without the global information that could
help make a better one.

But in theory you can spill intervals differently. For example, interval can
be split into two intervals - the shortest interval that cannot be colored
right now and the rest part that can be colored. That say if you have [10,
50) intervals that conflicts with [40, 45) and can't be colored it can be
split into smth like [10, 40) and [40, 50). The former part should not be
precolored as it has less conflicts (it doesn't intersects with [40, 45) )
and can be colored entirely by one register on the next iteration.

Right. Live range splitting would indeed be nice to have. Bill's
implementation has a flavor of it.

Unfortunately, it seems that current implementation doesn't support such
"clever" spilling.

True. People have talked about adding it ever since I started getting
involved.

Anyway, this precoloring doesn't force allocation algorithms to be
non-iterative. In my implementation of optimistic register coloring I
precolor spill intervals at the beginning of each iteration and thus don't
have to spill their neghbours specifically. It was done automatically by
Select phase of algorithm.

So how did you get around the requirement of the spillers that intervals
be mapped to physical registers? Or did you not use the provided spillers?
That may be what I end up doing. Loads and stores don't absolutely have to be
inserted into the actual instruction text on each iteration, though it is nice
for debugging purposes.

As Fernando has mentioned while I was writing this after the last iteration
of your algorithm before you call Spiller::runOnMachineFunction method
every interval in VirtRegMap must be mapped to a physical register, both
spill and others.

Right, which means the pre-defined spillers can't be called during iteration.

                                     -Dave

To be clear, the spiller interface does support (or could with very straight-forward extension) splitting, it just requires you to give the split ranges new intervals and a new vreg.

-Chris

Right. Splitting should happen outside of the spiller, in the allocator
proper. The spiller should not care how intervals got there. Except for the
ones that perform spills and reloads, of course. :slight_smile:

                                                 -Dave

Hi,

Sorry for the delay. I was trying to wrap my head around some live interval analysis code and have forgotten about emails. :slight_smile:

Can someone explain the theory behind the spillers in VirtRegMap.cpp?

It seems as though the spillers do triple duty:

- Insert load/store operations and/or fold instructions as necessary to carry
  out spills

Actually folding decision was made during allocation. When the allocator decides to spill, it asks live interval analysis to attempt to fold the load / store. The spiller doesn't actually perform the "folding". It needs to know that they have happened to keep track of reuse information.

- Rewrite the spilled virtual registers to use machine registers (mapping
  given by the caller in the VRM).

- Rewrite machine code to change virtual registers to physical registers
  (mapping given by the caller in the VRM, presumably the caller is regalloc).

Right.

My question concerns duty #2. There is code in the spiller that assumes
intervals created to spill a virtual register have been mapped to physical
registers. Why is this so? This puts a constraint on register allocation
that is potentially bad.

Fernando and Anton have already answered this question. The allocator creates new intervals for spill / restore and they are just like other intervals (except they are not allowed to spill again).

So to sum up. The code in VirtMap is not really performing the spilling. It's mostly just doing the rewrite.

Evan

Spill intervals must be precolored because they can’t be spilled once more.
They are the shortest intervals precisely over each def/use of the original
interval. That is why they also have their weights set to #INF.

Yes, that’s true. But I wonder if we shouldn’t be smarter about which
register we pick to color it. In Bill W’s implementation, it was essentially
random. What was your solution to this?

I allocated spill intervals at the beginning of each iteration so all the rest intervals (except of physreg intervals) were uncolored at the moment. So the only difference in allocating regs for spills was what end to choose: allocation_order_begin() or allocation_order_end(), I chose the former. I understand that sometimes you can prefer one register over another for spill so it won’t conflict with as much intervals as it would being mapped to another phys, but the same can be said about every interval, right?

As for giving the best result. If assumed that each interval is spilled
into the shortest spill intervals then precoloring won’t do any harm to the
quality of allocation (as shown above).

Wait, but aren’t you implying then that it’s impossible for an interval
corresponding to a spill to be uncolorable? I don’t think that’s true.
Therefore, precoloring most certainly can cause harm because a decision
is being made in local context without the global information that could
help make a better one.

At least this is the imply of LiveIntervals: try to call addIntervalsForSpills on a spill interval and you’ll have an assertion (that was true at least for 1.9, probably it’s true now). If spill interval has a length of 1 (and it has because of addIntervalsForSpills implementation) it can’t be spilled again physically. Moreover if you were able to spill such intervals your iterations could become an infinite loop: algorithm could spill these intervals over and over again. Having spill intervals mapped to physregs garantees that your iterations will finish eventually.

But in theory you can spill intervals differently. For example, interval can
be split into two intervals - the shortest interval that cannot be colored
right now and the rest part that can be colored. That say if you have [10,
50) intervals that conflicts with [40, 45) and can’t be colored it can be
split into smth like [10, 40) and [40, 50). The former part should not be
precolored as it has less conflicts (it doesn’t intersects with [40, 45) )
and can be colored entirely by one register on the next iteration.

Right. Live range splitting would indeed be nice to have. Bill’s
implementation has a flavor of it.

In case live range splitting implemented spill intervals wouldn’t have to be the shortest when some interval is spilled the first time. But during iterations uncolored spill intervals should become shorter and shorter until they are must-be-precolored.

Unfortunately, it seems that current implementation doesn’t support such
“clever” spilling.

True. People have talked about adding it ever since I started getting
involved.

Anyway, this precoloring doesn’t force allocation algorithms to be
non-iterative. In my implementation of optimistic register coloring I
precolor spill intervals at the beginning of each iteration and thus don’t
have to spill their neghbours specifically. It was done automatically by
Select phase of algorithm.

So how did you get around the requirement of the spillers that intervals
be mapped to physical registers? Or did you not use the provided spillers?
That may be what I end up doing. Loads and stores don’t absolutely have to be
inserted into the actual instruction text on each iteration, though it is nice
for debugging purposes.

As far as I understand, you don’t need to call spiller every iteration. LiveIntervals::addIntervalsForSpills does everything necessary for allocation to start next iteration. The idea is that when you spill some interval you only need to replace it with spill intervals for loads and stores and do the next iteration on new interval set but you don’t need to insert actual instructions in code until the very end of your algorithm.

As Fernando has mentioned while I was writing this after the last iteration
of your algorithm before you call Spiller::runOnMachineFunction method
every interval in VirtRegMap must be mapped to a physical register, both
spill and others.

Right, which means the pre-defined spillers can’t be called during iteration.

Exact.

Anton.

> Yes, that's true. But I wonder if we shouldn't be smarter about which
> register we pick to color it. In Bill W's implementation, it was
> essentially random. What was your solution to this?

I allocated spill intervals at the beginning of each iteration so all the
rest intervals (except of physreg intervals) were uncolored at the moment.
So the only difference in allocating regs for spills was what end to
choose: allocation_order_begin() or allocation_order_end(), I chose the
former. I understand that sometimes you can prefer one register over
another for spill so it won't conflict with as much intervals as it would
being mapped to another phys, but the same can be said about every
interval, right?

Ah, ok, that makes sense. Yes, this is true of every interval. It's the
classic, "do I reuse registers to limit spilling or chew up registers to
help scheduling" question.

FYI, in my implementation I just marked the intervals introduced by
spills as being special so that they would not be chosen to be spilled
again. Then they just get colored like every other interval.

> Wait, but aren't you implying then that it's impossible for an interval
> corresponding to a spill to be uncolorable? I don't think that's true.
> Therefore, precoloring most certainly can cause harm because a decision
> is being made in local context without the global information that could
> help make a better one.

At least this is the imply of LiveIntervals: try to call
addIntervalsForSpills on a spill interval and you'll have an assertion

Yep, and that's a good check.

algorithm could spill these intervals over and over again. Having spill
intervals mapped to physregs garantees that your iterations will finish
eventually.

As noted above, you don't actually have to color them early. You can just
mark them as unspillable. There will always be some other interval to
spill, unless there's a serious bug in the algorithm.

> Right. Live range splitting would indeed be nice to have. Bill's
> implementation has a flavor of it.

In case live range splitting implemented spill intervals wouldn't have to
be the shortest when some interval is spilled the first time. But during
iterations uncolored spill intervals should become shorter and shorter
until they are must-be-precolored.

Right, except for the pre-coloring bit. :slight_smile:

As far as I understand, you don't need to call spiller every iteration.

Right, that was my fundamental misunderstanding. I've got it straight
now.

Thanks to everyone for the help.

                                        -Dave

Ah, I didn't realize that. Good to know. Some more code for my
allocator. :slight_smile:

                                       -Dave

FYI, in my implementation I just marked the intervals introduced by
spills as being special so that they would not be chosen to be spilled
again. Then they just get colored like every other interval.

But they can be uncolored like every other interval then, right? When your algorithm finds out during one iteration that it should spill some interval it checks whether this interval is marked. If it is marked what’s happening? Choosing some neighbour interval to spill instead of that one? Is this better then precoloring spill intervals (starting from allocation_order_end phys reg)?

> FYI, in my implementation I just marked the intervals introduced by
> spills as being special so that they would not be chosen to be spilled
> again. Then they just get colored like every other interval.

But they can be uncolored like every other interval then, right? When your
algorithm finds out during one iteration that it should spill some interval
it checks whether this interval is marked. If it is marked what's
happening? Choosing some neighbour interval to spill instead of that one?

Exactly. There is always guaranteed to be at least one neighbor that can be
colored. Otherwise, the spiller didn't spill properly.

Is this better then precoloring spill intervals (starting from
allocation_order_end phys reg)?

Potentially, in at least a couple of ways.

These colors are picked outside the regular "select" phase and therefore don't
take advantage of any heuristics that might be there to do various things.
For example, select could have heuristics to guide allocation to keep maximum
flexibility for the scheduler. Adding similar heuristics to the pre-color
code would be needless duplication and a maintenance headache.

Precoloring necessarily restricts what the allocator can do with other
intervals. Some precoloring is necessary for things like ABI conformance
but generally, less is better.

In practice, do these things make a difference? I don't know. I haven't come
across any papers asking these questions. My intuition says it doesn't matter
for 99% of codes out there, but in our case, that 1% of codes can often mean
the difference between a sale and going home empty-handed.

The main reason I did what I did was to keep all of the select code in one
place. This is simply a matter of maintenance and keeping things as simple
and uniform as possible.

                                            -Dave