Some questions about live intervals

Hi,

I'm trying to sketch an LLVM-based implementation of the Extended
Linear Scan algorithm, described in this paper by Vivek Sarkar:
http://www.cs.rice.edu/~vs3/PDF/cc2007.pdf
Sarkar reports that this version of Linear Scan produces better code
than graph-coloring regallocs and is also much faster (15x to 68x).

I already started work on the implementation of this algorithm and have
a few hopefully rather simple questions:

1) What is the easiest way to understand which MBB a given instruction
index belongs to? All the required information is available in the
MBB2IdxMap of the LiveIntervalAnalysis class. Would it be useful to add
a small function getMBBFromIndex() to this class?

2) Is there any simple way to iterate over all instructions where a
given live interval is live? This functionality is required by many
register allocation algorithms. I think having some special iterator
functions for that in the LiveIntervalAnalysis class could be very
useful for this purpose. And the implementation should be fairly
straightforward, using the information about the beginning and end
points of live ranges.

3) This question is related to the previous one. For computation of
spill costs by some register allocation algorithms, it is required to
iterate over all uses/defs of a given live interval between two points
of execution A and B. Does LLVM already maintain such a def/use list at
the MachineInstr level? Or should I really inspect each instruction?

4) What would be the easiest and the most efficient way to get the set
of live intervals that are live at a given point of execution P, i.e.
at the instruction with number P? Of course, one could do one linear
scan of all intervals and remember for each point P the set of live
intervals. But this solution may require a lot of memory in the case
of very big functions. Maybe there are some better/faster
possibilities available?

As for (1) and (2), I could implement and provide patches for it, if
you think this is desirable.

Thanks,
Roman


Hi Roman,

> I already started work on the implementation of this algorithm and
> have a few hopefully rather simple questions:
>
> 1) What is the easiest way to understand which MBB a given
> instruction index belongs to? All the required information is
> available in the MBB2IdxMap of the LiveIntervalAnalysis class.
> Would it be useful to add a small function getMBBFromIndex() to
> this class?

The MachineInstr class has a pointer to its MachineBasicBlock (at
least in top-of-tree). Call "getParent()" on the MachineInstr.

-bw

Hi Bill,
--- Bill Wendling <isanbard@gmail.com> wrote:

> > I already started work on the implementation of this algorithm and
> > have a few hopefully rather simple questions:
> >
> > 1) What is the easiest way to understand which MBB a given
> > instruction index belongs to? All the required information is
> > available in the MBB2IdxMap of the LiveIntervalAnalysis class.
> > Would it be useful to add a small function getMBBFromIndex() to
> > this class?
>
> The MachineInstr class has a pointer to its MachineBasicBlock (at
> least in top-of-tree). Call "getParent()" on the MachineInstr.

This is true. But sometimes, when you work with LiveIntervals, you use
start/end indexes coming from the live ranges of an interval. And when
you call getInstructionFromIndex(start/end index), you sometimes get a
NULL pointer back, because the instruction with this index was removed
in the meantime due to coalescing or something else. Then you still
face the problem of determining which BB this instruction index
belongs to, but you cannot use the trick you proposed above. Does that
explain my problem, and the reason for the proposed addition, more
clearly?
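Concretely, the failure mode looks roughly like this (a hedged fragment,
not code from the thread: Range stands for one live range of the interval
being walked, LIs for the LiveIntervals analysis, and getMBBFromIndex is
the proposed, not yet existing, query):

    llvm::MachineInstr *MI = LIs.getInstructionFromIndex(Range.start);
    if (MI) {
      // Easy case: the instruction still exists, so its block is known.
      llvm::MachineBasicBlock *MBB = MI->getParent();
      (void)MBB;
    } else {
      // The instruction at this index was deleted (e.g. by coalescing),
      // so getParent() is unavailable; only the bare index is left,
      // hence the proposed index -> MBB query on the analysis:
      // llvm::MachineBasicBlock *MBB = LIs.getMBBFromIndex(Range.start);
    }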

-Roman


Roman,

I'm excited to hear that you are working on this algorithm. Do you plan to
contribute it to the public llvm repository?

                                                  -Dave

Hi David,

--- David Greene <dag@cray.com> wrote:

> > I already started work on the implementation of this algorithm and
> > have a few hopefully rather simple questions:
>
> Roman,
>
> I'm excited to hear that you are working on this algorithm. Do you
> plan to contribute it to the public llvm repository?

Yes, I plan to contribute it to the public LLVM repository. The
skeleton implementation of this algorithm is almost done, though I
still have some small problems mentioned in my original email. But to
make the algorithm really usable, it should be extended to support
register classes, since Sarkar's algorithm cannot handle them. I'm
going to try out an approach based on Smith's paper about graph
coloring register allocation. Hopefully this will solve the issue.

- Roman


Hi, Roman,

     we have an implementation of an allocator for LLVM that, like Sarkar's, allows splitting live ranges at any program point. You can find its description here:

http://compilers.cs.ucla.edu/fernando/publications/drafts/long_PereiraPalsberg07.pdf

We handle register classes using an abstraction called puzzles. Also, we perform register allocation while the program is still in SSA form. This requires breaking critical edges before RA, and removing phi-functions while taking register assignments into consideration once RA is done. We have the code implemented in LLVM 2.1, and it passes all the tests that I have tried.
     If you need help testing and debugging your allocator, I can help you. We have a debugger that was very useful when trying to find errors in our implementation. Also, to implement Sarkar's ELS, you will need to swap live ranges at the boundaries of basic blocks. This is a little tricky once you have registers of different sizes, and I can help you with it.

all the best,

Fernando

Hi Fernando,

> Hi, Roman,
>
>      we have an implementation of an allocator for LLVM that, like
> Sarkar's, allows splitting live ranges at any program point. You can
> find its description here:
>
> http://compilers.cs.ucla.edu/fernando/publications/drafts/long_PereiraPalsberg07.pdf

I've read your paper already. It is a very interesting approach!
But my impression was that it is a rather complex solution and requires
quite a bit of implementation work. I also looked at your sources. It
looks like they introduce a lot of new code into LLVM.

IMHO, one of the attractive parts of Sarkar's algorithm is its
simplicity. It requires almost no significant changes to the LLVM
code base and is also conceptually a bit simpler.

>      we have an implementation of an allocator for LLVM that, like
> Sarkar's, allows splitting live ranges at any program point.

Actually, Sarkar's algorithm can split only at block boundaries, as far
as I understand. This can be an obstacle for very long basic blocks.
One of the interesting ideas could be to introduce live-range splitting
into his algorithm.

BTW, since I was also working on Wimmer's linear scan algorithm (which
is sort of implemented, works on simple test-cases and requires a lot
of code clean-up and refactoring), I also have the LiveInterval
splitting code in place and it can be done at any program point.

> We have the code implemented in LLVM 2.1, and it passes all the tests
> that I have tried.

Cool! I'm not that far yet :-(

>      If you need help testing and debugging your allocator, I can
> help you.

Thank you very much for offering this. I'll most likely come back to
you, once I'm that far.

> We have a debugger that was very useful when trying to find errors in
> our implementation.

I've seen references to it on your page. But this debugger is in Java,
right? So one has to dump the results of register allocation and the
input problem, and it then performs some checks, right?

> Also, to implement Sarkar's ELS, you will need to swap live ranges at
> the boundaries of basic blocks. This is a little tricky once you have
> registers of different sizes, and I can help you with it.

Actually, I already implemented something very similar for Wimmer's
linear scan. The main issue was to order the copies in the right way
and to introduce stores/loads to handle circular dependencies. Or do
you say that register classes of different sizes introduce additional
problems?
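For context, the ordering problem mentioned here is the classic
parallel-copy sequentialization. A minimal sketch of one way to do it
follows; this is not Wimmer's or LLVM's code, all names are invented for
illustration, and a spare register or stack slot is assumed to be
available as the temporary:

    #include <map>
    #include <utility>
    #include <vector>

    typedef unsigned Reg;

    // Sequentialize a set of parallel copies (dst <- src): emit a copy only
    // when its destination is no longer needed as a source, and break any
    // remaining cycle through a temporary location.
    void sequentializeCopies(std::map<Reg, Reg> pending /* dst -> src */,
                             std::vector<std::pair<Reg, Reg> > &out) {
      const Reg TMP = ~0u;  // placeholder for the spare register/stack slot
      while (!pending.empty()) {
        bool progress = false;
        for (std::map<Reg, Reg>::iterator I = pending.begin();
             I != pending.end();) {
          Reg dst = I->first;
          // Is dst still needed as the source of another pending copy?
          bool dstIsSource = false;
          for (std::map<Reg, Reg>::iterator J = pending.begin();
               J != pending.end(); ++J)
            if (J != I && J->second == dst) { dstIsSource = true; break; }
          if (!dstIsSource) {
            out.push_back(std::make_pair(dst, I->second));  // safe to emit
            pending.erase(I++);
            progress = true;
          } else {
            ++I;
          }
        }
        if (!progress && !pending.empty()) {
          // Everything left forms cycles: save one source into the
          // temporary, then let the copy that needed it read the temporary.
          std::map<Reg, Reg>::iterator I = pending.begin();
          out.push_back(std::make_pair(TMP, I->second));
          I->second = TMP;
        }
      }
    }

Running it on the swap {a <- b, b <- a} yields TMP <- b, b <- a,
a <- TMP, i.e. the store/load (or spare-register) resolution of a
circular dependency described above.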

Thanks,
  Roman


> Actually, Sarkar's algorithm can split only at block boundaries, as
> far as I understand. This can be an obstacle for very long basic
> blocks. One of the interesting ideas could be to introduce live-range
> splitting into his algorithm.

As far as I understand it, it can also split in the interior of basic blocks, because of pre-colored registers; otherwise it would not be optimal.

> I've seen references to it on your page. But this debugger is in
> Java, right? So one has to dump the results of register allocation
> and the input problem, and it then performs some checks, right?

It is implemented in Java. You must dump some output and run the debugger on this output.

> Actually, I already implemented something very similar for Wimmer's
> linear scan. The main issue was to order the copies in the right way
> and to introduce stores/loads to handle circular dependencies. Or are
> you saying that register classes of different sizes introduce
> additional problems?

Well, it makes it a little bit more complicated, but only a little bit. A problem that I had was when the lower and higher bits of registers were swapped. For instance, when implementing the parallel copy:

BH <= AL || AX <= BX

you will need two swaps (or three copies), instead of only one swap if there were no aliasing.

best,

Fernando

> > Actually, Sarkar's algorithm can split only at block boundaries, as
> > far as I understand. This can be an obstacle for very long basic
> > blocks. One of the interesting ideas could be to introduce
> > live-range splitting into his algorithm.
>
> As far as I understand it, it can also split in the interior of basic
> blocks, because of pre-colored registers; otherwise it would not be
> optimal.

Well, I cannot find anything about that in my copy of the paper. The
paper talks neither about splitting in the interior of basic blocks
nor about handling pre-colored registers. And Sarkar also does not
claim optimality of the algorithm. He claims that it is inherently
more scalable and efficient, i.e. linear, as compared to O(n^2) for
graph coloring.

But of course this is required, and I'm going to extend the algorithm
to support it by splitting before pre-colored uses.

BTW, there are some other missing features in Sarkar's algorithm.
For example, it spills only whole intervals, which is very similar to
typical graph-coloring regallocs, but not very efficient. I think some
of the known methods from the graph-coloring world for live-range
splitting around high-pressure regions can also be applied here. This
should improve the quality of the allocation.

Actually, I'm wondering whether interval splitting, handling of
pre-colored registers, and handling of high-pressure regions would
slow down the algorithm significantly.

> > I've seen references to it on your page. But this debugger is in
> > Java, right? So one has to dump the results of register allocation
> > and the input problem, and it then performs some checks, right?
>
> It is implemented in Java. You must dump some output and run the
> debugger on this output.

OK. Then my understanding was correct.

> > Actually, I already implemented something very similar for Wimmer's
> > linear scan. The main issue was to order the copies in the right
> > way and to introduce stores/loads to handle circular dependencies.
> > Or are you saying that register classes of different sizes
> > introduce additional problems?
>
> Well, it makes it a little bit more complicated, but only a little
> bit. A problem that I had was when the lower and higher bits of
> registers were swapped.

Oh, now I see. My version of Wimmer's algorithm was written before LLVM
introduced support for sub-regs. Therefore I didn't have this problem
;-) But now I have to cope with it. Anyway, I think it is a minor
extension. I guess I should just take aliasing information into
account when ordering the register moves.

-Roman


> Hi,
>
> I'm trying to sketch an LLVM-based implementation of the Extended
> Linear Scan algorithm, described in this paper by Vivek Sarkar:
> http://www.cs.rice.edu/~vs3/PDF/cc2007.pdf
> Sarkar reports that this version of Linear Scan produces better code
> than graph-coloring regallocs and is also much faster (15x to 68x).
>
> I already started work on the implementation of this algorithm and
> have a few hopefully rather simple questions:
>
> 1) What is the easiest way to understand which MBB a given
> instruction index belongs to? All the required information is
> available in the MBB2IdxMap of the LiveIntervalAnalysis class. Would
> it be useful to add a small function getMBBFromIndex() to this class?

Sure, there is a reverse map (actually just a vector), Idx2MBBMap. Just add a query.
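For what it's worth, such a query could look roughly like the sketch
below. Idx2MBBMap is described in this thread as a vector keyed by index;
the pair type, the comparator, and the free-function signature are
illustrative assumptions (a real patch would add a member to the analysis
class instead):

    #include <algorithm>
    #include <cassert>
    #include <utility>
    #include <vector>

    namespace llvm { class MachineBasicBlock; }

    typedef std::pair<unsigned, llvm::MachineBasicBlock*> IdxMBBPair;

    struct IdxMBBCompare {
      // Heterogeneous comparisons so std::upper_bound can search by index.
      bool operator()(unsigned Idx, const IdxMBBPair &P) const {
        return Idx < P.first;
      }
      bool operator()(const IdxMBBPair &P, unsigned Idx) const {
        return P.first < Idx;
      }
    };

    llvm::MachineBasicBlock *
    getMBBFromIndex(const std::vector<IdxMBBPair> &Idx2MBBMap,
                    unsigned Index) {
      // Find the first block that starts strictly after Index...
      std::vector<IdxMBBPair>::const_iterator I =
          std::upper_bound(Idx2MBBMap.begin(), Idx2MBBMap.end(), Index,
                           IdxMBBCompare());
      // ...so the previous entry is the block containing Index.
      assert(I != Idx2MBBMap.begin() && "index precedes the first MBB");
      return (--I)->second;
    }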

> 2) Is there any simple way to iterate over all instructions where a
> given live interval is live? This functionality is required by many
> register allocation algorithms. I think having some special iterator
> functions for that in the LiveIntervalAnalysis class could be very
> useful for this purpose. And the implementation should be fairly
> straightforward, using the information about the beginning and end
> points of live ranges.

Well, yes. See addIntervalsForSpills: iterate through the live ranges, then for each live range iterate through every instruction index. It's not very inefficient, though.
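Spelled out, that iteration looks roughly like this. LiveInterval's range
iterators and getInstructionFromIndex() are mentioned in this thread,
while the stepping by InstrSlots::NUM reflects the instruction numbering
of that era and may differ in other LLVM versions:

    #include "llvm/CodeGen/LiveIntervalAnalysis.h"

    // Visit every instruction covered by the live interval LI.
    void visitInterval(llvm::LiveIntervals &LIs,
                       const llvm::LiveInterval &LI) {
      for (llvm::LiveInterval::const_iterator R = LI.begin(), RE = LI.end();
           R != RE; ++R) {
        for (unsigned Idx = R->start; Idx < R->end;
             Idx += llvm::LiveIntervals::InstrSlots::NUM) {
          llvm::MachineInstr *MI = LIs.getInstructionFromIndex(Idx);
          if (!MI)  // the index no longer maps to an instruction
            continue;
          // ... inspect MI here ...
        }
      }
    }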

> 3) This question is related to the previous one. For computation of
> spill costs by some register allocation algorithms, it is required to
> iterate over all uses/defs of a given live interval between two points
> of execution A and B. Does LLVM already maintain such a def/use list
> at the MachineInstr level? Or should I really inspect each
> instruction?

MachineRegisterInfo now contains the use/def information. The register allocator passes have not been changed to make use of it yet.
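As a hedged sketch of what using that information for a spill-cost style
query could look like: reg_begin()/reg_end() walk MachineRegisterInfo's
def/use chain, but exactly what the iterator yields has varied across
LLVM versions, so the dereference below is illustrative rather than
authoritative:

    #include "llvm/CodeGen/LiveIntervalAnalysis.h"
    #include "llvm/CodeGen/MachineRegisterInfo.h"

    // Count the uses/defs of the virtual register VReg whose instruction
    // index lies between the two points A and B (inclusive).
    unsigned countUseDefsBetween(llvm::MachineRegisterInfo &MRI,
                                 llvm::LiveIntervals &LIs,
                                 unsigned VReg, unsigned A, unsigned B) {
      unsigned Count = 0;
      for (llvm::MachineRegisterInfo::reg_iterator I = MRI.reg_begin(VReg),
           E = MRI.reg_end(); I != E; ++I) {
        llvm::MachineInstr *MI = I->getParent();     // owning instruction
        unsigned Idx = LIs.getInstructionIndex(MI);  // its numbering index
        if (Idx >= A && Idx <= B)
          ++Count;
      }
      return Count;
    }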

> 4) What would be the easiest and the most efficient way to get the
> set of live intervals that are live at a given point of execution P,
> i.e. at the instruction with number P? Of course, one could do one
> linear scan of all intervals and remember for each point P the set of
> live intervals. But this solution may require a lot of memory in the
> case of very big functions. Maybe there are some better/faster
> possibilities available?

There isn't an efficient way right now. I think you can keep some sort of interference graph to help with this? Perhaps you can use the class RegallocQuery in RegisterCoalescer.h for this? David would know, since he contributed it.
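For reference, the straightforward one-pass version mentioned in the
question would look roughly like this; LiveInterval::liveAt() is an
existing query, while the iteration over the analysis' register-to-
interval map is sketched with assumed iterator types:

    #include "llvm/CodeGen/LiveIntervalAnalysis.h"
    #include <vector>

    // Collect all intervals live at instruction index P.
    std::vector<llvm::LiveInterval*>
    intervalsLiveAt(llvm::LiveIntervals &LIs, unsigned P) {
      std::vector<llvm::LiveInterval*> Result;
      for (llvm::LiveIntervals::iterator I = LIs.begin(), E = LIs.end();
           I != E; ++I) {
        llvm::LiveInterval &LI = I->second;  // map entry: reg -> interval
        if (LI.liveAt(P))
          Result.push_back(&LI);
      }
      return Result;
    }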

> As for (1) and (2), I could implement and provide patches for it, if
> you think this is desirable.

Yes, thanks.

Evan

RegallocQuery is just an abstract interface class. The intent is for it
to be the gateway through which queries to the register allocator are
sent. One such query is whether two intervals interfere. The "guts" of
the computation are up to some other piece of code, however. A naive
implementation might, for example, return intA.overlaps(intB) for the
interference query.
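For example, a naive implementation of the interference query could be as
small as this. The class below is only a shape sketch, since the actual
RegallocQuery interface in RegisterCoalescer.h is not shown in this
thread, but overlaps() is an existing LiveInterval method:

    #include "llvm/CodeGen/LiveInterval.h"

    // Naive interference query: two intervals interfere iff they overlap.
    struct NaiveInterferenceQuery {
      bool interfere(const llvm::LiveInterval &A,
                     const llvm::LiveInterval &B) const {
        return A.overlaps(B);
      }
    };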

I hadn't thought about a query to return the set of intervals live at a
particular point, but that's a good idea. I actually have a number of
changes to this interface queued up waiting for upstream submission.
Some of the changes could be controversial, though, so it might
take a bit of iteration to make it happen. It's also tied in with some
major coalescing refactoring that I did to allow more code sharing.
I also want to get that submitted upstream, but it's a huge patch and
will take a while to get processed.

The bottom line is that RegallocQuery doesn't do any computation
on its own. Its only purpose is to allow the register allocator to be
decoupled from the rest of the compiler internals.

                                        -Dave

Hi Evan,

Here is a patch for the LiveIntervalAnalysis that we discussed.

--- Evan Cheng <evan.cheng@apple.com> wrote:

> > 1) What is the easiest way to understand which MBB a given
> > instruction index belongs to? All the required information is
> > available in the MBB2IdxMap of the LiveIntervalAnalysis class.
> > Would it be useful to add a small function getMBBFromIndex() to
> > this class?
>
> Sure, there is a reverse map (actually just a vector), Idx2MBBMap.
> Just add a query.

Done. Please find a patch attached to this mail.

> > As for (1) and (2), I could implement and provide patches for it,
> > if you think this is desirable.
>
> Yes, thanks.

Here you are!

-Roman


LiveIntervalAnalysis.patch (2.32 KB)

Hi Fernando,

For some reason, I discovered your mail just today.

Fernando Magno Quintao Pereira wrote:

> Hi, Roman,

> > Well, I cannot find anything about that in my copy of the paper.
> > The paper talks neither about splitting in the interior of basic
> > blocks nor about handling pre-colored registers. And Sarkar also
> > does not claim optimality of the algorithm. He claims that it is
> > inherently more scalable and efficient, i.e. linear, as compared to
> > O(n^2) for graph coloring.
> >
> > But of course this is required, and I'm going to extend the
> > algorithm to support it by splitting before pre-colored uses.

> I think in the description of the algorithm, on page seven, step 6, it has to insert register moves to fix the coloring for each program point P that is part of the program.

Well, as far as I understand this algorithm, it does not do any explicit interval splitting. But it tries to color each live range of a live interval separately and may assign different colors to each of those live ranges. After color assignment is done, the algorithm needs to insert some move instructions, which is done by step 6. In any case, it does not really mean each program point, but only the end-points of the live ranges belonging to a live interval.

So, it could be seen as some sort of live interval splitting, but a very limited one, since it only splits at the ends of the live ranges of live intervals.
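Read that way, step 6 boils down to something like the following sketch.
The data model and the move-emitting hook are invented for illustration
only; they are neither Sarkar's pseudo-code nor LLVM API:

    #include <cstddef>
    #include <vector>

    // Illustrative model: one live interval as a sequence of live ranges,
    // each already assigned a color (register) by the allocation phase.
    struct RangeColor { unsigned start, end, reg; };
    typedef std::vector<RangeColor> IntervalColors;

    // Hypothetical hook that would emit the actual machine move.
    void insertMoveAtBoundary(unsigned atIndex, unsigned srcReg,
                              unsigned dstReg);

    // Step 6, as read above: glue adjacent ranges of one interval with a
    // move wherever their assigned colors differ.
    void insertFixupMoves(const IntervalColors &LI) {
      for (std::size_t r = 1; r < LI.size(); ++r)
        if (LI[r - 1].reg != LI[r].reg)
          insertMoveAtBoundary(LI[r - 1].end, LI[r - 1].reg, LI[r].reg);
    }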

And for sure, Sarkar's algorithm does not handle pre-colored regs out of the box.

> > BTW, there are some other missing features in Sarkar's algorithm.
> > For example, it spills only whole intervals, which is very similar
> > to typical graph-coloring regallocs, but not very efficient. I
> > think some of the known methods from the graph-coloring world for
> > live-range splitting around high-pressure regions can also be
> > applied here. This should improve the quality of the allocation.
> >
> > Actually, I'm wondering whether interval splitting, handling of
> > pre-colored registers, and handling of high-pressure regions would
> > slow down the algorithm significantly.

> One thing that may happen is that, in order to reload spilled variables, you will need registers available.

As far as I understand, this is guaranteed by the algorithm; no special reservation in advance is necessary.

> So, either you spare two registers, which would be very restrictive on x86, or you do backtracking, as the current implementation of LLVM's linear scan does. Backtracking may slow down the implementation quite a bit.

Agreed. But it is not necessary. The way move instructions are inserted is very similar to Wimmer's linear scan algorithm. There is no need to reserve registers in advance or for any backtracking.

> I believe that the other factors will not cause too much slowdown.

Well, from my experience with Wimmer's linear scan, the book-keeping during linear scan allocation becomes more complex and time-consuming once you introduce interval splitting. Checks for the intersection of two live intervals are also executed more frequently and consume a significant part of the execution time. Another rather time-consuming operation was the insertion of register moves to glue the split parts together. But this is mainly due to the current LLVM design, since such an operation requires, for each MBB, the set of live-in live intervals. And LLVM currently computes live-ins for each MBB just for physical registers, but not for virtual live intervals.

Best,
   Roman

Thanks. One question, though: should getMBBFromIndex() assert if given an index out of range, or simply return a NULL pointer? I would think the latter makes it a bit more friendly.

Evan

Hi Evan,

> Thanks. One question, though: should getMBBFromIndex() assert if
> given an index out of range, or simply return a NULL pointer? I would
> think the latter makes it a bit more friendly.

Yes, it would probably be more friendly. I can submit such a patch, if
you think it suits better.

On the other hand, I want to mention the following pros:
- The assert-based approach follows the style of all the other getNNN()
functions in LiveIntervalAnalysis.h. They all assert in out-of-range
cases.
- What does it mean if you ask for the MBB of an instruction index
that is out of range for any MBB? How can this happen? If you know the
index, then the instruction should have been registered before, its
number is known, and it is supposed to belong to an MBB. If not, some
internal mapping tables (e.g. start and end of MBBs, index-to-MBB
mappings, etc.) in LiveIntervalAnalysis are likely to be broken. In
that case, it is better to assert(), IMHO. (Both variants are sketched
below.)
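For illustration, the more forgiving variant would look roughly like
this, reusing the IdxMBBPair and IdxMBBCompare helpers from the earlier
sketch in this thread (names are assumptions, not the actual patch):

    llvm::MachineBasicBlock *
    getMBBFromIndexOrNull(const std::vector<IdxMBBPair> &Idx2MBBMap,
                          unsigned Index) {
      std::vector<IdxMBBPair>::const_iterator I =
          std::upper_bound(Idx2MBBMap.begin(), Idx2MBBMap.end(), Index,
                           IdxMBBCompare());
      if (I == Idx2MBBMap.begin())
        return 0;            // out-of-range index: be friendly, return NULL
      return (--I)->second;  // the assert-based variant asserts instead
    }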

Please let me know if I should submit a patch without any assert(),
returning just a NULL pointer instead.

-Roman

> Hi Evan,
>
> [skipped] ...
>
> Please let me know if I should submit a patch without any assert(),
> returning just a NULL pointer instead.

Now I think either approach is fine. Please commit. Thanks!

Evan

Hi Evan,

> [skipped] ...
>
> > Please let me know if I should submit a patch without any assert(),
> > returning just a NULL pointer instead.
>
> Now I think either approach is fine. Please commit. Thanks!
>
> Evan

I don't have commit access to the SVN repository, so I cannot commit it
myself. Should I apply for committer access, or should I simply send
you the final version of the patch?

-Roman


I'd be happy to commit it for you. But check with Chris about commit access.

Evan

My typical heuristic is to give out commit access on the second or third significant patch,

-Chris