Two Regalloc Enhancements

We have two features for register allocation we'd like to contribute if folks
think they are worthwhile. We want to get a read on whether they will be
useful to people.

The first features backschedules reloads during the spilling phase. As
reloads are generated, we have some very simple code to try to schedule them
as far ahead of the use as possible.

The second features modifies linearscan to try to spread register usage out a
bit. Rather than always grabbing the first free register in the allocatable
list, it remembers the last few registers recently assigned and does not reuse
them unless there are no other registers available. This tends to help the
backscheduling code by distributing register usage and providing more
scheduling freedom. It also can induce spilling where none was there before
if the allocator has "just enough" registers. We haven't noticed any serious
performance problems in practice.

With both patches, we have seen performance improvements on some codes.

I know there's some work on post-ra scheduling going on which would probably
supercede the reload backscheduling code. If that's coming soon, there's
probably not much point in contributing it. The "round-robin" register
assignment would help and post-ra scheduler.

What's the community's opinion on whether these two features are worth
committing to the public repository?

                                   -Dave

We have two features for register allocation we'd like to contribute if folks
think they are worthwhile. We want to get a read on whether they will be
useful to people.

The first features backschedules reloads during the spilling phase. As
reloads are generated, we have some very simple code to try to schedule them
as far ahead of the use as possible.

Ok.

The second features modifies linearscan to try to spread register usage out a
bit. Rather than always grabbing the first free register in the allocatable
list, it remembers the last few registers recently assigned and does not reuse
them unless there are no other registers available. This tends to help the
backscheduling code by distributing register usage and providing more
scheduling freedom. It also can induce spilling where none was there before
if the allocator has "just enough" registers. We haven't noticed any serious
performance problems in practice.

Ok. As with any heuristics change, some tests will benefit, some will suffer. I am ok with both sets of changes assuming there are ways to control them.

With both patches, we have seen performance improvements on some codes.

I know there's some work on post-ra scheduling going on which would probably
supercede the reload backscheduling code. If that's coming soon, there's
probably not much point in contributing it. The "round-robin" register
assignment would help and post-ra scheduler.

Post-ra scheduling has been working for a while. The reason it's not turned on for x86 is it's not helping much (1 or 2%) while the compile time cost is too high (~9% codegen time). I assume you guys are doing your experiments using AMD processors. It could be Intel's uArch is just not benefiting from the load scheduling.

Round-robin register assignment probably will help post-ra scheduling. However, for small functions it may end up increase the number of registers used. That can be bad for performance.

What's the community's opinion on whether these two features are worth
committing to the public repository?

I welcome the features as long as we can add them as llc-beta first. Once we have some more testing across all the platforms, we can then decide whether they can be turned on. Is that ok?

Evan

Ok. As with any heuristics change, some tests will benefit, some will
suffer. I am ok with both sets of changes assuming there are ways to
control them.

Yep, we have flags.

Post-ra scheduling has been working for a while. The reason it's not
turned on for x86 is it's not helping much (1 or 2%) while the compile
time cost is too high (~9% codegen time). I assume you guys are doing
your experiments using AMD processors. It could be Intel's uArch is
just not benefiting from the load scheduling.

Yes, I can imagine there would be differences here. The memory architectures
are quite different.

Round-robin register assignment probably will help post-ra scheduling.
However, for small functions it may end up increase the number of
registers used. That can be bad for performance.

Correct. As I said, we haven't noticed any degredation. But our code base is
very different from yours. :slight_smile:

> What's the community's opinion on whether these two features are worth
> committing to the public repository?

I welcome the features as long as we can add them as llc-beta first.
Once we have some more testing across all the platforms, we can then
decide whether they can be turned on. Is that ok?

Fine with me. How do I do this? Just put them under some flags that default
to false?

                                  -Dave

Ok. As with any heuristics change, some tests will benefit, some will
suffer. I am ok with both sets of changes assuming there are ways to
control them.

Yep, we have flags.

Post-ra scheduling has been working for a while. The reason it's not
turned on for x86 is it's not helping much (1 or 2%) while the compile
time cost is too high (~9% codegen time). I assume you guys are doing
your experiments using AMD processors. It could be Intel's uArch is
just not benefiting from the load scheduling.

Yes, I can imagine there would be differences here. The memory architectures
are quite different.

Round-robin register assignment probably will help post-ra scheduling.
However, for small functions it may end up increase the number of
registers used. That can be bad for performance.

Correct. As I said, we haven't noticed any degredation. But our code base is
very different from yours. :slight_smile:

What's the community's opinion on whether these two features are worth
committing to the public repository?

I welcome the features as long as we can add them as llc-beta first.
Once we have some more testing across all the platforms, we can then
decide whether they can be turned on. Is that ok?

Fine with me. How do I do this? Just put them under some flags that default
to false?

Yep! Thanks.

Evan

Hi David,

What effect is there on compile time?

David Greene wrote:

None. These are very simple changes. At most each
reload takes a wee bit longer because the code walks
backward through the BB to schedule.

                                      -Dave

These seem to be the exact opposite of what the PIC16 port requires.
However, since the behavior is controllable I don't think they will
break our port.

Just a little background, PIC16 only has one register; so preloading it
is just going to cause too many useless spills...

Regards,
Ali

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]

On

Behalf Of David Greene
Sent: Thursday, July 23, 2009 12:42 PM
To: llvmdev@cs.uiuc.edu
Subject: [LLVMdev] Two Regalloc Enhancements

We have two features for register allocation we'd like to contribute

if

folks
think they are worthwhile. We want to get a read on whether they will

be

useful to people.

The first features backschedules reloads during the spilling phase.

As

reloads are generated, we have some very simple code to try to

schedule

them
as far ahead of the use as possible.

The second features modifies linearscan to try to spread register

usage

out a
bit. Rather than always grabbing the first free register in the
allocatable
list, it remembers the last few registers recently assigned and does

not

reuse
them unless there are no other registers available. This tends to

help

the
backscheduling code by distributing register usage and providing more
scheduling freedom. It also can induce spilling where none was there
before
if the allocator has "just enough" registers. We haven't noticed any
serious
performance problems in practice.

With both patches, we have seen performance improvements on some

codes.

I know there's some work on post-ra scheduling going on which would
probably
supercede the reload backscheduling code. If that's coming soon,

there's

probably not much point in contributing it. The "round-robin"

register