We have two features for register allocation we'd like to contribute if folks
think they are worthwhile. We want to get a read on whether they will be
useful to people.
The first features backschedules reloads during the spilling phase. As
reloads are generated, we have some very simple code to try to schedule them
as far ahead of the use as possible.
The second features modifies linearscan to try to spread register usage out a
bit. Rather than always grabbing the first free register in the allocatable
list, it remembers the last few registers recently assigned and does not reuse
them unless there are no other registers available. This tends to help the
backscheduling code by distributing register usage and providing more
scheduling freedom. It also can induce spilling where none was there before
if the allocator has "just enough" registers. We haven't noticed any serious
performance problems in practice.
With both patches, we have seen performance improvements on some codes.
I know there's some work on post-ra scheduling going on which would probably
supercede the reload backscheduling code. If that's coming soon, there's
probably not much point in contributing it. The "round-robin" register
assignment would help and post-ra scheduler.
What's the community's opinion on whether these two features are worth
committing to the public repository?
-Dave
We have two features for register allocation we'd like to contribute if folks
think they are worthwhile. We want to get a read on whether they will be
useful to people.
The first features backschedules reloads during the spilling phase. As
reloads are generated, we have some very simple code to try to schedule them
as far ahead of the use as possible.
Ok.
The second features modifies linearscan to try to spread register usage out a
bit. Rather than always grabbing the first free register in the allocatable
list, it remembers the last few registers recently assigned and does not reuse
them unless there are no other registers available. This tends to help the
backscheduling code by distributing register usage and providing more
scheduling freedom. It also can induce spilling where none was there before
if the allocator has "just enough" registers. We haven't noticed any serious
performance problems in practice.
Ok. As with any heuristics change, some tests will benefit, some will suffer. I am ok with both sets of changes assuming there are ways to control them.
With both patches, we have seen performance improvements on some codes.
I know there's some work on post-ra scheduling going on which would probably
supercede the reload backscheduling code. If that's coming soon, there's
probably not much point in contributing it. The "round-robin" register
assignment would help and post-ra scheduler.
Post-ra scheduling has been working for a while. The reason it's not turned on for x86 is it's not helping much (1 or 2%) while the compile time cost is too high (~9% codegen time). I assume you guys are doing your experiments using AMD processors. It could be Intel's uArch is just not benefiting from the load scheduling.
Round-robin register assignment probably will help post-ra scheduling. However, for small functions it may end up increase the number of registers used. That can be bad for performance.
What's the community's opinion on whether these two features are worth
committing to the public repository?
I welcome the features as long as we can add them as llc-beta first. Once we have some more testing across all the platforms, we can then decide whether they can be turned on. Is that ok?
Evan
Ok. As with any heuristics change, some tests will benefit, some will
suffer. I am ok with both sets of changes assuming there are ways to
control them.
Yep, we have flags.
Post-ra scheduling has been working for a while. The reason it's not
turned on for x86 is it's not helping much (1 or 2%) while the compile
time cost is too high (~9% codegen time). I assume you guys are doing
your experiments using AMD processors. It could be Intel's uArch is
just not benefiting from the load scheduling.
Yes, I can imagine there would be differences here. The memory architectures
are quite different.
Round-robin register assignment probably will help post-ra scheduling.
However, for small functions it may end up increase the number of
registers used. That can be bad for performance.
Correct. As I said, we haven't noticed any degredation. But our code base is
very different from yours.
> What's the community's opinion on whether these two features are worth
> committing to the public repository?
I welcome the features as long as we can add them as llc-beta first.
Once we have some more testing across all the platforms, we can then
decide whether they can be turned on. Is that ok?
Fine with me. How do I do this? Just put them under some flags that default
to false?
-Dave
Hi David,
What effect is there on compile time?
David Greene wrote:
None. These are very simple changes. At most each
reload takes a wee bit longer because the code walks
backward through the BB to schedule.
-Dave
These seem to be the exact opposite of what the PIC16 port requires.
However, since the behavior is controllable I don't think they will
break our port.
Just a little background, PIC16 only has one register; so preloading it
is just going to cause too many useless spills...
Regards,
Ali
From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]
On
Behalf Of David Greene
Sent: Thursday, July 23, 2009 12:42 PM
To: llvmdev@cs.uiuc.edu
Subject: [LLVMdev] Two Regalloc Enhancements
We have two features for register allocation we'd like to contribute
if
folks
think they are worthwhile. We want to get a read on whether they will
be
useful to people.
The first features backschedules reloads during the spilling phase.
As
reloads are generated, we have some very simple code to try to
schedule
them
as far ahead of the use as possible.
The second features modifies linearscan to try to spread register
usage
out a
bit. Rather than always grabbing the first free register in the
allocatable
list, it remembers the last few registers recently assigned and does
not
reuse
them unless there are no other registers available. This tends to
help
the
backscheduling code by distributing register usage and providing more
scheduling freedom. It also can induce spilling where none was there
before
if the allocator has "just enough" registers. We haven't noticed any
serious
performance problems in practice.
With both patches, we have seen performance improvements on some
codes.
I know there's some work on post-ra scheduling going on which would
probably
supercede the reload backscheduling code. If that's coming soon,
there's
probably not much point in contributing it. The "round-robin"
register