Yep, this is the one I was thinking of. It is available online here:
I was just looking at this today. One thing that strikes me about
all these papers I've read on the topic is that no one seems to
consider the interaction of coalescing with spilling. By definition
coalescing increases lifetimes and can cause more interferences.
Yet the papers all try to coalesce as many copies as possible.
Funny you should mention it, we've been (okay, Evan has been fighting this issue lately.
On, say, the x86 machines, the cost of a spill is really not much
different from the cost of a copy so it's probably close to a wash in
the end. But there are many machines where the cost of each operation
is vastly different. Aggressive coalescing is not always good. Often
you'd rather copy than spill.
Even on X86, load are more expensive than copies (At least on sane implementations).
Is anyone aware of publications addressing the interplay among
coalescing, live range splitting, register allocation and spilling?
This is one of the reasons I want to separate all of these concerns.
We shouldn't force developers to always coalesce or always use some
generic measure of spill cost.
In my mind (a dangerous place to go), it's not about coallescing vs spilling, it's about splitting vs spilling.
In particular, coallescing does increase live ranges of values, but you can have the same increase from code written without copies. Basically, not doing aggressive copy coallescing means that you are dependent on the user telling you where live ranges should be split. In an aggressive compiler, the user doesn't actually have control, as most of the explicit copies are gone by the time regalloc happens. This means that you have an arbitrary set of copies that come from things like 2-addr elim, phi elim, extensions that don't generate code, etc.
Instead of doing this. I think the right design is to:
1. Coallesce vregs together as aggressively as possible. Build huge live ranges, delete as many copies as possible.
2. Start priority based register allocation.
3. For each live range without a register available, consider:
1. live range splitting
In particular, by doing live range splitting, the *register allocator* can control where the copies go, rather than relying on luck that previous passes introduce copies in convenient places.
In practice, right now we have no splitting capability. In addition, we coallesce physregs with vregs, which can end up pinning a physreg across an entire function even if it has few uses. This is all very bad, Evan is working on making the coallescer more careful about coallescing vregs with pregs now.
In the next 6 months or so, I'd like to start investigating a bigger refactoring of linear scan, which will hopefully allow us to build in some of the smarter techniques like aggressive remat, splitting, etc. The first step to getting that is to make sure the datastructures (e.g. live intervals) are sane.