we here in the UCLA compiler's lab have an implementation of a very cool register allocator in LLVM. It finds an optimal register assignment for the x86 machine, via live range splitting, that is, if it is possible to have a register assignment without register spilling, our allocator can find it.
We have the short and long versions of our paper here:
Our algorithm is polynomial time, and up to now, it is adding about 15% more to the total compilation time of LLVM. It does not iterate like the current implementation of RegAllocLinearScan (LN), and does not have to worry about conflicts between virtuals and physical registers, two things that bring the running time of RegAllocLinearScan down. I am not an expert in C++, but I think that with a careful implementation, it would even be possible to match the compilation time of LN. Indeed, for the very large benchmarks, the algorithm sometimes is faster.
We have not implemented instruction folding yet. Nothing prevent us of implementing it, it is just a matter of time (and patience). Currently the code that our algorithm produces is faster than LN, if I use -disable-spill-fusing, and sometimes it is faster even when I let LN to use instruction folding.
Of course, because the algorithm is very new, there are a lot of things yet to improve, and we would like to hear comments from you guys. We did not do any global coalescing, and our spilling heuristics is the simplest possible: when having to decide among register to spill, evict those that have he longest interval. I had implemented a pass to build the dominator tree of MachineBasicBlocks, but my pass uses the naive quadratic algorithm, and I have this in my list of things to improve. Also, our algorithm only works for x86, because I hard coded the relations between aliased registers in the target independent part of the implementation. This is also something to improve. To implement the allocator, I had to change the implementation of LLVM code here and there: adding a function to insert swaps in MRegisterInfo, changing the coalescing routine in LiveIntervalAnalysis, etc.
Anyway, if you guys could take a look into our paper, I think you would like it. The idea is quite new, and if possible, we would like to produce a version that could be distributed with LLVM. But in this case, the work is huge, for the quality of my implementation is way below the quality of the implementation of the current linear scan algorithm.
All the best,