Hi Leslie,

I suggest adding these 3 papers to your reading list.

Register allocation for programs in SSA-form

Sebastian Hack, Daniel Grund, and Gerhard Goos

Chair for Programming Languages and Compiler Construction

Simple and Efficient Construction of Static Single Assignment Form

Matthias Braun , Sebastian Buchwald , Sebastian Hack , Roland Leißa , Christoph Mallon , and Andreas Zwinkau

https://www.info.uni-karlsruhe.de/uploads/publikationen/braun13cc.pdf

Optimal register allocation for SSA-form programs in polynomial time

Sebastian Hack, Gerhard Goos

http://web.cs.ucla.edu/~palsberg/course/cs232/papers/HackGoos-ipl06.pdf

A lot of the earlier literature regarding the register allocation problem describes the general graph colouring problem as NP-complete, however previous research in Register Allocation has been in the context of programs that were not in SSA form. i.e. the Chaitin-Briggs paper states that register allocation is NP-complete and proposes an iterative algorithm.

If one reads Hack Goos, there is the discovery that programs that are in SSA form have chordal graphs, and the colouring algorithm for chordal graphs can be completed in polynomial time. After the cost of building the interference graph, it is possible to perform register allocation in a single pass. The key is in not modifying the graph.

If one has frequency for each basic block, then one can sort basic blocks by frequency, allocating the highest frequency blocks first, and where possible assigning the same physcial register to all virtual registers in each phi node (to avoid copies). At program points where the live set is larger than k, the set of physical registers, one spills the the register that has the largest distance between its next use. At least that is how I am thinking about this problem. I am also a novice with regards to register allocation.

I intend to develop a register allocator for use in this JIT engine:

rv8: a high performance RISC-V to x86 binary translator

Michael Clark, Bruce Hoult

https://carrv.github.io/papers/clark-rv8-carrv2017.pdf

Our JIT already performs almost twice as fast a QEMU and we are using a static register allocation, and QEMU i believe has a register allocator. We are mapping a 31 register RISC architecture to a 16 register CISC architecture. I have statistics on the current slow-down, and it appears to mainly be L1 accesses to spilled registers. I believe after we have implemented a register allocator in our JIT we will be able to run RISC-V binaries at near native performance on x86-64. In fact given we are performing runtime profile guided optimisation, it may even be possible for some benchmarks to run faster than register allocations that are based on static estimates of loop frequencies.

https://anarch128.org/~mclark/rv8-slides.pdf

https://rv8.io/bench

We’ve still got a long way to go to get to ~1:1 performance for RISC-V on x86-64, but I think it is possible, at least for some codes…

Regards,

Michael.