Register allocation in LLVM

Hello, all,

I want to implement the register allocation algorithm described in the
paper "Register Allocation via Coloring of Chordal Graphs, APLAS'05" in
LLVM. This is a graph coloring algorithm that can find an optimal coloring
of the interference graph in most of the cases. I've downloaded LLVM last
week, and started studying the code. Basically, I have to implement:

1) A new register allocation pass, similar to the class RA in
RegAllocLocal.cpp, for instance;

2) Replace the phi deconstruction algorithm, which I found in the class
PNE (PHIElimination.cpp); I would like to implement an algorithm that
uses XOR instructions instead of copy instructions to destroy phi
functions. It is the algorithm described in "Optimal register allocation
for SSA-form programs in polynomial time, Inf. Process. Lett, 98(4)", or
in "Register Allocation for Programs in SSA-Form, CC 2006". Basically,
this algorithm takes into consideration the results of the register
allocation phase, and adds one permutation of registers to each edge that
reaches the basic block where phi functions were destroyed. With three
xor, we can implement a swap without an extra register. For instance, if I
have (after RegAlloc), these PHI functions at the header of block B_d:
a(r1) <- (a1(r1), a2(r2))
b(r2) <- (b1(r2), b2(r1))

assuming that a2, b2 come from block B_s, at the end of block B_s I would
have to add the permutation: (r1, r2) <- perm (r2, r1)
which I can implement with six Xor instructions. Well, I would like to
get some comments about the best way to implement this in LLVM. Should I
change the function "copyRegToReg" in X86RegisterInfo.cpp? I am afraid
this function is used in other situations where copy instructions are
expected. On the other hand, if I create a separate function, and change
PHIElimination.cpp, I am afraid to lose retargability. Also, if
possible, I would like to know if, besides the source code of the
register allocation classes (which is very well commented, and very
clean), if there is any online description of the register allocation
interface. For instance, concerning register allocation:

- To send registers to memory:
RegInfo->storeRegToStackSlot(MBB, I, PhysReg, FrameIndex, RC);

- To bring registers from memory:
RegInfo->loadRegFromStackSlot(MBB, MI, PhysReg, FrameIndex, RC);

- Given instruction i, virtual v, and machine reg m, allocate m to v at i:
MI->SetMachineOperandReg(i, physReg);

And PHI deconstruction:

- to remove instructions from Basic Blocks:
    MachineInstr *MPhi = MBB.remove(MBB.begin());
    delete MPhi;

- to create new instructions:
    BuildMI, http://llvm.org/docs/CodeGenerator.html

- to discover the origin block of each operand in the phi function:
    MachineBasicBlock &opBlock =
*MPhi->getOperand(i).getMachineBasicBlock();

Thank you in advance,

Fernando

I want to implement the register allocation algorithm described in the
paper "Register Allocation via Coloring of Chordal Graphs, APLAS'05" in
LLVM. This is a graph coloring algorithm that can find an optimal coloring
of the interference graph in most of the cases. I've downloaded LLVM last
week, and started studying the code.

Cool, that looks like a nice algorithm!

Basically, I have to implement:

1) A new register allocation pass, similar to the class RA in
RegAllocLocal.cpp, for instance;

Yup.

2) Replace the phi deconstruction algorithm, which I found in the class
PNE (PHIElimination.cpp); I would like to implement an algorithm that
uses XOR instructions instead of copy instructions to destroy phi
functions. It is the algorithm described in "Optimal register allocation
for SSA-form programs in polynomial time, Inf. Process. Lett, 98(4)", or
in "Register Allocation for Programs in SSA-Form, CC 2006". Basically,
this algorithm takes into consideration the results of the register
allocation phase, and adds one permutation of registers to each edge that
reaches the basic block where phi functions were destroyed. With three
xor, we can implement a swap without an extra register. For instance, if I
have (after RegAlloc), these PHI functions at the header of block B_d:
a(r1) <- (a1(r1), a2(r2))
b(r2) <- (b1(r2), b2(r1))
assuming that a2, b2 come from block B_s, at the end of block B_s I would
have to add the permutation: (r1, r2) <- perm (r2, r1)
which I can implement with six Xor instructions.

Ok, seems complicated, but potentially cute. Note that most copy instructions inserted by the phi elimination phase are coallesced away: replacing them with xor instructions would prevent that.

Well, I would like to get some comments about the best way to implement this in LLVM. Should I change the function "copyRegToReg" in X86RegisterInfo.cpp?

No, certainly not.

I am afraid this function is used in other situations where copy instructions are expected.

Yup, that it would.

On the other hand, if I create a separate function, and change PHIElimination.cpp, I am afraid to lose retargability.

If you need to do this, I'd suggest adding a new virtual method "insertRegisterXor" or something, which works like copyRegToReg. It should be very straight-forward to implement for all targets. I would suggest starting out by implementing it on whatever target you are on, and make it abort on all others. When it comes time to make it work on other targets, the various target maintainers will help you out.

Also, if possible, I would like to know if, besides the source code of the register allocation classes (which is very well commented, and very clean), if there is any online description of the register allocation interface.

Unfortunately, not really. The high-level overview of the code generator is here, but it is lacking many details:
http://llvm.org/docs/CodeGenerator.html

Any contributions to make the documentation better are very welcome!

The best way to learn stuff is to look for examples in the existing passes and by asking questions here.

For instance, concerning register allocation:

- To send registers to memory:
RegInfo->storeRegToStackSlot(MBB, I, PhysReg, FrameIndex, RC);
- To bring registers from memory:
RegInfo->loadRegFromStackSlot(MBB, MI, PhysReg, FrameIndex, RC);
- Given instruction i, virtual v, and machine reg m, allocate m to v at i:
MI->SetMachineOperandReg(i, physReg);
And PHI deconstruction:

- to remove instructions from Basic Blocks:
   MachineInstr *MPhi = MBB.remove(MBB.begin());
   delete MPhi;

- to create new instructions:
   BuildMI, The LLVM Target-Independent Code Generator — LLVM 16.0.0git documentation

- to discover the origin block of each operand in the phi function:
   MachineBasicBlock &opBlock =
*MPhi->getOperand(i).getMachineBasicBlock();

Yup!

-Chris

I want to implement the register allocation algorithm described in the
paper "Register Allocation via Coloring of Chordal Graphs, APLAS'05" in
LLVM. This is a graph coloring algorithm that can find an optimal coloring
of the interference graph in most of the cases. I've downloaded LLVM last
week, and started studying the code.

Cool, that looks like a nice algorithm!

Basically, I have to implement:

1) A new register allocation pass, similar to the class RA in
RegAllocLocal.cpp, for instance;

Yup.

2) Replace the phi deconstruction algorithm, which I found in the class
PNE (PHIElimination.cpp); I would like to implement an algorithm that
uses XOR instructions instead of copy instructions to destroy phi
functions. It is the algorithm described in "Optimal register allocation
for SSA-form programs in polynomial time, Inf. Process. Lett, 98(4)", or
in "Register Allocation for Programs in SSA-Form, CC 2006". Basically,
this algorithm takes into consideration the results of the register
allocation phase, and adds one permutation of registers to each edge that
reaches the basic block where phi functions were destroyed. With three
xor, we can implement a swap without an extra register. For instance, if I
have (after RegAlloc), these PHI functions at the header of block B_d:
a(r1) <- (a1(r1), a2(r2))
b(r2) <- (b1(r2), b2(r1))
assuming that a2, b2 come from block B_s, at the end of block B_s I would
have to add the permutation: (r1, r2) <- perm (r2, r1)
which I can implement with six Xor instructions.

Ok, seems complicated, but potentially cute. Note that most copy instructions inserted by the phi elimination phase are coallesced away: replacing them with xor instructions would prevent that.

Also note that the 'xor trick' does not work if both registers happen to have the same contents (it zeros them). It would also
be fairly tedious to lower the 'xor trick' to a swap instruction for machines that have them (since x86 has one, that would be most machines).
Such an instruction can happen in rename, which makes it about the least resource intensive instruction besides nop.

It can, but in the case of x86, it is not implemented that way. The xchg instruction carries a lot of architectural baggage around with it, and on most implementations is at least twice as expensive as a reg to reg move.

Nate

Patrick Meredith wrote:

Also note that the 'xor trick' does not work if both registers happen to have the same contents (it zeros them).

Can you explain that with an example? I don't understand...

-Archie