request for help writing a register allocator

I'm using LLVM for a compiler course, and I'd like to have my students implement a graph-coloring register allocator. I'm having a lot of trouble figuring out how this should be done.

Is there anyone who has written an LLVM register allocator who would be willing to help me understand the basic ideas? I understand the algorithm, it's LLVM that I don't understand. For example:

- When allocating registers, how do I know which register class to use,
   and which registers of that class are available?

- How do I know which operands of a Machine Instruction are candidates
   for (simple) register allocation (e.g., are of type int)?

- How do I replace a virtual register with a physical one?

- Do I need to generate spill code to handle virtual registers that
   cannot be replaced with physical ones, and if so, how?

Susan Horwitz

Hi Susan,

You may find the PBQP allocator implementation useful as a reference to what's involved in adding a new allocator to LLVM. It's located in lib/CodeGen/RegAllocPBQP.cpp, with supporting files in the lib/CodeGen/PBQP directory.

I'm no expert on the LLVM register allocation interfaces, so I'll defer to those who are regarding the specifics of your questions.

-Jim

Hi Susan,

You may find the PBQP allocator implementation useful as a reference
to what’s involved in adding a new allocator to LLVM. It’s located in
lib/CodeGen/RegAllocPBQP.cpp, with supporting files in the lib/CodeGen/
PBQP directory.

Yes - as far as working allocators go PBQP is pretty simple. If you’re just interested in LLVM API details you can focus on the lower half of the RegAllocPBQP.cpp source file (everything from PBQPRegAlloc::findVRegIntervalsToAlloc()). The rest of the source (class declaration at the top of RegAllocPBQP.cpp aside) is mostly concerned with PBQP algorithm specifics, such as constructing cost matrices, or carrying out the PBQP graph reduction algorithm.

Is there anyone who has written an LLVM register allocator who would
be
willing to help me understand the basic ideas? I understand the
algorithm, it’s LLVM that I don’t understand. For example:

  • When allocating registers, how do I know which register class to
    use,
    and which registers of that class are available?

For each virtual register the MachineRegisterInfo::getRegClass() method will give you a TargetRegisterClass. You can use the allocation_order_begin/allocation_order_end methods to iterate over the allocable physregs in that class.

  • How do I know which operands of a Machine Instruction are candidates
    for (simple) register allocation (e.g., are of type int)?

Each operand of a MachineInstr has a corresponding MachineOperand object. You can query this object using the “isReg()” method to check if an operand is a register. You’d then have to use “TargetRegisterInfo::isVirtualRegister()” to test whether the operand is virtual, and from there perform your allocation.

I’ve attached a demo allocator that’ll get you started with the above APIs.

LLVM’s CodeGen library is target independent though, and the register allocator is expected to be able to allocate for all virtregs, regardless of their class (and to handle things like register aliasing). There’s no built-in distinction of “simple” candidates, as opposed to trickier ones.

If it helps you could hard code a test for your chosen platform (presumably something nice and clean) at the start of the allocator, and from there on work on the assumption that you’re dealing with that platform. Then it’d be up to you to decide which classes the students would have to allocate for. You’d still need a way to allocate any “hard” classes though if you want a working program.

  • How do I replace a virtual register with a physical one?
  • Do I need to generate spill code to handle virtual registers that

cannot be replaced with physical ones, and if so, how?

You could re-write it in place (using MachineRegisterOperand::setReg() if you want to manipulate the code yourself, or you can store the mapping in a VirtRegMap object and have the LLVM VirtRegRewriter apply the changes for you. The latter would have to be used in conjunction with the other LLVM regalloc components. You’d get liveness (LiveIntervals) and spilling (LiveIntervals/VirtRegRewriter) for free, at the cost of some extra work to understand their somewhat curious APIs. Looking at RegAllocPBQP.cpp is probably the best way to get your head around how those components are used by register allocation.

Regards,
Lang Hames.

RegAllocDemo.cpp (5.61 KB)

Å correction:

You could re-write it in place (using MachineRegisterOperand::setReg()…

That should have been MachineOperand::setReg()…

And a disclaimer:
I wrote the PBQP allocator, so I might be biased. I think it’s easier to follow than linear scan though.

  • Lang.

Other advice: if you’re looking to simplify this for students, I’d recommend staying away from X86 or ARM, which use subregs heavily. If you work with (e.g.) the sparc backend, you can avoid them completely, simplifying the problem.

-Chris

Each virtual register has an assigned register class. However, register classes relate to each other, and the machine IR also has subreg references. For example, this is how X86 handles AL/AX/EAX/RAX all aliasing each other. In the Sparc backend, the only aliases are in the FPU, and it doesn't use subregs to model them at this point.

-Chris

<ccing llvmdev>

So if AL is a sub-register of EAX (assume this is true even if not),
then will getAliasSet(AL) include EAX, and will getAliasSet(EAX)
include AL? If yes, then I think I’m OK.

Yes, I believe so.

Yep - that’s correct. Watch out for a potential gotcha though: getAliasSet(EAX) does not include EAX itself (and in general physregs do not appear in their alias sets).

Lang -

I've made some progress writing my register allocator, but now I'm stuck.

I have 2 questions for you:

1. I tried running the PBQP allocator (as a dynamic pass), but that didn't
    work. When I type this:

   llc -f -load Debug/lib/regalloc.so -regalloc=pbqp simple.bc

   I get the following error:

   llc: /afs/cs.wisc.edu/p/course/cs701-horwitz/public/llvm-root/llvm/include/llvm/Support/CommandLine.h:483: void llvm::cl::parser<DataType>::addLiteralOption(const char*, const DT&, const char*) [with DT = llvm::FunctionPass* (*)(), DataType = llvm::FunctionPass* (*)()]: Assertion `findOption(Name) == Values.size() && "Option already exists!"' failed.

   Can you tell from this what I'm doing wrong?

2. I tried writing a very simple register allocator. It works as follows:

Step 1: Find out which physical registers are already used. Do this by iterating over all instructions and testing each operand. Store results (including aliases) in a set.

Step 2: For each target reg class, save one (unused) physical register to be used for spills. (Some classes have no unused physical registers. This would be detected in Step 3 if there were a virtual register in that class, but it hasn't happened for my tests so far.)

Step 3: Iterate over all instructions again allocating virtual registers to available physical ones (in the appropriate class) or, if no such physical reg is available, allocate the virtual register both to the saved "spill" register for its class and to a stack slot. Keep track of the set of virtual registers already processed so none is processed twice. When a physical register is allocated, add it (and any aliases) to the set of used physical registers.

Step 4: Run the spiller.

This allocator works on very simple C code, but fails (with an uninformative segmentation fault) as soon as I include a call to scanf, or a loop, or an if-else.

Do you see any obvious errors in the approach outlined above?

If not, can you suggest a way to find out what is going wrong?

Thanks for any help you can give me!!

Susan Horwitz

Hi Susan,

  1. I tried running the PBQP allocator (as a dynamic pass), but that didn’t
    work…

Can you tell from this what I’m doing wrong?

The PBQP allocator is built into the LLVM CodeGen library, so the “-regalloc=pbqp” option is already available in llc. If you’ve built a copy of the PBQP allocator in a separate library it will try to re-register that same option, triggering the assertion you’re seeing. You can probably get around this by just changing the option in your copy to “pbqp2”.

  1. I tried writing a very simple register allocator. It works as follows…

This allocator works on very simple C code, but fails (with an uninformative segmentation fault) as soon as I include a call to scanf, or a loop, or an if-else.

There are any number of things that can go wrong in register allocation, so it’s hard for me to guess without seeing your code.

Possible issues:

  1. Some machine instructions have implicit register operands. If you are not including these in your calculations you will probably have some errors. My apologies for not mentioning this earlier.

You can find the set of implicit uses and defs by querying the TargetInstDesc object for each MachineInstr (call MachineInstr::getDesc() to get it, and then the TargetInstrDesc::getImplicitUses() and TargetInstrDesc::getImplicitDefs() methods to find the regs used/definied by this instr).

  1. How are you making sure that interfering virtregs never receive the same physreg? If you’re using the LiveIntervals analysis (and the LiveInterval::overlaps(LiveInterval &) test) you should be ok, since it gets a lot of testing (though bugs are not unheard of). If you’ve written your own liveness analysis that would be one of the first places I’d check for correctness.

Do you see any obvious errors in the approach outlined above?

No obvious errors. My concern would be the liveness issue mentioned above. In addition, what do you do if/when you encounter an instruction with two or more operands (in the same class) that have been spilled?

If not, can you suggest a way to find out what is going wrong?

The LLVM bugpoint tool is very handy if your allocator itself is crashing. It sounds like your allocator is running fine though, but miscompiling programs. Unfortunately in this case there aren’t many tools to help you (that I know of). I usually just work by inspection. Add some debugging output to your allocator, or fire llc up under gdb and dump information (many objects in LLVM have a handy “dump()” method to print out their state). Try to find a minimal failing test case (if it breaks on if tests then that’s a great start), and just look to see where the allocation goes wrong.

Thanks for any help you can give me!!

No problem at all. Writing allocators is non-trivial (took me quite a while to get my first one going). If you get really stuck try posting your allocator and a failing test case.

Good luck!

  • Lang.
  1. Some machine instructions have implicit register operands. If you are not including these in your calculations you will probably have some errors. My apologies for not mentioning this earlier.

You can find the set of implicit uses and defs by querying the TargetInstDesc object for each MachineInstr (call MachineInstr::getDesc() to get it, and then the TargetInstrDesc::getImplicitUses() and TargetInstrDesc::getImplicitDefs() methods to find the regs used/definied by this instr).

Oops. Seems we copy implicit operands like this into MachineOperands on the instruction before register allocation. Disregard the above advice - you do not need to check the TargetInstrDesc implicit operands. You only need to check the operands returned by MachineInstr::getOperand.

  • Lang.

There are any number of things that can go wrong in register allocation, so
it's hard for me to guess without seeing your code.

Possible issues:

2) How are you making sure that interfering virtregs never receive the same
physreg? If you're using the LiveIntervals analysis (and the
LiveInterval::overlaps(LiveInterval &) test) you should be ok, since it gets
a lot of testing (though bugs are not unheard of). If you've written your
own liveness analysis that would be one of the first places I'd check for
correctness.

I'm doing VERY simple reg allocation, just to see if I can get something to work. I only allocate a physical register to ONE virtual register (the first such virtual I find that hasn't already had a physical register allocated to it). So I don't think my problem has to do with interference.

In addition, what do you do if/when you encounter an instruction with two or
more operands (in the same class) that have been spilled?

For each instruction, I iterate over the operands. For each operand that is a virtual register "r", if "r" hasn't been given a physical register and there is one available (in "r"'s class), then I allocate that physical register to "r". If not, I call assignVirt2StackSlot(r) and assignVirt2Phys(r, sp), where sp is s physical register in "r"'s class that has been saved to be used as a spill register.

I don't do anything special for the case you mention, so that may be my problem. I thought that what I am doing, plus calling the spiller at the end, would magically take care of everything. If not, can you point me at the relevant documentation or an example from existing code?

I realized that I was unclear about where the actual problem is: My allocator runs fine, it is (as you suspected) the generated code that crashes. I can look at the generated (X86) code and I see the problem for one simple example: there's an instruction
   movl %edi, 4(%edi)
that should have been
   movl %edi, 4(%esp)
but I don't understand how the code gets generated, so this isn't very helpful. Is there a way for me to look at the machine instructions before register allocation?

Thanks again!

Susan

You can run llc with -debug which shows debug information from all
codegen phases.
You can also use -debug-only=<name> to debug only a specific codegen pass.

I see output like this, where I think that instead of LINEAR SCAN your
allocator will be run:

********** MACHINEINSTRS **********
entry:
20 MOV32mr <fi#1>, 1, %reg0, 0, %reg0, %EDI<kill>
28 %reg1026<def> = MOV32rm <fi#1>, 1, %reg0, 0, %reg0
36 MOV32mr <fi#0>, 1, %reg0, 0, %reg0, %reg1026<kill>
44 %EAX<def> = MOV32rm <fi#0>, 1, %reg0, 0, %reg0
64 RET %EAX<imp-usekill>
********** LINEAR SCAN **********
********** Function: foo
fixed intervals:
        %reg14,inf = [0,6:0)[6,14:1)[14,22:2) 0@?-(6) 1@? 2@? -> DI
        %reg2,inf = [46,54:1)[54,66:0) 0@54-(66) 1@? -> AL
        %reg23,inf = [0,22:0) 0@?-(22) -> EDI
        %reg1,inf = [46,54:1)[54,66:0) 0@54-(66) 1@? -> AH
        %reg15,inf = [0,6:0)[6,14:1)[14,22:2) 0@?-(6) 1@? 2@? -> DIL
        %reg3,inf = [46,54:1)[54,66:0) 0@54-(66) 1@? -> AX
        %reg19,inf = [46,66:0) 0@46-(66) -> EAX

*** CURRENT ***: %reg1026,inf = [30,38:0) 0@30-(38)
        processing active intervals:
        processing inactive intervals:
        allocating current interval: EAX
active intervals:
        %reg1026,inf = [30,38:0) 0@30-(38) -> EAX
inactive intervals:
        interval %reg1026,inf = [30,38:0) 0@30-(38) expired
********** REGISTER MAP **********
[reg1026 -> EAX]

**** Local spiller rewriting function 'foo':
**** Machine Instrs (NOTE! Does not include spills and reloads!) ****
# Machine code for foo():
  <fi#0>: size is 4 bytes, alignment is 4 bytes, at location [SP+8]
  <fi#1>: size is 4 bytes, alignment is 4 bytes, at location [SP+8]
Live Ins: EDI in VR#1025
Live Outs: EAX
....
Best regards
--Edwin

I found the problem! My generated code is spilling correctly but is not reloading at all. For example, if the original code has the equivalent of this (where %1024 is a virtual reg):

   %1024 = xxx
   ...
   yyy = %1024

and I find no physical register for %1024, then I assign it to physical register %edi and to a stackslot. That creates code like this:

   %edi = xxx
   store from %edi to the stack
   ...
   yyy = %edi

The last instruction is wrong. There needs to be a load from the stackslot into %edi before copying from there into yyy.

The documentation says:
   If the indirect strategy is used, after all the virtual registers
   have been mapped to physical registers or stack slots, it is
   necessary to use a spiller object to place load and store
   instructions in the code. Every virtual that has been mapped to a
   stack slot will be stored to memory after been defined and will be
   loaded before being used.
But this doesn't seem to be happening; the stores to memory are there but the loads are not.

Any ideas what's going wrong?

If not, any advice on how to generate the loads myself??

Susan

Hi Susan,

But this doesn’t seem to be happening; the stores to memory are there but the loads are not.

Any ideas what’s going wrong?

Are you using VirtRegMap::addSpillPoint and VirtRegMap::addRestorePoint ? If not you may need to add calls to them to let the rewriter know where to insert the loads/stores.

If not, any advice on how to generate the loads myself??

The TargetInstrInfo class has methods storeRegToStackSlot and loadRegFromStackSlot, however I’d avoid mixing direct calls to these with calls to the spiller.

Regarding Török’s suggestion - make sure you #define the DEBUG_TYPE macro (e.g. #define DEBUG_TYPE “foo”) if you want your debugging output to be available using the -debug-only=foo option.

If you’re running llc under a debugger you can also call MachineFunction::dump (actually many of the objects have a dump method) at any time to print out the state of your machine function. I’ve found this to be quite handy.

Cheers,
Lang.

I was NOT using addSpillPoint and addRestorePoint. I will add those, THANK YOU!

I've also found that RegAllocSimple.cpp is a good model for me if I decide to give up on using the spiller and instead add the spill/reload code myself.

So I'm currently feeling much more optimistic about this project.

Thanks for all the (very quick!) help!!

Susan

I tried both the most recent version of "simple" register allocation and the one from last August, and neither seems to work correctly (they run, but produce bad output). I used them to compile an old version of the Unix "replace" utility (source code attached). Here's how I created the executable:

   llvm-gcc -emit-llvm -O0 -c replace.c -o replace.bc
   opt -mem2reg replace.bc > replace.ssa
   mv replace.ssa replace.bc
   llc -f -regalloc=simple replace.bc
   gcc replace.s

Then I ran it like this (file "input" also attached):

   a.out This XXXX < input

I get a Segmentation fault with the current version of LLVM. With the old version it runs, but I get garbage output.

If I create the ".s" file using just
   llc -f replace.bc
everything is fine.

Does anyone know how to fix either version of RegAllocSimple?

Susan Horwitz

replace.c (9.5 KB)

input (16 Bytes)

I am not surprised. Since no one has been using it, it has bit rotted. Notice it has been removed from top of tree.

Why not use RegAllocLocal?

Evan

I'm having no luck getting my register allocator to work. I'm trying to do it using the "indirect" approach; i.e., using a VirtRegMap, with calls to assignVirt2Phys, assignVirt2StackSlot, etc. and a call to a "spiller" at the end.

As a warm-up exercise (before implementing register allocation via graph coloring) I'm trying to implement a very simple scheme in which NO pseudo-registers are allocated to physical registers across instructions. So for each virtual register I call assignVirt2StackSlot. I thought that I should call assignVirt2Phys as well for each virtual register in an instruction, mapping that vreg to an unused preg. For example, for an instruction like this:
   %reg1024<def> = MOV32rr %ESP
I assign virtual register 1024 to some available physical register in the appropriate class so that the generated code will assign to that physical register. (I also assign virtual register 1024 to a stack slot so that the value assigned to the physical register is spilled to that slot.)

If a virtual register is used in an instruction, rather than defined, I again assign it to a physical register preg (but not to a stack slot because it will already have been assigned a slot) so that the generated code will load from the stack into that preg.

However, if the same virtual register is used in two different instructions, I get an abort with an error message saying that I'm trying to assign a physical register to a virtual register that's already been mapped. If I don't call assignVirt2Phys then I get an abort with an error message saying that some virtual register was not assigned a physical register.

I've tried calling addSpillPoint and addRestorePoint for every instruction that involves a virtual register, and also not calling those functions, and I still get the same aborts.

I would very much appreciate any help!

Susan Horwitz

Hi Susan,

However, if the same virtual register is used in two different instructions, I get an abort with an error message saying that I'm trying to assign a physical register to a virtual register that's already been mapped. If I don't call assignVirt2Phys then I get an abort with an error message saying that some virtual register was not assigned a physical register.

Each vreg should be mapped to a singe physreg. The rewriter will
rewrite any operand that refers to the virtreg to use the physreg it
is mapped to (including the operands of the loads/stores inserted for
that virtreg). If no mapping is made, or you attempt to redefine the
mapping (without calling VirtRegMap::clearAllVirt()), you'll get
assertion failures as you saw.

You really want to iterate over the function assigning a physreg to
each virtreg the first time you encounter it, being careful not to
assign the same physreg to two virtual registers if they're ever used
in the same instruction.

Unfortunately this simple approach will not work in all cases. If, for
instance, a virtreg were used in a set of instructions which also
used, between them, every possible physreg, then there'd be no physreg
left to allocate to the virtreg.

You can correct this (at some cost to performance) by inserting new
virtregs (see MachineRegisterInfo::createVirtualRegister), and copies
(see TargetInstrInfo::copyRegToReg) before allocation. You could do
the following:

For each instruction i:
For each operand o:
if o is a register operand:
create a new virtreg v with the same register class as o
if i uses o:
insert a copy from o to v before i
if i defines o:
insert a copy from v to o after i
      rename o to use v

I believe this should be sufficient to guarantee that you can find a
physreg for every virtreg which does not conflict with the choices for
any other virtreg in that instruction. You would only need to spill
the original vregs, not the ones you inserted in the pre-pass.

One gotcha - Two address instructions. If an operand is used as both
def and use in a two address instruction (see
MachineInstr::isRegTiedToUseOperand &
MachineInstr::isRegTiedToDefOperand) you only want to insert one new
vreg for the two operands and re-use it.

I've tried calling addSpillPoint and addRestorePoint for every instruction that involves a virtual register, and also not calling those functions, and I still get the same aborts.

You definitely need these calls. They're not the source of your trouble.

Cheers,
Lang.