LLVM ERROR: ran out of registers during register allocation

Hello,

I’m getting the “LLVM ERROR: ran out of registers during register allocation” error message for an out of tree target I’m developing. This is happening for the following piece of C code:

struct ss
{
int a;
int b;
int c;
};
void loop(struct ss *x, struct ss **y, int z)
{
int i;
for (i=0; i<z; ++i)
{
x->c += y[i]->b;
}
}

The problem relies in the register classes for the load and store with displacement instructions:

  • load DREGS:$dst, PTR:$p
  • store DREGS:$src, PTR:$p
    where DREGS is a regclass composed of 16 registers and PTR is a regclass composed of 2 registers, but the key point here is that PTR is a subset of DREGS, so cross class copies are perfectly allowed.
    One of the 2 registers in the PTR regclass is reserved for the frame pointer, leaving only one register to access both x and y[i] in the code above, which seems to confuse the regalloc. Implementing getLargestLegalSuperClass() sort of helped for simpler functions, but not for this one.
    This code can be register-allocted if the addresses of both x and y[i] are cross class copied to DREGS registers to temporarily store these values and then when required copied back to the the register in PTR to access them in memory.

Any help for a solution or a workaround will be greatly appreciated since this problem is severily limiting the complexity of the programs that I can currently compile.

Thanks.

Those are some severe constraints on register allocation, but it ought to be possible anyway.

You may wan't to investigate how RAGreedy::canEvictInterference() is behaving.

/jakob

Hello Jakob,

Those are some severe constraints on register allocation, but it ought to be possible anyway.

Indeed, these constraints aren’t playing very well with the register allocator :\

You may wan’t to investigate how RAGreedy::canEvictInterference() is behaving.

Ok, this is what I’ve noticed, not sure if it makes sense at all but, regalloc fails with the following sequence:

  1. directly assign the physreg in PTR RC to a virtX.
  2. for a virtY which also belongs to the PTR RC, try to evict: call canEvictInterference() for virtY which interferes with virtX, returns true. evict and unassign virtX and assign physreg to virtY.
  3. for a virtZ which also belongs to the PTR RC, try to evict: call canEvictInterference() for virtZ which interferes with virtY, both VirtReg.isSpillable() and Intf->isSpillable() return false, can’t evict, wait for a second round and queue new interval.
  4. do some work unrelated to these vregs.
  5. when selectOrSplit is called again for virtZ it falls through down to the return ~0u line and fails.

This issue can be very easily reproduced with the Thumb2 target by doing the following few changes:

  1. declare a PTRRC regclass in ARMRegisterInfo.td with only one physreg:
    def PTRRC : RegisterClass<“ARM”, [i32], 32, (add R6)>;

  2. modify the RC used in the addr_offset_none addressing mode in ARMInstrInfo.td around line 947 to:
    let MIOperandInfo = (ops PTRRC:$base);
    (this is used by the t2LDR_POST instruction)

  3. and likewise modify the t2addrmode_imm12 addressing mode in ARMInstrThumb2.td around line 151 to:
    let MIOperandInfo = (ops PTRRC:$base, i32imm:$offsimm);
    (used by the load/store instructions)

then compile with -O3 and done :slight_smile:

In addition, I’ve attached the debugging info generated by the regalloc for the Thumb2 target. The main difference of the debug output using my target is that I didn’t get any spill code like Thumb2 has, probably because i have far more free regs available.

Thanks for your help.

dbg (128 KB)

Hello Jakob,

I think I’ve found something interesting that may help you get a better idea of what’s going on.

While looking at the debug info I noticed that the coalescer was removing lots of copies that could help the allocator make more cross class copies. As a test, I disabled the join-liveintervals option in the coalescer which gave me the surprise of making the regalloc succeed. To show what I mean, I’m attaching two text files that show the debug info generated when this option is enabled and disabled for my target.

To push things a bit further, I wrote a dirty hack in RegisterCoalescer::joinCopy() to return false when the dest regclass is too constrained. This allowed me executing the coalescer without crashing the regalloc. Obviously the generated code is not optimal at all, because there are many useless copies around. I’m pretty sure this is not the right fix at all, but it can give you a hint incase the problem is in the coalescer and not in the regalloc.

Thanks!

2012/12/18 Borja Ferrer <borja.ferav@gmail.com>

dump.zip (16.6 KB)

We did something like this back when the register allocator couldn't split live ranges.

The problem is that any heuristic you can come up with only makes the problem less likely to happen. It doesn't actually fix it.

/jakob

We did something like this back when the register allocator couldn’t split live ranges.

Yes, I remember the isWinToJoinCrossClass() function, removed here:
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/RegisterCoalescer.cpp?r1=152016&r2=155551&diff_format=h
that prevented some coalescing to the cost of leaving many unnecessary copies around for very constrained regclasses like the one I have.

The problem is that any heuristic you can come up with only makes the problem less likely to happen. It doesn’t actually fix it.

Indeed, that heuristic I wrote is a nasty hack and not the way of fixing it correctly. Now that the regalloc has much more freedom on making changes in the code I hope it is fixable. If you need any other info apart of the dumps I’ve already attached please ask.

Thanks!

Hello Jakob,

Did you get a chance to take a look into this, and if not, can you do it when you get some spare time?

Thanks!

2012/12/19 Borja Ferrer <borja.ferav@gmail.com>

Hello Jakob,

Did you get a chance to take a look into this, and if not, can you do it when you get some spare time?

It’s not likely I’ll have time to look at this in the near future. I’d recommend you do it yourself.

/jakob

Ok, I’ve found that marking tiny live intervals as not spillable inside VirtRegAuxInfo::CalculateWeightAndHint is not playing nicely with very constrained regclasses, in my case a regclass composed of only one register.
As a workaround, instead of marking them as not spillable, I’ve marked them with a very high spill cost and the regalloc is able to compile the function with good code quality. To avoid doing this for all live intervals of this regclass, I’m filtering only the ones that dont have a small size (returned by li.getSize())

Does this possible cause make any sense at all?

In either case, in the meantime, I can live with this workaround until an official fix is implemented. I’ll fill in a bug report to track this problem so you can take a look at it when appropiate.

2013/1/7 Jakob Stoklund Olesen <stoklund@2pi.dk>

Ok, I've found that marking tiny live intervals as not spillable inside VirtRegAuxInfo::CalculateWeightAndHint is not playing nicely with very constrained regclasses, in my case a regclass composed of only one register.
As a workaround, instead of marking them as not spillable, I've marked them with a very high spill cost and the regalloc is able to compile the function with good code quality. To avoid doing this for all live intervals of this regclass, I'm filtering only the ones that dont have a small size (returned by li.getSize())

Does this possible cause make any sense at all?

Yes, that sounds like a workable fix.

It is important that live ranges coming out of the spiller are unspillable, but the zero length intervals don't need an infinite spill weight - very very large is good enough.

In either case, in the meantime, I can live with this workaround until an official fix is implemented. I'll fill in a bug report to track this problem so you can take a look at it when appropiate.

Thanks.

/jakob

Here is the bug report: http://llvm.org/bugs/show_bug.cgi?id=14879

Thanks for your help Jakob!

2013/1/9 Jakob Stoklund Olesen <stoklund@2pi.dk>