"Ran out of registers during register allocation" bug affecting ffmpeg

See http://llvm.org/bugs/show_bug.cgi?id=4668 and
http://llvm.org/bugs/show_bug.cgi?id=5010. The basic description of
the issue (from http://llvm.org/bugs/show_bug.cgi?id=4668#c5): "The
fundamental problem is we can't spill a register once it's fixed to a
physical register."

From discussion on IRC:

[17:14] <_sabre_> efriedma: sounds like a RA bug in linscan
[17:14] <_sabre_> unfortunately, it is a fairly well known bug that is
difficult to fix
[17:14] <_sabre_> it would be worth bringing up on llvmdev, there may
be a good approach since I stopped looking at the regalloc
[17:14] <_sabre_> in any case, its worth surfacing the issue so that
Jakob can keep it in mind as he's building the new RA infrastructure

-Eli

I am not sure I completely understand the problem here since the PRs don't reproduce on TOT, but it sounds like a physical register is live across an inline asm that needs all the registers.

This can happen if the register coalescer decides to coalesce a physreg with a virtreg that is live across the inline asm. This is not easy to detect without a bad compile time regression.

I suppose something similar could happen for calls. For instance, some calls clobber all XMM registers, so a physical XMM register coalesced to be live across a call would be unspillable.
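
To make the failure mode concrete, here is a minimal toy model, not LLVM code. The type and function names (`ToyInstr`, `freeRegs`, `runsOutOfRegisters`) are hypothetical, and the register counts are illustrative assumptions. It only shows the arithmetic: every physreg that is pinned and live across an instruction reduces what that instruction can use, and a pinned register cannot be spilled to make room.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model only; none of these names exist in LLVM.
struct ToyInstr {
    int regsNeeded;                    // registers the instruction itself requires
    std::set<std::string> pinnedLive;  // physregs pinned (unspillable) and live across it
};

// Registers left for the instruction once the pinned, live-across
// physregs are subtracted from the register file.
int freeRegs(const ToyInstr &I, int totalRegs) {
    assert(totalRegs >= 0 && I.regsNeeded >= 0);
    return totalRegs - static_cast<int>(I.pinnedLive.size());
}

// True when allocation must fail: the instruction needs more registers
// than remain, and the pinned ones cannot be spilled to free any up.
bool runsOutOfRegisters(const ToyInstr &I, int totalRegs) {
    return I.regsNeeded > freeRegs(I, totalRegs);
}
```

With an assumed register file of 6, an inline asm that needs all 6 registers is fine on its own, but fails as soon as a single coalesced physreg is live across it.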

What can I say? Physreg coalescing is evil :wink:

I want to remove physreg coalescing entirely, but it requires the register allocator to be really good at taking hints. We are not quite there yet.

A quick fix would be to disable physreg coalescing for functions containing inline asm.
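
As a rough sketch of what that quick fix amounts to, assuming a toy function representation rather than LLVM's real data structures (all names below, such as `ToyFunction` and `allowPhysregCoalescing`, are hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy representation only; none of these names exist in LLVM.
enum class ToyOpcode { Copy, Call, InlineAsm, Other };

struct ToyFunction {
    std::vector<ToyOpcode> instrs;
};

// Scan the function once for any inline asm instruction.
bool containsInlineAsm(const ToyFunction &F) {
    return std::any_of(F.instrs.begin(), F.instrs.end(),
                       [](ToyOpcode Op) { return Op == ToyOpcode::InlineAsm; });
}

// Per-function policy: skip physreg coalescing whenever inline asm is present.
bool allowPhysregCoalescing(const ToyFunction &F) {
    return !containsInlineAsm(F);
}

// Usage example on two toy functions.
void exampleUsage() {
    ToyFunction plain{{ToyOpcode::Copy, ToyOpcode::Call}};
    ToyFunction withAsm{{ToyOpcode::Copy, ToyOpcode::InlineAsm}};
    assert(allowPhysregCoalescing(plain));
    assert(!allowPhysregCoalescing(withAsm));
}
```

The check is a single linear scan per function, so it avoids the compile-time cost of detecting the problematic live ranges precisely, at the price of giving up coalescing in every function that contains inline asm.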

/jakob

> I am not sure I completely understand the problem here since the PRs don't reproduce on TOT, but it sounds like a physical register is live across an inline asm that needs all the registers.

I've uploaded a re-reduced testcase to PR4668; your guess appears to
be correct, or at least close.

> This can happen if the register coalescer decides to coalesce a physreg with a virtreg that is live across the inline asm. This is not easy to detect without a bad compile time regression.
>
> I suppose something similar could happen for calls. For instance, some calls clobber all XMM registers, so a physical XMM register coalesced to be live across a call would be unspillable.
>
> What can I say? Physreg coalescing is evil :wink:
>
> I want to remove physreg coalescing entirely, but it requires the register allocator to be really good at taking hints. We are not quite there yet.

I measure a 1.7% increase in code-size compiling gcc with
-disable-physical-join vs. normal compilation on x86-64. That's
pretty substantial. Looking over the generated code, it looks like
we're missing a lot of cases which seem like they should be easy, like
not putting an immediately returned PHI node into eax, or calculating
a value and immediately moving it into another register for a call.
The difference isn't so bad on x86-32, only 0.25%. I think that means
the primary issue with using -disable-physical-join is that we're
doing a really bad job of putting the arguments to calls in the right
register.

Is the issue that the correct hints aren't there, or that the
allocation algorithm isn't using them well?

> A quick fix would be to disable physreg coalescing for functions containing inline asm.

On x86-32, that probably wouldn't be such a big deal, but the effect
looks really bad for x86-64, and probably other architectures that
pass arguments in registers.

-Eli

I've noticed this as well when comparing output between MSVC and clang. MSVC seems to do a pretty good job of not wasting any registers. A side benefit is that MSVC seems to spill fewer registers to the stack...

> I've uploaded a re-reduced testcase to PR4668

Thanks!

[...]

>> I want to remove physreg coalescing entirely, but it requires the register allocator to be really good at taking hints. We are not quite there yet.
>
> I measure a 1.7% increase in code-size compiling gcc with
> -disable-physical-join vs. normal compilation on x86-64. That's
> pretty substantial.

Yes. There are also some runtime performance regressions.

> Looking over the generated code, it looks like
> we're missing a lot of cases which seem like they should be easy, like
> not putting an immediately returned PHI node into eax, or calculating
> a value and immediately moving it into another register for a call.

Interesting.

When spilling a virtual register, a bunch of new registers are created for all the uses. These registers don't currently get hints, but they could. See the bottom of SplitEditor::rewrite() in SplitKit.cpp.

> The difference isn't so bad on x86-32, only 0.25%. I think that means
> the primary issue with using -disable-physical-join is that we're
> doing a really bad job of putting the arguments to calls in the right
> register.

Yes, function arguments and return registers are the primary source of copies to and from physregs. It makes sense that x86 has less.

> Is the issue that the correct hints aren't there, or that the
> allocation algorithm isn't using them well?

The latter. I moved hinting to CalcSpillWeights.cpp and all relevant registers should be getting good hints.

The RA uses the hint as a first guess, but it doesn't try to avoid allocating registers that are used as hints for interfering virtual registers. The hints are also cleared when backtracking.
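
The difference can be sketched with a toy register picker. This is not the LLVM allocator: `pickHintFirst` models the behavior described above, while `pickAvoidingNeighborHints` is a hypothetical refinement that steers away from registers that interfering virtual registers are hinted to. All names and the integer register numbering are assumptions made for illustration.

```cpp
#include <cassert>
#include <set>

// Toy model only; registers are plain integers and NoReg means "none".
const int NoReg = -1;

// Behavior described above: try the hint first, otherwise take the
// lowest-numbered free register without looking at anyone else's hints.
int pickHintFirst(const std::set<int> &freeRegs, int hint) {
    assert(hint >= NoReg);
    if (hint != NoReg && freeRegs.count(hint))
        return hint;
    return freeRegs.empty() ? NoReg : *freeRegs.begin();
}

// Hypothetical refinement: when the hint is unavailable, prefer a free
// register that no interfering virtual register is hinted to.
int pickAvoidingNeighborHints(const std::set<int> &freeRegs, int hint,
                              const std::set<int> &neighborHints) {
    assert(hint >= NoReg);
    if (hint != NoReg && freeRegs.count(hint))
        return hint;
    for (int R : freeRegs)
        if (!neighborHints.count(R))
            return R;
    // Every free register is somebody's hint; fall back to the first one.
    return freeRegs.empty() ? NoReg : *freeRegs.begin();
}
```

With free registers {0, 1, 2}, an unavailable hint, and an interfering virtual register hinted to register 0, the first picker returns 0 and forces a copy for the neighbor later, while the second returns 1 and leaves register 0 open.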

>> A quick fix would be to disable physreg coalescing for functions containing inline asm.
>
> On x86-32, that probably wouldn't be such a big deal, but the effect
> looks really bad for x86-64, and probably other architectures that
> pass arguments in registers.

I agree.

/jakob