Splitting live ranges of half-defined registers

I have already written about something similar (either on the list, or in private communication), so this may look familiar.

Here's a scenario I'm observing:

First, we have some innocent-looking code:
vreg(32) = x // vreg(32) = 32-bit register
... = vreg(32)
[...]
vreg(64).low_half = vreg(32) // vreg(64) = 64-bit register
[...]

Then, after register coalescing:
vreg(64).low_half = x
... = vreg(64).low_half
[...]
[etc.]

Then the live range of the 64-bit vreg is split during register allocation:
vreg'(64).low_half = x // first live range starts
... = vreg'(64).low_half
vreg"(64) = vreg'(64) // next live range starts
[...]

The problem is that only half of vreg' is defined, so the copy vreg" = vreg' reads a register whose other half is undefined.

My question is: is this behavior part of the design?

The problem may arise if someone, for whatever reason, decides to rewrite the 64-bit copy as two 32-bit copies, e.g.:
vreg"(64).low_half = vreg'(64).low_half (1)
vreg"(64).high_half = vreg'(64).high_half (2)
When this happens, copy (2) reads the undefined high half of vreg'.
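To make the failure mode concrete, here is a toy model in Python. It is only an illustration under simple assumptions: a 64-bit vreg is modeled as two 32-bit lanes, and `VReg64`, `copy_lanes`, `LO` and `HI` are hypothetical names, not LLVM classes. Rewriting the full copy as copies (1) and (2) shows copy (2) reading a lane that was never defined:

```python
# Toy model (hypothetical, not LLVM code): a 64-bit virtual register is
# two 32-bit lanes, and we record which lanes hold a defined value.
LO, HI = 0b01, 0b10


class VReg64:
    def __init__(self, name):
        self.name = name
        self.lanes = {}  # lane mask -> value; an absent lane is undefined

    def defined_mask(self):
        mask = 0
        for lane in self.lanes:
            mask |= lane
        return mask


def copy_lanes(dst, src, lane_mask):
    """Copy the lanes in lane_mask from src to dst.

    Returns the mask of lanes that were read while undefined."""
    undefined_reads = lane_mask & ~src.defined_mask()
    for lane in (LO, HI):
        if lane & lane_mask and lane in src.lanes:
            dst.lanes[lane] = src.lanes[lane]
    return undefined_reads


# vreg'(64).low_half = x  (the high half is never written)
v1 = VReg64("vreg'")
v1.lanes[LO] = "x"

# vreg"(64) = vreg'(64), rewritten as two 32-bit copies
v2 = VReg64('vreg"')
bad_lo = copy_lanes(v2, v1, LO)  # copy (1): reads a defined lane
bad_hi = copy_lanes(v2, v1, HI)  # copy (2): reads the undefined high lane
print(bad_lo, bad_hi)  # 0 2
```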

This does happen in real code. Look at vreg304 below:

BB#2: derived from LLVM BB %if.end
Predecessors according to CFG: BB#1
%vreg61<def> = LDrih_indexed %vreg56, 3134; IntRegs:%vreg61,%vreg56
%vreg62<def> = LDriuh_indexed %vreg56, 680; IntRegs:%vreg62,%vreg56
%R1<def> = TFRI 1431655766
ADJCALLSTACKDOWN 0, %R29<imp-def>, %R30<imp-def>, %R31<imp-use>, %R30<imp-use>, %R29<imp-use>
%vreg519<def> = TFRI 3148; IntRegs:%vreg519
%vreg523<def> = TFRI64 15; DoubleRegs:%vreg523
%R0<def> = COPY %vreg304:subreg_loreg; DoubleRegs:%vreg304

After live range splitting, vreg304 is replaced by vreg539 and vreg537, and right before the call there is a copy from the half-undefined vreg539:

BB#2: derived from LLVM BB %if.end
Predecessors according to CFG: BB#1
%vreg61<def> = LDrih_indexed %vreg56, 3134; IntRegs:%vreg61,%vreg56
%vreg62<def> = LDriuh_indexed %vreg56, 680; IntRegs:%vreg62,%vreg56
%R1<def> = TFRI 1431655766
ADJCALLSTACKDOWN 0, %R29<imp-def>, %R30<imp-def>, %R31<imp-use>, %R30<imp-use>, %R29<imp-use>
%vreg519<def> = TFRI 3148; IntRegs:%vreg519
%vreg523<def> = TFRI64 15; DoubleRegs:%vreg523
%R0<def> = COPY %vreg539:subreg_loreg; DoubleRegs:%vreg539
%vreg537<def> = COPY %vreg539; DoubleRegs:%vreg537,%vreg539

-Krzysztof

Yes, the register allocator only deals in full-width virtual registers, so any copies or spills created will operate on the full register.

The coalescer can do some tricks by tracking partially defined registers, more so in LLVM 3.2 than in earlier releases.

I would like to make it possible to track independent live ranges for each lane of a vector register. That would help both coalescing and RA, but it is a major project.
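As a rough sketch of what per-lane tracking could buy during splitting (a hypothetical Python model, not LLVM's data structures; `lanes_live_at`, `split_copy_lanes` and the `[start, end)` slot segments are assumptions): if each lane keeps its own live segments, a split copy only needs to move the lanes that are live at the split point instead of the whole register.

```python
# Hypothetical sketch: per-lane live ranges for a two-lane (64-bit) vreg.
# Each lane has its own list of [start, end) segments over instruction slots.
LO, HI = 0b01, 0b10


def lanes_live_at(lane_ranges, slot):
    """Return the mask of lanes whose live range covers `slot`."""
    mask = 0
    for lane, segments in lane_ranges.items():
        if any(start <= slot < end for start, end in segments):
            mask |= lane
    return mask


def split_copy_lanes(lane_ranges, slot):
    """Lanes a split copy inserted at `slot` actually needs to move.

    An allocator that only knows full-width registers would always
    copy LO | HI here."""
    return lanes_live_at(lane_ranges, slot)


# Low half defined at slot 0 and last used at slot 4;
# the high half is never defined, so it has no segments.
ranges = {LO: [(0, 4)], HI: []}
print(split_copy_lanes(ranges, 2))  # 1, i.e. only LO
```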

/jakob

I see.

Unfortunately, this is causing some customer code to fail to compile. The direct cause of the failure is a complaint from the register scavenger in the scenario I described in my first email.

This is actually closely related to the other problem I reported a while ago ("wrong value out of predecessor"): splitting (and spilling) of partially defined registers was also key to triggering that failure.

I can deal with this locally, so it's not a major blocker for us.

Thanks,
-Krzysztof

It shouldn't be causing any compile-time failures: the VirtRegRewriter adds <imp-def> operands for the wide register to make it look live everywhere the virtual register was live:

vreg(64).low_half = vreg(32) // vreg(64) = 64-bit register

Should be rewritten as:

physreg(32) = other-physreg(32), physreg(64)<imp-def>

In the worst case, you should get a 64-bit copy instruction where a 32-bit copy would have been sufficient.

It is quite common for post-RA passes to get the <imp-def> operands wrong; they are really hard to get right.

The machine code verifier (llc -verify-machineinstrs) can find these errors.
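A toy check in the spirit of the verifier (this is not what -verify-machineinstrs actually implements; the register names, the `SUBREGS` table and `undefined_reads` are hypothetical): the <imp-def> of the wide register marks both halves as defined, so a later full-width copy no longer reads an undefined half.

```python
# Toy model (hypothetical register names): each 64-bit pair is made of
# two 32-bit registers; a 32-bit register is its own single unit.
SUBREGS = {"D0": ("R0", "R1"), "D2": ("R4", "R5")}


def units(reg):
    return set(SUBREGS.get(reg, (reg,)))


def undefined_reads(instrs, live_in=()):
    """instrs is a list of (defs, uses). Return the registers read while
    any of their 32-bit units has not yet been defined."""
    defined = set()
    for r in live_in:
        defined |= units(r)
    errors = []
    for defs, uses in instrs:
        for r in uses:
            if not units(r) <= defined:
                errors.append(r)
        for r in defs:  # explicit defs and <imp-def> operands alike
            defined |= units(r)
    return errors


# R0 = COPY R2, then D2 = COPY D0 (a full 64-bit copy)
plain = [(["R0"], ["R2"]), (["D2"], ["D0"])]
# Same code, with D0<imp-def> added to the 32-bit copy
with_impdef = [(["R0", "D0"], ["R2"]), (["D2"], ["D0"])]
print(undefined_reads(plain, live_in=["R2"]))        # ['D0']
print(undefined_reads(with_impdef, live_in=["R2"]))  # []
```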

/jakob

As I mentioned off-list, for us this is often much worse than getting a wider instruction. To give the list some context: if the 32-bit subregisters (which are actually independent registers) are used as a register pair, then in the instructions within that live range, these 32-bit registers will be aliased to the super-register (the register pair). This, in turn, causes false dependencies between instructions that only access the non-overlapping 32-bit halves.
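A toy dependence check illustrating the false dependency (a hypothetical Python model; `R0`/`R1` forming the pair `D0`, `touched` and `depends` are assumptions, not LLVM code). It assumes, as described above, that each 32-bit instruction in the live range also carries an <imp-def> of the pair:

```python
# Hypothetical register pair: D0 is made of R0 and R1.
SUBREGS = {"D0": ("R0", "R1")}


def units(reg):
    return set(SUBREGS.get(reg, (reg,)))


def touched(regs):
    """All 32-bit units covered by a list of registers."""
    out = set()
    for r in regs:
        out |= units(r)
    return out


def depends(a, b):
    """Conservative check: b must stay after a if a writes a unit that b
    reads or writes, or a reads a unit that b writes."""
    a_defs, a_uses = touched(a["defs"]), touched(a["uses"])
    b_defs, b_uses = touched(b["defs"]), touched(b["uses"])
    return bool(a_defs & (b_defs | b_uses) or a_uses & b_defs)


# R0 = COPY R2 and R1 = COPY R3 write non-overlapping halves of D0.
i1 = {"defs": ["R0"], "uses": ["R2"]}
i2 = {"defs": ["R1"], "uses": ["R3"]}
# The same instructions with D0<imp-def> added to each.
i1_imp = {"defs": ["R0", "D0"], "uses": ["R2"]}
i2_imp = {"defs": ["R1", "D0"], "uses": ["R3"]}
print(depends(i1, i2))          # False
print(depends(i1_imp, i2_imp))  # True
```

Without the <imp-def> operands the two copies are independent and can be scheduled freely; with them, the conservative check orders them even though they write disjoint halves.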

For us, tracking individual lanes would solve this issue, and I believe it would lead to better code quality in general. I am very interested in getting this effort going.

-Krzysztof