Greedy register allocator allocates live sub-register

Hi all,

I’ve come across a problem with register allocation which I have been unable to track down the root cause of.

6728B %vreg304<def> = COPY %vreg278; VRF128:%vreg304,%vreg278
6736B %vreg302<def> = COPY %vreg278; VRF128:%vreg302,%vreg278
6752B %vreg278<def,tied1> = foo %vreg278<tied0>, %vreg277, 14, pred:1, pred:%noreg, 5; VRF128:%vreg278 VRF64_l:%vreg277
  * bar 30, %vreg278; VRF128:%vreg278
6760B %vreg302<def,tied1> = foo %vreg302<tied0>, %vreg270, 14, pred:1, pred:%noreg, 5; VRF128:%vreg302 VRF64_l:%vreg270
  * bar 30, %vreg302; VRF128:%vreg302
6768B %vreg304<def,tied1> = foo %vreg304<tied0>, %vreg263, 14, pred:1, pred:%noreg, 5; VRF128:%vreg304 VRF64_l:%vreg263
  * bar 30, %vreg304; VRF128:%vreg304
6776B STORE128 %vreg302, <fi#32>, 0; mem:ST16[FixedStack32] VRF128:%vreg302
6792B %vreg306<def> = COPY %vreg305; VRF128:%vreg306,%vreg305
6796B %vreg375<def> = LOAD_v4i16 <fi#64>, 0, pred:1, pred:%noreg, 7; mem:LD8[FixedStack64] VRF64_l:%vreg375
6800B %vreg306<def,tied1> = foo %vreg306<tied0>, %vreg375, 14, pred:1, pred:%noreg, 5; VRF128:%vreg306 VRF64_l:%vreg375
  * bar 30, %vreg306; VRF128:%vreg306
6804B STORE128 %vreg304, <fi#33>, 0; mem:ST16[FixedStack33] VRF128:%vreg304

For this sequence of instructions, when allocating a register for %vreg375 the greedy register allocator chooses V15_l. The problem here is that it had previously allocated V15 (V15_l is a sub-register of V15) to %vreg304. %vreg304 is defined at 6768B and finally used at 6804B so the instruction LOAD_v4i16 at 6796B ends up clobbering the value in V15 before its last use. This is the output of the allocator itself:

selectOrSplit VRF64_l:%vreg375 [6796r,6800r:0) 0@6796r w=1.#INF00e+00
assigning %vreg375 to %V15_l: V15_e0 [6796r,6800r:0) 0@6796r V15_e1 [6796r,6800r:0) 0@6796r V15_e2 [6796r,6800r:0) 0@6796r V15_e3 [6796r,6800r:0) 0@6796r

And this is the live range of %vreg304 printed just before the allocation of %vreg375 started:

%vreg304 [6728r,6768r:0)[6800r,6804r:1) 0@6728r 1@6800r

Why is V15_l being allocated to %vreg375 when there is already a live value in V15? Am I missing some meta-data on either of the instructions foo or bar? The instruction bar doesn’t clobber %vreg304.

Thanks,
Stephen

Hi,

> Hi all,
>
> I've come across a problem with register allocation which I have been
> unable to track down the root cause of.
>
> 6728B %vreg304<def> = COPY %vreg278; VRF128:%vreg304,%vreg278
> 6736B %vreg302<def> = COPY %vreg278; VRF128:%vreg302,%vreg278
> 6752B %vreg278<def,tied1> = foo %vreg278<tied0>, %vreg277, 14, pred:1, pred:%noreg, 5; VRF128:%vreg278 VRF64_l:%vreg277
>   * bar 30, %vreg278; VRF128:%vreg278
> 6760B %vreg302<def,tied1> = foo %vreg302<tied0>, %vreg270, 14, pred:1, pred:%noreg, 5; VRF128:%vreg302 VRF64_l:%vreg270
>   * bar 30, %vreg302; VRF128:%vreg302
> 6768B %vreg304<def,tied1> = foo %vreg304<tied0>, %vreg263, 14, pred:1, pred:%noreg, 5; VRF128:%vreg304 VRF64_l:%vreg263
>   * bar 30, %vreg304; VRF128:%vreg304
> 6776B STORE128 %vreg302, <fi#32>, 0; mem:ST16[FixedStack32] VRF128:%vreg302
> 6792B %vreg306<def> = COPY %vreg305; VRF128:%vreg306,%vreg305
> 6796B %vreg375<def> = LOAD_v4i16 <fi#64>, 0, pred:1, pred:%noreg, 7; mem:LD8[FixedStack64] VRF64_l:%vreg375
> 6800B %vreg306<def,tied1> = foo %vreg306<tied0>, %vreg375, 14, pred:1, pred:%noreg, 5; VRF128:%vreg306 VRF64_l:%vreg375
>   * bar 30, %vreg306; VRF128:%vreg306
> 6804B STORE128 %vreg304, <fi#33>, 0; mem:ST16[FixedStack33] VRF128:%vreg304
>
> For this sequence of instructions, when allocating a register for
> %vreg375 the greedy register allocator chooses V15_l. The problem here
> is that it had previously allocated V15 (V15_l is a sub-register of V15)
> to %vreg304. %vreg304 is defined at 6768B and finally used at 6804B so
> the instruction LOAD_v4i16 at 6796B ends up clobbering the value in
> V15 before its last use. This is the output of the allocator itself:
>
> selectOrSplit VRF64_l:%vreg375 [6796r,6800r:0) 0@6796r w=1.#INF00e+00
> assigning %vreg375 to %V15_l: V15_e0 [6796r,6800r:0) 0@6796r V15_e1 [6796r,6800r:0) 0@6796r V15_e2 [6796r,6800r:0) 0@6796r V15_e3 [6796r,6800r:0) 0@6796r
>
> And this is the live range of %vreg304 printed just before the
> allocation of %vreg375 started:

I've no idea what the problem is but this looks a little funny to me:

> %vreg304 [6728r,6768r:0)[6800r,6804r:1) 0@6728r 1@6800r

According to this live range (if I can interpret the printouts...), %vreg304 is not live at e.g. 6796B, and thus I suppose V15 is free for %vreg375 at that point?
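
To make that reading of the printout concrete, here is a minimal sketch of the check. It assumes each segment is a half-open interval [start, end) and ignores the slot suffix (the "r" in "6768r"), so indexes are plain integers. Segment, LiveRange, liveAt and overlaps are hypothetical names for illustration only, not LLVM's LiveInterval API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of a live range: a list of half-open
// segments [start, end). Slot suffixes such as the "r" in "6768r" are
// ignored here, so indexes are plain integers.
struct Segment {
  unsigned start;
  unsigned end;
};

using LiveRange = std::vector<Segment>;

// Returns true if the value is live at the given index.
bool liveAt(const LiveRange &lr, unsigned idx) {
  for (const Segment &s : lr)
    if (s.start <= idx && idx < s.end)
      return true;
  return false;
}

// Returns true if any segment of a overlaps any segment of b.
bool overlaps(const LiveRange &a, const LiveRange &b) {
  for (const Segment &x : a)
    for (const Segment &y : b)
      if (x.start < y.end && y.start < x.end)
        return true;
  return false;
}
```

With %vreg304 modelled as {{6728, 6768}, {6800, 6804}} and %vreg375 as {{6796, 6800}}, liveAt reports %vreg304 as not live at 6796 and overlaps reports no interference, which would be consistent with the allocator treating V15_l as free there.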

Has %vreg304 been split/spilled before the allocation of %vreg375, so the code at that point is actually something different than all the instructions you posted above?

Regards,
Mikael

Hi Mikael,

I think you’re right, the live range does seem to suggest that %vreg304 isn’t live at 6796B. However from manual inspection of the code, I think that %vreg304 should be marked as live at that point.

There were several spills and reloads inserted before the allocation of %vreg375. This made it very difficult to track what exactly was going on, so in the main while loop of “RegAllocBase::allocatePhysRegs” I added a call to “LIS->dump()” to print out the code for the whole function. I got the code I posted from the dump just before “RAGreedy::selectOrSplit” was called for %vreg375. I think this means the code I posted is correct at the time of allocation (unless some other changes are made to the code by “RAGreedy::selectOrSplit” itself).

Thanks,
Stephen

I’ve managed to resolve this issue. It was related to how the live intervals were being updated during pre-RA scheduling. When an instruction is moved during pre-RA scheduling, we need to call “LiveIntervals::handleMove” to update the live intervals associated with that instruction. The trouble was that we were unable to do this for bundle header instructions, because “handleMove” asserts that the instruction is not bundled.

I believe this assertion is too strict: it should instead check that the instruction is not inside a bundle. That way the assertion still blocks calls for instructions inside a bundle, but doesn’t prevent updating the bundle header. The live intervals for the instructions inside the bundle should be updated at the same time as the bundle header. I’ve created a patch to this effect and have submitted it to llvm-commits for review.
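
The difference between the two conditions can be sketched with a toy model. It assumes an instruction counts as "bundled" if it is tied to either a predecessor or a successor, and as "inside a bundle" only if it is tied to a predecessor, so a bundle header is bundled but not inside a bundle. The Instr type and the function names below are hypothetical; they do not reproduce LLVM's MachineInstr class or the actual assertion in handleMove.

```cpp
#include <cassert>

// Hypothetical stand-in for a machine instruction's bundle flags.
struct Instr {
  bool bundledWithPred; // tied to the previous instruction
  bool bundledWithSucc; // tied to the next instruction
};

// Part of any bundle, including the header.
bool isBundled(const Instr &mi) {
  return mi.bundledWithPred || mi.bundledWithSucc;
}

// Strictly inside a bundle, i.e. not the header.
bool isInsideBundle(const Instr &mi) { return mi.bundledWithPred; }

// Stricter condition: rejects every bundled instruction, header included.
bool strictCheckAllowsMove(const Instr &mi) { return !isBundled(mi); }

// Relaxed condition: rejects only instructions inside a bundle.
bool relaxedCheckAllowsMove(const Instr &mi) { return !isInsideBundle(mi); }
```

Under these assumptions, a header (tied only to its successor) fails the strict check but passes the relaxed one, while an instruction inside the bundle is rejected by both.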

Thanks,

Stephen