Subregister liveness tracking

Currently the whole vreg is always spilled and restored; spilling only the parts that are actually live would be a nice addition in the future.

Looking at r192119: if "mtlo" writes to $lo and sets $hi to an unpredictable value, then it should just have an additional (dead) def operand for $hi, shouldn't it?

Greetings
     Matthias

What I didn’t mention in r192119 is that mthi/mtlo clobbers the other sub-register only if the contents of hi and lo were produced by mult or another arithmetic instruction (div, madd, etc.). It doesn’t have this side effect if they were produced by another mthi/mtlo. So I don’t think making mthi/mtlo clobber the other half would work.

For example, this is an illegal sequence of instructions, where instruction 3 makes $hi unpredictable:

  1. mult $lo, $hi, $2, $3 // $lo, $hi = $2 * $3

  2. mflo $4, $lo // $4 ← $lo

  3. mtlo $lo, $6 // $lo ← $6. effectively clobbers $hi too.

  4. mfhi $5, $hi // $5 ← $hi

  5. mthi $hi, $7 // $hi ← $7

  6. madd $lo, $hi, $8, $9, $lo, $hi // $lo, $hi = $8 * $9 + ($lo, $hi)

Unlike the mtlo instruction in the example above, instruction 5 in the next example does not clobber $hi:

  1. mult $lo, $hi, $2, $3 // $lo, $hi = $2 * $3

  2. mflo $4, $lo // $4 ← $lo

  3. mfhi $5, $hi // $5 ← $hi

  4. mthi $hi, $7 // $hi ← $7.

  5. mtlo $lo, $6 // $lo ← $6. This does not clobber $hi.

  6. madd $lo, $hi, $8, $9, $lo, $hi // $lo, $hi = $8 * $9 + ($lo, $hi)
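
The rule behind the two sequences above can be modelled as a small state machine. The C++ sketch below is only an illustration of the rule as stated in this mail, not of the hardware or of any LLVM code: the names Src, Op, HiLo and firstIllegalRead are made up for the example, and the assumption that both halves start out unpredictable is mine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// How the current contents of $hi or $lo were produced (hypothetical model).
enum class Src { Unpredictable, Arith, Move };

enum class Op { Mult, Madd, Mflo, Mfhi, Mtlo, Mthi };

struct HiLo {
  Src hi = Src::Unpredictable; // assumption: nothing is known at the start
  Src lo = Src::Unpredictable;
};

// Returns the 1-based index of the first instruction that reads an
// unpredictable half, or 0 if the sequence is legal under this model.
std::size_t firstIllegalRead(const std::vector<Op> &seq) {
  HiLo s;
  for (std::size_t i = 0; i < seq.size(); ++i) {
    switch (seq[i]) {
    case Op::Mult: // writes both halves
      s.hi = s.lo = Src::Arith;
      break;
    case Op::Madd: // reads and then writes both halves
      if (s.hi == Src::Unpredictable || s.lo == Src::Unpredictable)
        return i + 1;
      s.hi = s.lo = Src::Arith;
      break;
    case Op::Mflo: // reads $lo
      if (s.lo == Src::Unpredictable)
        return i + 1;
      break;
    case Op::Mfhi: // reads $hi
      if (s.hi == Src::Unpredictable)
        return i + 1;
      break;
    case Op::Mtlo: // clobbers $hi only if $hi came from arithmetic
      if (s.hi == Src::Arith)
        s.hi = Src::Unpredictable;
      s.lo = Src::Move;
      break;
    case Op::Mthi: // clobbers $lo only if $lo came from arithmetic
      if (s.lo == Src::Arith)
        s.lo = Src::Unpredictable;
      s.hi = Src::Move;
      break;
    }
  }
  return 0;
}
```

Fed the first sequence (mult, mflo, mtlo, mfhi, mthi, madd), the model flags instruction 4, the mfhi that reads the clobbered $hi. The second sequence (mult, mflo, mfhi, mthi, mtlo, madd) passes, because by the time mtlo executes, $hi was last written by mthi.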

Probably I can define a pseudo instruction “mthilo” that defines both lo and hi and expands to mthi and mtlo after register allocation, which will force the register allocator to spill/restore the whole register in most cases (the only exception I can think of is the inline-assembly constraint ‘l’ for the ‘lo’ register).

What I didn’t mention in r192119 is that mthi/mtlo clobbers the other sub-register only if the contents of hi and lo were produced by mult or another arithmetic instruction (div, madd, etc.). It doesn’t have this side effect if they were produced by another mthi/mtlo. So I don’t think making mthi/mtlo clobber the other half would work.

Uh, that is indeed nasty, and I don’t think it can really be expressed like that in the current RA framework.

Probably I can define a pseudo instruction “mthilo” that defines both lo and hi and expands to mthi and mtlo after register allocation, which will force the register allocator to spill/restore the whole register in most cases (the only exception I can think of is the inline-assembly constraint ‘l’ for the ‘lo’ register).

That is probably the cleanest solution, with the only downside being that the scheduler can no longer place instructions between the mthi and the mtlo.
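
To make the proposal concrete, here is a rough C++ sketch of what expanding such a pseudo after register allocation amounts to. It does not use LLVM's MachineInstr classes: Inst and expandMthilo are hypothetical stand-ins, and the operand order chosen for "mthilo" (lo source first, hi source second) is an assumption made for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal stand-in for a machine instruction (hypothetical, not LLVM's type).
struct Inst {
  std::string opcode;
  std::vector<std::string> operands;
};

// Replaces every "mthilo <lo-src>, <hi-src>" pseudo with an adjacent
// "mthi <hi-src>" / "mtlo <lo-src>" pair; everything else is copied as is.
std::vector<Inst> expandMthilo(const std::vector<Inst> &in) {
  std::vector<Inst> out;
  for (const Inst &mi : in) {
    if (mi.opcode != "mthilo") {
      out.push_back(mi);
      continue;
    }
    assert(mi.operands.size() == 2 && "mthilo takes a lo and a hi source");
    out.push_back({"mthi", {mi.operands[1]}});
    out.push_back({"mtlo", {mi.operands[0]}});
  }
  return out;
}
```

Because the pair only comes into existence after the expansion, nothing earlier in the pipeline ever sees a point between the two moves, which is both why the allocator has to treat hi and lo as one unit and why the scheduler cannot separate them.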

Greetings
Matthias

Hi all,

I’m working on an LLVM-based back-end. We have a custom solution for sub-register liveness tracking.

Recently we tried to replace it with LLVM’s enableSubRegLiveness and ran into multiple issues.
I can’t share specific IR snippets right now, but the failures are related to spilling, undef sub-registers, etc.

There is Bug 17557 open since 2013: Enable subregister liveness for scheduling and register allocation.
https://llvm.org/bugs/show_bug.cgi?id=17557

with TODO list:

"These are the remaining steps to get Matthias' subregister liveness fully integrated:
- Fix LiveRegUnits to correctly handle regmasks.
- Benchmark/tune compile time.
- Enable subreg liveness on x86 for testing purposes.
- Use LiveRegUnits to fix ARM VMOV widening.
- Fix the scheduler's DAG builder to use bundler iterator, not operand index.
- Discard the master live range after coalescing so that LiveInterval updates don't need to preserve it when we reorder subregister defs.
- Enable subreg scheduling on all targets that enable MI scheduler."

Is someone still working on bug fixes and enhancements? Are there any pending patches?

The R600 back-end uses sub-reg liveness, e.g. r238999 - R600: Re-enable sub-reg liveness (June 2015).
But its requirements and use cases seem to be GPU-specific.

Does anyone use sub-reg liveness for RISC/CISC+SIMD targets?

Thanks,

Sergey

Oh, that is an old ticket; the items in the TODO list below are all done (some are obsolete).

Subregister liveness tracking has been available in LLVM for roughly a year now, and for a few months we have also had scheduler extensions in place that allow independent scheduling of subregister definitions.

While I had subregister liveness tracking working for all CPU targets, I decided against enabling it in the end, because I could not measure any benefit in the generated code and so couldn’t justify the added compile time.

We do have subregister liveness tracking enabled in production for AMDGPU and an out-of-tree target, so it certainly works today.

- Matthias

Hi Matthias,

Thanks for the quick reply. I have a few questions about sub-register support.

I’m currently using LLVM 3.8. Could you send links to recent patches/reviews?

Given when the 3.8 branch was created, it should definitely have all the subregister liveness tracking code; it may not have everything necessary to enable the scheduler enhancements, though.

  1. Does LLVM maintain sub-register liveness while splitting live intervals?

No, currently subregister liveness information is dropped if registers get split or spilled.
There are currently some issues with recalculating liveness after invasive changes, because you cannot set undef/dead flags at subregister granularity in today’s LLVM. There is no such problem when the liveness information is computed before register coalescing, and the coalescer itself has code to update the liveness information rather than recompute it.
The spilling and splitting code currently takes the easy way out and drops the subregister information (so you are left with only the liveness of the superregister). This hasn’t been a problem for GPUs, as those typically have many registers, so spilling and splitting are uncommon operations.
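
As a toy illustration of what "dropping the subregister information" means, consider the following C++ sketch. It is not LLVM's LiveInterval: Interval, dropSubranges and liveLanesAt are hypothetical names, and liveness is modelled as plain sets of instruction slots rather than segments.

```cpp
#include <cstdint>
#include <map>
#include <set>

using LaneMask = std::uint32_t;
using Slot = unsigned; // hypothetical instruction index

struct Interval {
  // Per-lane liveness: for each lane mask, the slots where those lanes are live.
  std::map<LaneMask, std::set<Slot>> subranges;
  // Liveness of the register as a whole.
  std::set<Slot> mainRange;
};

// Conservative fallback: forget the per-lane information. Afterwards the
// register counts as fully live wherever any of its lanes was live.
void dropSubranges(Interval &li) {
  for (const auto &sr : li.subranges)
    li.mainRange.insert(sr.second.begin(), sr.second.end());
  li.subranges.clear();
}

// Lanes considered live at slot s: precise while subranges exist,
// all-or-nothing once they have been dropped.
LaneMask liveLanesAt(const Interval &li, Slot s, LaneMask allLanes) {
  if (li.subranges.empty())
    return li.mainRange.count(s) ? allLanes : 0;
  LaneMask m = 0;
  for (const auto &sr : li.subranges)
    if (sr.second.count(s))
      m |= sr.first;
  return m;
}
```

With lane 0x1 live at slots 0 to 2 and lane 0x2 live at slots 2 and 3, a query at slot 3 reports only lane 0x2 before the drop and both lanes afterwards. The result is still correct, just less precise, which costs little on a target that rarely spills.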

  2. How should sub-register kill/dead flags and BB live-ins look after regalloc?

The kill and dead flags describe the whole machine operand:
%vreg0. // dead def of the whole vreg0
%vreg1.sub1. // dead def of the sub1 subregister (the other subregs may be live)
= %vreg2. // undef %vreg2 usage <=> the operand does not affect vreg2's liveness
= %vreg2.sub2. // undef %vreg2.sub2 usage

%AL = xxx
= use %EAX. // this is legal: you only need part of the physreg to be defined for a use to be fine
= use %EBX // this still requires the undef flag as the complete register is undefined.

E.g. the target has instructions for both sub-registers and full registers.
A composite 64-bit register contains two 32-bit registers, Rxy = { Rx, Ry }, and there are scalar and vector variants of add/ld/etc.

LiveRangeCalc::findReachingDefs expects to see all sub-registers in BB liveins together with pair registers.

BB1:
Rxy = …

BB2: LiveIn: Rxy and Rx ?

You can either have Rxy in the live-in list or have Rx and Ry in the list separately.
(Note that each live-in also has a lanemask assigned, so you can have situations in which Rxy is in the live-in list
but the lanemask tells you that only one of Rx/Ry is actually live.)
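
A small C++ sketch of how such live-in entries could be interpreted follows. The LiveIn struct, the lane constants and liveInLanes are assumptions made for the illustration (in particular, the bit assigned to each of Rx and Ry is arbitrary); this is not the MachineBasicBlock API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

using LaneMask = std::uint32_t;
constexpr LaneMask LaneRx = 0x1; // hypothetical lane of Rx within Rxy
constexpr LaneMask LaneRy = 0x2; // hypothetical lane of Ry within Rxy

// One live-in entry: a physical register plus the lanes of it that are live.
struct LiveIn {
  std::string reg;
  LaneMask lanes;
};

// Collects the lanes of Rxy that are live into a block, whether the list
// names Rxy itself (restricted by its lanemask) or Rx and Ry separately.
LaneMask liveInLanes(const std::vector<LiveIn> &liveIns) {
  LaneMask m = 0;
  for (const LiveIn &li : liveIns) {
    if (li.reg == "Rxy")
      m |= li.lanes & (LaneRx | LaneRy);
    else if (li.reg == "Rx")
      m |= LaneRx;
    else if (li.reg == "Ry")
      m |= LaneRy;
  }
  return m;
}
```

Under this model a list containing Rxy with both lanes set and a list containing Rx and Ry separately describe the same liveness, while Rxy with only the Rx lane set means that Ry is not live into the block even though the pair register is listed.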

… = ADD Rx

… = VADD Rxy

    if (TargetRegisterInfo::isPhysicalRegister(PhysReg) &&
        !MBB->isLiveIn(PhysReg)) {
      MBB->getParent()->verify();
      errs() << "The register " << PrintReg(PhysReg)
             << " needs to be live in to BB#" << MBB->getNumber()
             << ", but is missing from the live-in list.\n";
      llvm_unreachable("Invalid global physical register");
    }

You may be running into the recalculation problems here. Tricky situations may look like this:

BB0:
%vreg0 = xxx
jmp BB2

BB1:
%vreg0.sub1 = yyy
jmp BB2

BB2:
= use %vreg0. // the instruction only cares about vreg0.sub1 but for some reason encodes the full register.
// However as we lack a way to add an “undef” flag for vreg0.sub0 the liveness calculation traces
// backwards and fails to find an actual sub0 definition in BB1.
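
The failure in the example above can be reproduced with a tiny per-lane dataflow model. Note that this C++ sketch is a forward "defined on all paths" analysis written for illustration; it is not the backward walk that LiveRangeCalc performs, and Block, lanesDefinedAtEntry and the lane numbering (sub0 = 0x1, sub1 = 0x2) are all assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using LaneMask = std::uint32_t;

struct Block {
  LaneMask defs;                  // lanes of the vreg defined in this block
  std::vector<std::size_t> preds; // indices of predecessor blocks
};

// Returns the lanes that are defined on every path reaching the entry of
// block b. Blocks without predecessors are treated as function entries.
LaneMask lanesDefinedAtEntry(const std::vector<Block> &cfg, std::size_t b,
                             LaneMask all) {
  // Optimistic start; the values only shrink, so the loop terminates.
  std::vector<LaneMask> out(cfg.size(), all);
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = 0; i < cfg.size(); ++i) {
      LaneMask in = cfg[i].preds.empty() ? 0 : all;
      for (std::size_t p : cfg[i].preds)
        in &= out[p];
      LaneMask o = (in | cfg[i].defs) & all;
      if (o != out[i]) {
        out[i] = o;
        changed = true;
      }
    }
  }
  LaneMask in = cfg[b].preds.empty() ? 0 : all;
  for (std::size_t p : cfg[b].preds)
    in &= out[p];
  return in;
}
```

For the CFG in the example (BB0 defines both lanes, BB1 defines only sub1, BB2 has both as predecessors), only lane 0x2 is defined on every path into BB2. A use of the full register there asks for 0x3, so sub0 has no definition along the BB1 path, which is the situation the liveness calculation trips over.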

If that is your actual problem, there is more discussion and a possible solution in https://reviews.llvm.org/D21189. I must say, though, that I don’t feel comfortable with the complexity added to the code there…

- Matthias

I have a different version of isDefOnEntry coming that should reduce the actual complexity. I have run into an unrelated problem though and I'm getting that fixed now...

-Krzysztof