Help with definition of subregisters; spill, rematerialization and implicit uses

Hi,

I have a problem regarding sub-register definitions and LiveIntervals on
our target. When a subregister is defined, other parts of the register
are always left untouched - they are neither read or def:ed.

It however seems that Codegen treats subregister definitions as somehow
clobbering the whole register.

The SSA-code looks like this after isel:

(Reg0 and Reg1 are 16bit registers. Reg2, Reg3 and Reg4 are 32 bit
registers with 16bit subregs, hi16 and lo16.)

Reg0 = #imm0
Reg1 = #imm1

Reg2 = IMPLICIT_DEF
Reg3 = INSERT_SUBREG Reg2, Reg0, hi16
Reg4 = INSERT_SUBREG Reg3, Reg1, lo16

After TwoAddressInstructionPass it becomes:

Reg5:hi16<def,read-undef> = Reg0
Reg5:lo16<def> = Reg1

So, in my world this means a setting of the high 16 bits in Reg5 (not
affecting the low part) followed by a setting of the low 16 bits (not
affecting the high part). Is this how LLVM looks at it too?

(What does the "read-undef" part really mean, since in my
world, the setting of lo16 or hi16 does not in any way affect or
access the other part of the register.)

The question is: How should true subregister definitions be
expressed so that they do not interfere with each other? See the
detailed problem description below.

Hi Mikael,

Hi,

I have a problem regarding sub-register definitions and LiveIntervals on
our target. When a subregister is defined, other parts of the register
are always left untouched - they are neither read or def:ed.

It however seems that Codegen treats subregister definitions as somehow
clobbering the whole register.

The SSA-code looks like this after isel:

(Reg0 and Reg1 are 16bit registers. Reg2, Reg3 and Reg4 are 32 bit
registers with 16bit subregs, hi16 and lo16.)

Reg0 = #imm0
Reg1 = #imm1

Reg2 = IMPLICIT_DEF
Reg3 = INSERT_SUBREG Reg2, Reg0, hi16
Reg4 = INSERT_SUBREG Reg3, Reg1, lo16

After TwoAddressInstructionPass it becomes:

Reg5:hi16<def,read-undef> = Reg0
Reg5:lo16 = Reg1

So, in my world this means a setting of the high 16 bits in Reg5 (not
affecting the low part) followed by a setting of the low 16 bits (not
affecting the high part). Is this how LLVM looks at it too?

Yes, it is.

(What does the “read-undef” part really mean, since in my
world, the setting of lo16 or hi16 does not in any way affect or
access the other part of the register.)

read-undef means that we care only about the part we are defining.
Reg5:hi16<def,read-undef> means that whatever is set in Reg5, but hi16, we do not care.
Reg5:lo16 means that we do care about the value of Reg5:hi16.

Here is the exact definition:

/// IsUndef - True if this register operand reads an “undef” value, i.e. the
/// read value doesn’t matter. This flag can be set on both use and def
/// operands. On a sub-register def operand, it refers to the part of the
/// register that isn’t written. On a full-register def operand, it is a
/// noop. See readsReg().

The question is: How should true subregister definitions be
expressed so that they do not interfere with each other? See the
detailed problem description below.

We do have a limitation in our current liveness tracking for sub-register. Therefore, I am not sure that is possible.

Conceptually, what you want is:

Reg5:hi16<def,read-undef>
Reg5:lo16<def,read-undef>

However, I guess this wouldn’t be supported, because it would mean that we do not care about the value of hi16 at the definition of Reg5:lo16. This is true, but after this definition we do care about hi16 and I am afraid read-undef does not convey the right information for the subsequent uses of Reg5.

You can give it a try and see how it goes.


During RA it’s decided Reg5 should be spilled and it’s also decided Reg5
can be rematerialized:

“Value Reg5:0@5000r may remat from Reg5:hi16<def,read-undef> = mv_any16
32766”

So it says Reg5 can be rematerialized by setting it’s high part…

We also get:

reload: 5052r Reg5 = Load40FI <fi#2>
rewrite: 5056r Reg5:lo16 = mv_nimm6_ar16 0

So it inserts a reload of the full Reg5 prior to the setting of
Reg5:lo16, because it thinks there is an implicit use of Reg5 when
writing the low part??? This seems very weird to me.

The decision is based on the fact that MachineOperand::readsReg()
returns true:

/// readsReg - Returns true if this operand reads the previous value
of its
/// register. A use operand with the flag set doesn’t read its
/// register. A sub-register def implicitly reads the other parts of the
/// register being redefined unless the flag is set.
///
/// This refers to reading the register value from before the current
/// instruction or bundle. Internal bundle reads are not included.
bool readsReg() const {
assert(isReg() && “Wrong MachineOperand accessor”);
return !isUndef() && !isInternalRead() && (isUse() || getSubReg());
}

I don’t get why we automatically should get an implicit use just
because we are writing a subreg.

Since Reg5:lo16 is defined with

Reg5:lo16 = Reg1

isUndef() will return false and getSubReg() true, and thus readsReg()
true and the reload is inserted.

Then we get

*** Bad machine code: Instruction loads from dead spill slot ***

because the spill slot has not been written.

This is weird. Any chance you could share a test case?

Thanks,
-Quentin

Hi Quentin,

[...]

The question is: How should true subregister definitions be
expressed so that they do not interfere with each other? See the
detailed problem description below.

We do have a limitation in our current liveness tracking for
sub-register. Therefore, I am not sure that is possible.

Conceptually, what you want is:
Reg5:hi16<def,read-undef>
Reg5:lo16<def,read-undef>

However, I guess this wouldn’t be supported, because it would mean that
we do not care about the value of hi16 at the definition of Reg5:lo16.
This is true, but after this definition we do care about hi16 and I am
afraid read-undef does not convey the right information for the
subsequent uses of Reg5.

You can give it a try and see how it goes.

I tried setting isUndef to trie when handling INSERT_SUBREG in TwoAddressInstructioPass.cpp, but then I run into stuff like this instead:

832B %vreg50:hi16<def,read-undef> = COPY %vreg0
848B ...
864B %vreg19<def,dead> = COPY %vreg50
880B %vreg19:lo16<def,read-undef> = COPY %vreg73
896B ...
912B mv_a32_r16_rmod1 %vreg19, %vreg20

...

*** Bad machine code: Multiple connected components in live interval ***
- function: fixedconv
- interval: %vreg19 [864r,864d:0)[880r,1024r:1) 0@864r 1@880r
0: valnos 0
1: valnos 1

So here, both the setting of the hi16 and lo16 parts are marked with read-undef, as wanted. However

864B %vreg19<def,dead> = COPY %vreg50

looks suspicious to me. It's like it thinks that

880B %vreg19:lo16<def,read-undef> = COPY %vreg73

is redefining the whole vreg19, not only the lo16 part, and since this instruction has read-undef, it thinks no part of vreg19, not even hi16 is live over instruction 880.

isUndef() will return false and getSubReg() true, and thus readsReg()
true and the reload is inserted.

Then we get

*** Bad machine code: Instruction loads from dead spill slot ***

because the spill slot has not been written.

This is weird. Any chance you could share a test case?

Unfortunately not. I'm running our out-of-tree backend and I've no idea if anything like this ever happens in other backends :frowning:

Thanks!
/Mikael

Hi Mikael,

Hi Quentin,

[...]

The question is: How should true subregister definitions be
expressed so that they do not interfere with each other? See the
detailed problem description below.

We do have a limitation in our current liveness tracking for
sub-register. Therefore, I am not sure that is possible.

Conceptually, what you want is:
Reg5:hi16<def,read-undef>
Reg5:lo16<def,read-undef>

However, I guess this wouldn’t be supported, because it would mean that
we do not care about the value of hi16 at the definition of Reg5:lo16.
This is true, but after this definition we do care about hi16 and I am
afraid read-undef does not convey the right information for the
subsequent uses of Reg5.

You can give it a try and see how it goes.

I tried setting isUndef to trie when handling INSERT_SUBREG in TwoAddressInstructioPass.cpp, but then I run into stuff like this instead:

832B %vreg50:hi16<def,read-undef> = COPY %vreg0
848B ...
864B %vreg19<def,dead> = COPY %vreg50
880B %vreg19:lo16<def,read-undef> = COPY %vreg73
896B ...
912B mv_a32_r16_rmod1 %vreg19, %vreg20

...

*** Bad machine code: Multiple connected components in live interval ***
- function: fixedconv
- interval: %vreg19 [864r,864d:0)[880r,1024r:1) 0@864r 1@880r
0: valnos 0
1: valnos 1

So here, both the setting of the hi16 and lo16 parts are marked with read-undef, as wanted. However

864B %vreg19<def,dead> = COPY %vreg50

looks suspicious to me. It's like it thinks that

880B %vreg19:lo16<def,read-undef> = COPY %vreg73

is redefining the whole vreg19, not only the lo16 part, and since this instruction has read-undef, it thinks no part of vreg19, not even hi16 is live over instruction 880.

I was afraid something like this would happen.
Your interpretation is correct.

This confirms that the semantic of read-undef flag means that the others part of the register are undef, and thus nothing live through. The bottom line is currently you cannot represent what you want.

It seems that you will have to debug further the *** Bad machine code: Instruction loads from dead spill slot *** before we can be of any help.

Thanks,

-Quentin

Hi Quentin,

[...]

It seems that you will have to debug further the *** Bad machine code: Instruction loads from dead spill slot *** before we can be of any help.

Yes, I've done some more digging. Sorry for the long mail...

I get:

  Inline spilling aN40_0_7:%vreg1954 [5000r,5056r:0)[5056r,5348r:1) 0@5000r 1@5056r

At this point I have the following live ranges for vreg1954:

  %vreg1954 [5000r,5056r:0)[5056r,5348r:1) 0@5000r 1@5056r

And vreg1954 is mentioned in these instructions:

  5000B %vreg1954:hi16<def,read-undef> = mv_any16 32766
  [...]
  5048B mv_ar16_r16_rmod1 %vreg1954:hi16, %vreg1753
  5056B %vreg1954:lo16<def> = mv_nimm6_ar16 0
  5064B Store40FI %vreg1954, <fi#2>
  [...]
  5128B %vreg223<def> = COPY %vreg1954
  [...]
  5216B %vreg1178<def> = COPY %vreg1954
  [...]
  5348B %vreg1955<def> = COPY %vreg1954

Then it tries to rematerialize:

  Value %vreg1954:0@5000r may remat from %vreg1954:hi16<def,read-undef> = mv_any16 32766

And it examines all use points of vreg1954 to see if it can remat or not:

  cannot remat for 5128e %vreg223<def> = COPY %vreg1954
  cannot remat for 5216e %vreg1178<def> = COPY %vreg1954
  remat: 5044r %vreg1956:hi16<def,read-undef> = mv_any16 32766
          5048e mv_ar16_r16_rmod1 %vreg1956:hi16<kill>, %vreg1753

  cannot remat for 5064e Store40FI %vreg1954, <fi#2>
  cannot remat for 5348e %vreg1955<def> = COPY %vreg1954
  All defs dead: %vreg1954:hi16<def,read-undef,dead> = mv_any16 32766
  Remat created 1 dead defs.
  Deleting dead def 5000r %vreg1954:hi16<def,read-undef,dead> = mv_any16 32766
  1 registers to spill after remat.

What strikes me here is that it never mentions the instruction

  5056B %vreg1954:lo16<def> = mv_nimm6_ar16 0

where we do have a read of vreg1954:hi16 since there is no read-undef on the def operand.

Is this how it's intended to be?

(The use points are found by

     for (MachineRegisterInfo::use_bundle_nodbg_iterator
          RI = MRI.use_bundle_nodbg_begin(Reg), E = MRI.use_bundle_nodbg_end();
          RI != E; ) {
       anyRemat |= reMaterializeFor(LI, MI);
     }

and

  MachineRegisterInfo::defusechain_instr_iterator::advance

seems to skip all def operands for use_bundle_nodbg_iterator since ReturnDefs is false. So it will happily advance past the instruction setting lo16.)

Then after the remats it does

  spillAroundUses %vreg1954

and there I get

  reload: 5052r %vreg1957<def> = Load40FI <fi#2>
  rewrite: 5056r %vreg1957:lo16<def> = mv_nimm6_ar16 0

since no remat was inserted before 5056. But noone has stored anything at fi#2 so I end up with

  *** Bad machine code: Instruction loads from dead spill slot ***

Does everything above, except the error at the end, look ok to you?

I tried fiddling with MachineRegisterInfo::defusechain_instr_iterator to make it also return the

  %vreg1954:lo16<def> = mv_nimm6_ar16 0

instruction, and then my program actually compiled succesfully!

My diff:

@@ -886,11 +886,11 @@ public:
      explicit defusechain_instr_iterator(MachineOperand *op) : Op(op) {
        // If the first node isn't one we're interested in, advance to one that
        // we are interested in.
        if (op) {
          if ((!ReturnUses && op->isUse()) ||
- (!ReturnDefs && op->isDef()) ||
+ (!ReturnDefs && op->isDef() && !op->readsReg()) ||
              (SkipDebug && op->isDebug()))
            advance();
        }
      }
      friend class MachineRegisterInfo;
@@ -907,11 +907,11 @@ public:
            else
              assert(!Op->isDebug() && "Can't have debug defs");
          }
        } else {
          // If this is an operand we don't care about, skip it.
- while (Op && ((!ReturnDefs && Op->isDef()) ||
+ while (Op && ((!ReturnDefs && Op->isDef() && !Op->readsReg()) ||
                        (SkipDebug && Op->isDebug())))
            Op = getNextOperandForReg(Op);
        }
      }
    public:

But of course then other tests fail. For example:

  build-all/./bin/llc < test/CodeGen/R600/literals.ll -march=r600 -mcpu=redwood

gives

llc: ../lib/CodeGen/TwoAddressInstructionPass.cpp:684: void (anonymous namespace)::TwoAddressInstructionPass::scanUses(unsigned int): Assertion `SrcRegMap[NewReg] == Reg && "Can't map to two src registers!"' failed.

So I suppose there are assumptions that defusechain_instr_iterator ignores implicit sub register use when defining some sub register. :confused:

What's your thoughts on this?

Thanks,
Mikael

Hi Mikael,

Hi Quentin,

[...]

It seems that you will have to debug further the *** Bad machine code: Instruction loads from dead spill slot *** before we can be of any help.

Yes, I've done some more digging. Sorry for the long mail...

I get:

Inline spilling aN40_0_7:%vreg1954 [5000r,5056r:0)[5056r,5348r:1) 0@5000r 1@5056r

At this point I have the following live ranges for vreg1954:

%vreg1954 [5000r,5056r:0)[5056r,5348r:1) 0@5000r 1@5056r

And vreg1954 is mentioned in these instructions:

5000B %vreg1954:hi16<def,read-undef> = mv_any16 32766
[...]
5048B mv_ar16_r16_rmod1 %vreg1954:hi16, %vreg1753
5056B %vreg1954:lo16<def> = mv_nimm6_ar16 0
5064B Store40FI %vreg1954, <fi#2>
[...]
5128B %vreg223<def> = COPY %vreg1954
[...]
5216B %vreg1178<def> = COPY %vreg1954
[...]
5348B %vreg1955<def> = COPY %vreg1954

Then it tries to rematerialize:

Value %vreg1954:0@5000r may remat from %vreg1954:hi16<def,read-undef> = mv_any16 32766

And it examines all use points of vreg1954 to see if it can remat or not:

  cannot remat for 5128e %vreg223<def> = COPY %vreg1954
  cannot remat for 5216e %vreg1178<def> = COPY %vreg1954
  remat: 5044r %vreg1956:hi16<def,read-undef> = mv_any16 32766
          5048e mv_ar16_r16_rmod1 %vreg1956:hi16<kill>, %vreg1753

  cannot remat for 5064e Store40FI %vreg1954, <fi#2>
  cannot remat for 5348e %vreg1955<def> = COPY %vreg1954
All defs dead: %vreg1954:hi16<def,read-undef,dead> = mv_any16 32766
Remat created 1 dead defs.
Deleting dead def 5000r %vreg1954:hi16<def,read-undef,dead> = mv_any16 32766
1 registers to spill after remat.

What strikes me here is that it never mentions the instruction

5056B %vreg1954:lo16<def> = mv_nimm6_ar16 0

where we do have a read of vreg1954:hi16 since there is no read-undef on the def operand.

Is this how it's intended to be?

What does not seem intended to me :).

(The use points are found by

   for (MachineRegisterInfo::use_bundle_nodbg_iterator
        RI = MRI.use_bundle_nodbg_begin(Reg), E = MRI.use_bundle_nodbg_end();
        RI != E; ) {
     anyRemat |= reMaterializeFor(LI, MI);
   }

and

MachineRegisterInfo::defusechain_instr_iterator::advance

seems to skip all def operands for use_bundle_nodbg_iterator since ReturnDefs is false. So it will happily advance past the instruction setting lo16.)

Then after the remats it does

spillAroundUses %vreg1954

and there I get

  reload: 5052r %vreg1957<def> = Load40FI <fi#2>
  rewrite: 5056r %vreg1957:lo16<def> = mv_nimm6_ar16 0

since no remat was inserted before 5056. But noone has stored anything at fi#2 so I end up with

*** Bad machine code: Instruction loads from dead spill slot ***

Does everything above, except the error at the end, look ok to you?

I tried fiddling with MachineRegisterInfo::defusechain_instr_iterator to make it also return the

%vreg1954:lo16<def> = mv_nimm6_ar16 0

instruction, and then my program actually compiled succesfully!

My diff:

@@ -886,11 +886,11 @@ public:
    explicit defusechain_instr_iterator(MachineOperand *op) : Op(op) {
      // If the first node isn't one we're interested in, advance to one that
      // we are interested in.
      if (op) {
        if ((!ReturnUses && op->isUse()) ||
- (!ReturnDefs && op->isDef()) ||
+ (!ReturnDefs && op->isDef() && !op->readsReg()) ||

This fix is not correct because you will return a definition operand in place of our missing “implicit use”, which, I believe, will lead to funny crashes.

            (SkipDebug && op->isDebug()))
          advance();
      }
    }
    friend class MachineRegisterInfo;
@@ -907,11 +907,11 @@ public:
          else
            assert(!Op->isDebug() && "Can't have debug defs");
        }
      } else {
        // If this is an operand we don't care about, skip it.
- while (Op && ((!ReturnDefs && Op->isDef()) ||
+ while (Op && ((!ReturnDefs && Op->isDef() && !Op->readsReg()) ||
                      (SkipDebug && Op->isDebug())))
          Op = getNextOperandForReg(Op);
      }
    }
  public:

But of course then other tests fail. For example:

build-all/./bin/llc < test/CodeGen/R600/literals.ll -march=r600 -mcpu=redwood

gives

llc: ../lib/CodeGen/TwoAddressInstructionPass.cpp:684: void (anonymous namespace)::TwoAddressInstructionPass::scanUses(unsigned int): Assertion `SrcRegMap[NewReg] == Reg && "Can't map to two src registers!"' failed.

So I suppose there are assumptions that defusechain_instr_iterator ignores implicit sub register use when defining some sub register. :confused:

What's your thoughts on this?

Conceptually, we could add an implicit use of the high part on the definition of the low part.
That said, this will not solve the problem, because rematerializing wouldn’t change the vreg of the low part, and the range would become disjoint.
Assuming we solve that problem, I am not sure this will be understood correctly by the verifier. The verifier might expect that the related register is fully defined before an implicit-use. This may also over-constrained the scheduler… I do not know that part.

Alternative we could do the same thing as insertReload:
Walk all uses *and* defs and apply the rematerialization for each use or readsRegs.
Notice the different iterator for remat and for reload. They should be the same.

I let you give it a try.

Thanks,
-Quentin

Hi Quentin,

> [...]

Alternative we could do the same thing as insertReload:
Walk all uses *and* defs and apply the rematerialization for each use or readsRegs.
Notice the different iterator for remat and for reload. They should be the same.

Excellent! Ok so now I did

@@ -932,15 +932,28 @@ void InlineSpiller::reMaterializeAll() {
    // Try to remat before all uses of snippets.
    bool anyRemat = false;
    for (unsigned i = 0, e = RegsToSpill.size(); i != e; ++i) {
      unsigned Reg = RegsToSpill[i];
      LiveInterval &LI = LIS.getInterval(Reg);
- for (MachineRegisterInfo::use_bundle_nodbg_iterator
- RI = MRI.use_bundle_nodbg_begin(Reg), E = MRI.use_bundle_nodbg_end();
- RI != E; ) {
- MachineInstr *MI = &*(RI++);
- anyRemat |= reMaterializeFor(LI, MI);

Hi Mikael,

Hi Quentin,

> [...]

Alternative we could do the same thing as insertReload:
Walk all uses *and* defs and apply the rematerialization for each use or readsRegs.
Notice the different iterator for remat and for reload. They should be the same.

Excellent! Ok so now I did

@@ -932,15 +932,28 @@ void InlineSpiller::reMaterializeAll() {
  // Try to remat before all uses of snippets.
  bool anyRemat = false;
  for (unsigned i = 0, e = RegsToSpill.size(); i != e; ++i) {
    unsigned Reg = RegsToSpill[i];
    LiveInterval &LI = LIS.getInterval(Reg);
- for (MachineRegisterInfo::use_bundle_nodbg_iterator
- RI = MRI.use_bundle_nodbg_begin(Reg), E = MRI.use_bundle_nodbg_end();
- RI != E; ) {
- MachineInstr *MI = &*(RI++);
- anyRemat |= reMaterializeFor(LI, MI);
+
+ for (MachineRegisterInfo::reg_bundle_iterator
+ RegI = MRI.reg_bundle_begin(Reg), E = MRI.reg_bundle_end();
+ RegI != E; ) {
+ MachineInstr *MI = &*(RegI++);
+
+ // Debug values are not allowed to affect codegen.
+ if (MI->isDebugValue()) {
+ continue;
+ }
+
+ // Analyze instruction.
+ SmallVector<std::pair<MachineInstr*, unsigned>, 8> Ops;
+ MIBundleOperands::VirtRegInfo RI =
+ MIBundleOperands(MI).analyzeVirtReg(Reg, &Ops);
+
+ if (RI.Reads)
+ anyRemat |= reMaterializeFor(LI, MI);
    }
  }
  if (!anyRemat)
    return;

So with this change InlineSpiller::reMaterializeAll is more similar to InlineSpiller::spillAroundUses, and now I'm not aware of any crashes anymore. :slight_smile: My test case works, and I haven't seen any other tests suddenly failing.

Excellent, thanks for checking!

Is this a good change that should be done not only locally in my version?

Yes, this is a good change!
Please submit the actual patch to llvm-commit.

Thanks,
-Quentin