Ok with mismatch between dead-markings in BUNDLE and bundled instructions?

Hi Quentin and llvm-dev,

I've got a regalloc-related question that you might have an opinion or answer about.

In our out-of-tree target we've been doing some bundling before register allocation for quite some time now, and last night a new problem popped up. What the fix should be depends on if this bundle is legal or not:

     BUNDLE %vreg39<imp-def,dead>
       * %vreg39:hiAcc<def> = mv_ar16_ar16_lo16In32 %vreg37
       [...]

%vreg39 isn't used after the bundle so the dead-marking in the BUNDLE is correct. However, the def in the actual bundled instruction defining %vreg39 is not marked with dead.

Is the above bundle ok or not?

When the register allocator later tries to spill/reload %vreg39 it thinks that %vreg39 is live, so it inserts a reload after the bundle, but then the verifier pukes on it and says that there is a use of the register after it's dead.

The dead-marking on the BUNDLE is introduced by LiveIntervals::computeDeadValues which does

       // This is a dead def. Make sure the instruction knows.
       MachineInstr *MI = getInstructionFromIndex(Def);
       assert(MI && "No instruction defining live value");
       MI->addRegisterDead(LI.reg, TRI);

and the reload after the BUNDLE is introduced by InlineSpiller::spillAroundUses which has determined that the register is live:

     // Rewrite instruction operands.
     bool hasLiveDef = false;
     for (const auto &OpPair : Ops) {
       MachineOperand &MO = OpPair.first->getOperand(OpPair.second);
       MO.setReg(NewVReg);
       if (MO.isUse()) {
         if (!OpPair.first->isRegTiedToDefOperand(OpPair.second))
           MO.setIsKill();
       } else {
         if (!MO.isDead())
           hasLiveDef = true;
       }

So I suppose we could either make LiveIntervals::computeDeadValues mark the individual defs dead as well, or we could change InlineSpiller::spillAroundUses so that if we only look at the BUNDLE instruction (if it exists) in case of bundled instructions when looking for defs.

Any opinions on what way to go?

Thanks,
Mikael

Hi Mikael,

Thanks for bringing that up.

Hi Quentin and llvm-dev,

I've got a regalloc-related question that you might have an opinion or answer about.

In our out-of-tree target we've been doing some bundling before register allocation for quite some time now, and last night a new problem popped up. What the fix should be depends on if this bundle is legal or not:

   BUNDLE %vreg39<imp-def,dead>
     * %vreg39:hiAcc<def> = mv_ar16_ar16_lo16In32 %vreg37
     [...]

%vreg39 isn't used after the bundle so the dead-marking in the BUNDLE is correct. However, the def in the actual bundled instruction defining %vreg39 is not marked with dead.

Is the above bundle ok or not?

Ultimately, I would like this to be the way we represent these situations, unfortunately the code base is not ready for that yet.

When the register allocator later tries to spill/reload %vreg39 it thinks that %vreg39 is live, so it inserts a reload after the bundle, but then the verifier pukes on it and says that there is a use of the register after it's dead.

The dead-marking on the BUNDLE is introduced by LiveIntervals::computeDeadValues which does

     // This is a dead def. Make sure the instruction knows.
     MachineInstr *MI = getInstructionFromIndex(Def);
     assert(MI && "No instruction defining live value");
     MI->addRegisterDead(LI.reg, TRI);

and the reload after the BUNDLE is introduced by InlineSpiller::spillAroundUses which has determined that the register is live:

   // Rewrite instruction operands.
   bool hasLiveDef = false;
   for (const auto &OpPair : Ops) {
     MachineOperand &MO = OpPair.first->getOperand(OpPair.second);
     MO.setReg(NewVReg);
     if (MO.isUse()) {
       if (!OpPair.first->isRegTiedToDefOperand(OpPair.second))
         MO.setIsKill();
     } else {
       if (!MO.isDead())
         hasLiveDef = true;
     }

So I suppose we could either make LiveIntervals::computeDeadValues mark the individual defs dead as well,

I didn't get that.

or we could change InlineSpiller::spillAroundUses so that if we only look at the BUNDLE instruction (if it exists) in case of bundled instructions when looking for defs.

Any opinions on what way to go?

I believe we should teach the spiller about the subregs so that it understands a particular subregs is dead.

Cheers,
Q

Hi Quentin and llvm-dev,

I've got a regalloc-related question that you might have an opinion or answer about.

In our out-of-tree target we've been doing some bundling before register allocation for quite some time now, and last night a new problem popped up. What the fix should be depends on if this bundle is legal or not:

   BUNDLE %vreg39<imp-def,dead>
     * %vreg39:hiAcc<def> = mv_ar16_ar16_lo16In32 %vreg37
     [...]

%vreg39 isn't used after the bundle so the dead-marking in the BUNDLE is correct. However, the def in the actual bundled instruction defining %vreg39 is not marked with dead.

Is the above bundle ok or not?

As to whether the bundle is ok: The register allocator should view the whole bundle as a single instruction with all operands combined. That would conceptually give us:

    %vreg39<imp-def, dead>, %vreg39:hiAcc<def> = INST...

Of course having two defs for the same register is an uncommon situation. There's some interesting arguments to be had whether this example means the whole vreg39 is dead or just the parts of vreg39 that are not covered by the hiAcc subregister. I think in todays code it means that the whole vreg39 register is dead (though I cannot say off hand whether we do this consistently or whether it is actually documented somewhere).

When the register allocator later tries to spill/reload %vreg39 it thinks that %vreg39 is live, so it inserts a reload after the bundle, but then the verifier pukes on it and says that there is a use of the register after it's dead.

The dead-marking on the BUNDLE is introduced by LiveIntervals::computeDeadValues which does

     // This is a dead def. Make sure the instruction knows.
     MachineInstr *MI = getInstructionFromIndex(Def);
     assert(MI && "No instruction defining live value");
     MI->addRegisterDead(LI.reg, TRI);

I think the actual problem here is that MachineInstr::addRegisterDead only works on a single instruction and not on the whole bundle.

- Matthias

Hi Quentin and llvm-dev,

I’ve got a regalloc-related question that you might have an opinion or answer about.

In our out-of-tree target we’ve been doing some bundling before register allocation for quite some time now, and last night a new problem popped up. What the fix should be depends on if this bundle is legal or not:

BUNDLE %vreg39<imp-def,dead>

  • %vreg39:hiAcc = mv_ar16_ar16_lo16In32 %vreg37
    […]

%vreg39 isn’t used after the bundle so the dead-marking in the BUNDLE is correct. However, the def in the actual bundled instruction defining %vreg39 is not marked with dead.

Is the above bundle ok or not?

As to whether the bundle is ok: The register allocator should view the whole bundle as a single instruction with all operands combined. That would conceptually give us:

%vreg39<imp-def, dead>, %vreg39:hiAcc = INST…

Of course having two defs for the same register is an uncommon situation. There’s some interesting arguments to be had whether this example means the whole vreg39 is dead or just the parts of vreg39 that are not covered by the hiAcc subregister. I think in todays code it means that the whole vreg39 register is dead (though I cannot say off hand whether we do this consistently or whether it is actually documented somewhere).

Yeah I was reading this as “only the non-touched part are dead”, and that’s what I’d like to see in the representation longer. Obviously, the register is not dead as a whole here :slight_smile:

Hi Quentin and llvm-dev,

I’ve got a regalloc-related question that you might have an opinion or answer about.

In our out-of-tree target we’ve been doing some bundling before register allocation for quite some time now, and last night a new problem popped up. What the fix should be depends on if this bundle is legal or not:

BUNDLE %vreg39<imp-def,dead>

  • %vreg39:hiAcc = mv_ar16_ar16_lo16In32 %vreg37
    […]

%vreg39 isn’t used after the bundle so the dead-marking in the BUNDLE is correct. However, the def in the actual bundled instruction defining %vreg39 is not marked with dead.

Is the above bundle ok or not?

As to whether the bundle is ok: The register allocator should view the whole bundle as a single instruction with all operands combined. That would conceptually give us:

%vreg39<imp-def, dead>, %vreg39:hiAcc = INST…

Of course having two defs for the same register is an uncommon situation. There’s some interesting arguments to be had whether this example means the whole vreg39 is dead or just the parts of vreg39 that are not covered by the hiAcc subregister. I think in todays code it means that the whole vreg39 register is dead (though I cannot say off hand whether we do this consistently or whether it is actually documented somewhere).

Yeah I was reading this as “only the non-touched part are dead”, and that’s what I’d like to see in the representation longer. Obviously, the register is not dead as a whole here :slight_smile:

When the register allocator later tries to spill/reload %vreg39 it thinks that %vreg39 is live, so it inserts a reload after the bundle, but then the verifier pukes on it and says that there is a use of the register after it’s dead.

The dead-marking on the BUNDLE is introduced by LiveIntervals::computeDeadValues which does

// This is a dead def. Make sure the instruction knows.
MachineInstr *MI = getInstructionFromIndex(Def);
assert(MI && “No instruction defining live value”);
MI->addRegisterDead(LI.reg, TRI);

I think the actual problem here is that MachineInstr::addRegisterDead only works on a single instruction and not on the whole bundle.

And yes, fixing that is the most sensible short term thing to do.

I think that having two defs for the same register, one dead and one not dead simply doesn't make sense. We already assume that a register is live if at least a part of it is live, so if it's "dead", it should mean that the whole thing is dead.

-Krzysztof

Yeah I was reading this as “only the non-touched part are dead”, and that’s what I’d like to see in the representation longer. Obviously, the register is not dead as a whole here :slight_smile:

I think that having two defs for the same register, one dead and one not dead simply doesn't make sense. We already assume that a register is live if at least a part of it is live, so if it's "dead", it should mean that the whole thing is dead.

The register is not dead, but the value is. I like to keep the SSA semantic.
Here it happens that these two values are linked together because of the “name” of the virtual register, but they really don’t need to be bound.

Put simply I’d like the dead flag to report the status of the value hold by this definition, not this register. In other words, dead means you can kill the instruction if no other side effects exists.

Without subregister I would agree. However with subregisters and aliases in play you can express more situations. Like for example:

    %rax<dead>, %eax = ...

could mean the instruction writes the full rax register but we are only gonna read eax later. That said I am not sure whether we actually need it, and if llvm works that way today. Given how subtle all of this is there is also a high danger that we won't get the bahviour consistent.

- Matthias

I missed that one of them has a subregister. Nevermind.

-Krzysztof

Yeah I was reading this as “only the non-touched part are dead”, and that’s what I’d like to see in the representation longer. Obviously, the register is not dead as a whole here :slight_smile:

I think that having two defs for the same register, one dead and one not dead simply doesn't make sense. We already assume that a register is live if at least a part of it is live, so if it's "dead", it should mean that the whole thing is dead.

Without subregister I would agree. However with subregisters and aliases in play you can express more situations. Like for example:

   %rax<dead>, %eax = ...

could mean the instruction writes the full rax register but we are only gonna read eax later.

That sounds like an alias to:
%rax<def-undef, subeax> = …

Which sounds fine. Though I am not suggesting we want to move to this dead model for such situation.

That said I am not sure whether we actually need it, and if llvm works that way today. Given how subtle all of this is there is also a high danger that we won't get the bahviour consistent.

I agree that consistent behavior is important and I also think we probably cannot model what we want with the current representation. What I would like to see if that we don’t sit on potentially useful information, like this part of the register is dead, because it is convenient implementation-wise. I am not saying that’s what you're suggesting!
I agree that at the end of the day we want something that works and that is understandable. To me having the semantic of dead being this can be killed if the instruction does not have side effects sounded easy enough to understand.

What is your proposal for the semantic?

(IIRC the dead flag is required for values that are never used and the proposed fix somehow goes against that.)

Hi,

Thanks everyone for the answers!

Hi Mikael,

Thanks for bringing that up.

Hi Quentin and llvm-dev,

I've got a regalloc-related question that you might have an opinion or answer about.

In our out-of-tree target we've been doing some bundling before register allocation for quite some time now, and last night a new problem popped up. What the fix should be depends on if this bundle is legal or not:

    BUNDLE %vreg39<imp-def,dead>
      * %vreg39:hiAcc<def> = mv_ar16_ar16_lo16In32 %vreg37
      [...]

%vreg39 isn't used after the bundle so the dead-marking in the BUNDLE is correct. However, the def in the actual bundled instruction defining %vreg39 is not marked with dead.

Is the above bundle ok or not?

Ultimately, I would like this to be the way we represent these situations, unfortunately the code base is not ready for that yet.

When the register allocator later tries to spill/reload %vreg39 it thinks that %vreg39 is live, so it inserts a reload after the bundle, but then the verifier pukes on it and says that there is a use of the register after it's dead.

The dead-marking on the BUNDLE is introduced by LiveIntervals::computeDeadValues which does

      // This is a dead def. Make sure the instruction knows.
      MachineInstr *MI = getInstructionFromIndex(Def);
      assert(MI && "No instruction defining live value");
      MI->addRegisterDead(LI.reg, TRI);

and the reload after the BUNDLE is introduced by InlineSpiller::spillAroundUses which has determined that the register is live:

    // Rewrite instruction operands.
    bool hasLiveDef = false;
    for (const auto &OpPair : Ops) {
      MachineOperand &MO = OpPair.first->getOperand(OpPair.second);
      MO.setReg(NewVReg);
      if (MO.isUse()) {
        if (!OpPair.first->isRegTiedToDefOperand(OpPair.second))
          MO.setIsKill();
      } else {
        if (!MO.isDead())
          hasLiveDef = true;
      }

So I suppose we could either make LiveIntervals::computeDeadValues mark the individual defs dead as well,

I didn't get that.

E.g. extending addRegisterDead so it not only marks the def in the BUNDLE instruction as dead, but also each individual def inside the bundle.

or we could change InlineSpiller::spillAroundUses so that if we only look at the BUNDLE instruction (if it exists) in case of bundled instructions when looking for defs.

Any opinions on what way to go?

I believe we should teach the spiller about the subregs so that it understands a particular subregs is dead.

I think this issue exists also without involving subregs, but I haven't verified it.

In the bundle we have:

      BUNDLE %vreg39<imp-def,dead>
        * %vreg39:hiAcc<def> = mv_ar16_ar16_lo16In32 %vreg37
        [...]

The shown def of %vreg39:hiAcc is the only def of any %vreg39 part in the bundle, and that particular def is also dead in the sense that no one reads the value, but it's not marked as such since the call to

        MI->addRegisterDead(LI.reg, TRI);

above in LiveIntervals::computeDeadValues only changed the def in the BUNDLE instruction.

Right now I've been running tests with

+ // For bundled instructions: Only examine defs in the BUNDLE
+ // instruction itself if it exists.
+ MachineInstr *DefMI = OpPair.first;
+ if (DefMI->isBundled() &&
+ getBundleStart(DefMI->getIterator())->isBundle() &&
+ !DefMI->isBundle())
+ continue;

Hi,

> [...]

>
> [...]
> I think the actual problem here is that MachineInstr::addRegisterDead > only works on a single instruction and not on the whole bundle.

Maybe we should rather make MachineInstr::addRegisterDead mark the individual instruction defs as dead then?

I tried the below in MachineInstr::addRegisterDead rather than changing InlineSpiller::spillAroundUses, and this also solves my problem (and all tests under test/ still pass), but I suppose it's not actually as simple as this given the existance of "internal reads" in bundles? (Also I'm not sure if the return value is handled correctly.)

@@ -2186,10 +2186,23 @@ void MachineInstr::clearRegisterKills(unsigned Reg,
  }

  bool MachineInstr::addRegisterDead(unsigned Reg,
                                     const TargetRegisterInfo *RegInfo,
                                     bool AddIfNotFound) {
+ bool RetVal = false;

Not sure if I could follow everything in this discussion regarding subregisters. But I think the problem posted by Mikael just happened to involve subregisters, and the discussions about subregisters is confusing when it comes to Mikaels original question/problem.

I think that the bundle could look something like this just as well:

     BUNDLE %vreg1<def,dead>
       * %vreg1<def> = add %vreg2, %vreg3
       * call @foo, %vreg1<internal-use>

No subregisters involved.
%vreg1 is dead after the bundle.
%vreg1 is not dead when defined at the "add", because it is used later in the same bundle.

Should perhaps the %vreg1 not be included in the BUNDLE head at all here?
(but shouldn't the BUNDLE head be a summary of what is going on inside the bundle, so leaving out information about %vreg1 being defined seems wrong)

To me it seems wrong to add "dead" to the def of %vreg1 at the add (considering the internal-use).
Maybe that even answers the question that the "mismatch" between dead-markings should be OK.
Or would it be OK to mark %vreg1 as dead at the add, even though we have a later internal-use?

Regards,
Björn

Oh wait, vreg1 is indeed used.
Yeah, having a dead flag here sounds wrong.

Oh wait, vreg1 is indeed used.
Yeah, having a dead flag here sounds wrong.

I mean on the instruction itself.
On the bundle, that’s debatable. That would fit the semantic “if no side effect you can kill it” (here there is side effect, we define other vregs).

Not sure if I could follow everything in this discussion regarding subregisters. But I think the problem posted by Mikael just happened to involve subregisters, and the discussions about subregisters is confusing when it comes to Mikaels original question/problem.

I think that the bundle could look something like this just as well:

    BUNDLE %vreg1<def,dead>
      * %vreg1<def> = add %vreg2, %vreg3
      * call @foo, %vreg1<internal-use>

No subregisters involved.
%vreg1 is dead after the bundle.

True

%vreg1 is not dead when defined at the "add", because it is used later in the same bundle.

CodeGen shouldn't and doesn't care. The order inside a bundle is unspecified. There are a number of things that would make sense:
- The bundled instructions are executed in parallel and the call reads the value of vreg1 before the add happened
- The bundled instructions are executed in sequence and the call reads the definition from the add
- Both instructions are predicated and we don't even execute the call or the add at all in each run

The gist is: The generic CodeGen code doesn't know how a bundle behaves internally. As far as codegen is concerned executing the bundle is an atomic operation so it feels alot like a single instruction with just all operands merged together.

To make things confusing: A lot of the post-RA passes are not setup to traverse the bundle and the instructions inside, they only ever look at the first instruction. The way this is dealt with today is that we add the summary operands on the first instruction of a bundle (typically BUNDLE or sometimes a target specific opcode).

Should perhaps the %vreg1 not be included in the BUNDLE head at all here?

The vreg1 has to be in the header.

(but shouldn't the BUNDLE head be a summary of what is going on inside the bundle, so leaving out information about %vreg1 being defined seems wrong)

Yes

To me it seems wrong to add "dead" to the def of %vreg1 at the add (considering the internal-use).
Maybe that even answers the question that the "mismatch" between dead-markings should be OK.
Or would it be OK to mark %vreg1 as dead at the add, even though we have a later internal-use?

I think the answer here is that targets shouldn't expect dead/kill to mean something for the ordering inside the bundle. The flags are only about the effects when viewing the whole bundle as a unit. Hence we also need the internal-use flag, because normal instructions cannot read something that is defined by the same instruction. But a BUNDLE can read data that is defined as part of the same BUNDLE.

- Matthias

Hi,

Ok, so to solve the problem for our out-of-tree target we currently do:

------------------------ lib/CodeGen/InlineSpiller.cpp

Hi,

Ok, so to solve the problem for our out-of-tree target we currently do:

------------------------ lib/CodeGen/InlineSpiller.cpp ------------------------
index 67c7e506add..28245a49477 100644
@@ -987,38 +987,46 @@ void InlineSpiller::spillAroundUses(unsigned Reg) {
    if (foldMemoryOperand(Ops))
      continue;

    // Create a new virtual register for spill/fill.
    // FIXME: Infer regclass from instruction alone.
    unsigned NewVReg = Edit->createFrom(Reg);

    if (RI.Reads)
      insertReload(NewVReg, Idx, MI);

    // Rewrite instruction operands.
    bool hasLiveDef = false;
    for (const auto &OpPair : Ops) {
      MachineOperand &MO = OpPair.first->getOperand(OpPair.second);
      MO.setReg(NewVReg);
      if (MO.isUse()) {
        if (!OpPair.first->isRegTiedToDefOperand(OpPair.second))
          MO.setIsKill();
      } else {
+ // For bundled instructions: Only examine defs in the BUNDLE
+ // instruction itself if it exists.
+ MachineInstr *DefMI = OpPair.first;
+ if (DefMI->isBundled() &&
+ getBundleStart(DefMI->getIterator())->isBundle() &&
+ !DefMI->isBundle())
+ continue;
+
        if (!MO.isDead())
          hasLiveDef = true;
      }
    }
    DEBUG(dbgs() << "\trewrite: " << Idx << '\t' << *MI << '\n');

Is there any point of me putting this patch in Phabricator? I have very little hope of managing to trigger the problem on any in-tree target.

You could use a .mir file, with NOOPs that clobber a lot of regs and fake bundles here and there.

Hi,

Ok, so to solve the problem for our out-of-tree target we currently do:

------------------------ lib/CodeGen/InlineSpiller.cpp ------------------------
index 67c7e506add..28245a49477 100644
@@ -987,38 +987,46 @@ void InlineSpiller::spillAroundUses(unsigned Reg) {
     if (foldMemoryOperand(Ops))
       continue;

     // Create a new virtual register for spill/fill.
     // FIXME: Infer regclass from instruction alone.
     unsigned NewVReg = Edit->createFrom(Reg);

     if (RI.Reads)
       insertReload(NewVReg, Idx, MI);

     // Rewrite instruction operands.
     bool hasLiveDef = false;
     for (const auto &OpPair : Ops) {
       MachineOperand &MO = OpPair.first->getOperand(OpPair.second);
       MO.setReg(NewVReg);
       if (MO.isUse()) {
         if (!OpPair.first->isRegTiedToDefOperand(OpPair.second))
           MO.setIsKill();
       } else {
+ // For bundled instructions: Only examine defs in the BUNDLE
+ // instruction itself if it exists.
+ MachineInstr *DefMI = OpPair.first;
+ if (DefMI->isBundled() &&
+ getBundleStart(DefMI->getIterator())->isBundle() &&
+ !DefMI->isBundle())
+ continue;
+
         if (!MO.isDead())
           hasLiveDef = true;
       }
     }
     DEBUG(dbgs() << "\trewrite: " << Idx << '\t' << *MI << '\n');

Is there any point of me putting this patch in Phabricator? I have very little hope of managing to trigger the problem on any in-tree target.

You could use a .mir file, with NOOPs that clobber a lot of regs and fake bundles here and there.

Yes, it was much easier than I thought it would be!

With one eye on test/CodeGen/MIR/AArch64/spill-fold.mir I managed to create a test reproducing the issue on aarch64.

Patch here:
https://reviews.llvm.org/D35055

Thanks,
Mikael