[MachineCopyPropagation] Issue with register forwarding/allocation/verifier in out-of-tree target

Hi all,

Mikael reported a machine verification failure in his out-of-tree target with the MachineCopyPropagation changes to forward registers (which are currently reverted). The verification in question is:

*** Bad machine code: Multiple connected components in live interval ***
- function: utils_la_suite_matmul_ref
- interval: %vreg77 [192r,208B:0)[208B,260r:1)[312r,364r:2)[380r,464B:3) 0@192r 1@208B-phi 2@312r 3@380r
0: valnos 0 1 3
1: valnos 2

In this particular case, I believe that it is the greedy allocator that is creating the multiple components in the %vreg77 live interval. If you look at the attached debug dump file, just after the greedy allocator runs, the segment of %vreg77 from the def at 312B to the use at 380B seems to be separable from the other segments. The reason the above verification failure is not hit at that point seems to be related to the FIXME in the following snippet from ConnectedVNInfoEqClasses::Classify():

       // Normal value defined by an instruction. Check for two-addr redef.
       // FIXME: This could be coincidental. Should we really check for a tied
       // operand constraint?
       // Note that VNI->def may be a use slot for an early clobber def.
       if (const VNInfo *UVNI = LR.getVNInfoBefore(VNI->def))
         EqClass.join(VNI->id, UVNI->id);

Just after the greedy allocator runs, the instruction at 380B also defines %vreg77, so the verification check treats it as a two-addr redefinition (even though it is not) and allows it. The copy forwarding in MachineCopyPropagation then renames the use of %vreg77 at 380B, so the segment in question no longer ends at an instruction that is also a def, and the verification check fires.

It appears to me that this check is too loose, and if so that means there is something going wrong either in the allocator itself or in its interaction with this particular target in this case.
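
To make the effect of that FIXME concrete, here is a minimal, self-contained sketch of the classification applied to the %vreg77 interval above. It is a toy model, not LLVM code: the `Segment`, `Value`, `valueBefore`, `join` and `countComponents` names are made up for illustration, and slot indexes are encoded as plain integers. The pre-forwarding segment [312r,380r) is inferred from the description rather than copied from the dump. The predecessor block ends passed as `PredEnds` (208B and 464B) are an assumption based on the verifier grouping valnos 0, 1 and 3 together.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Toy model only; none of these types are LLVM's. Slot indexes are plain
// ints: "208B" is encoded as 208 and "312r" as 312 + 2.
struct Segment { int Start, End, ValNo; };  // half-open [Start, End)
struct Value { int Def; bool IsPHI; };

// Value number live immediately before Idx, or -1 if nothing is live there.
int valueBefore(const std::vector<Segment> &Segs, int Idx) {
  for (const Segment &S : Segs)
    if (S.Start < Idx && Idx <= S.End)
      return S.ValNo;
  return -1;
}

// Union-find helpers over value numbers.
int findRoot(std::vector<int> &Parent, int X) {
  while (Parent[X] != X) {
    Parent[X] = Parent[Parent[X]];
    X = Parent[X];
  }
  return X;
}

void join(std::vector<int> &Parent, int A, int B) {
  int RootA = findRoot(Parent, A);
  int RootB = findRoot(Parent, B);
  Parent[RootA] = RootB;
}

// Count connected components with the loose rule from the quoted snippet:
// a normal def is joined with whatever value is live just before it, tied
// operand or not. PredEnds (an assumption about the CFG) lists the end
// indexes of the predecessors of the block holding the PHI-def.
int countComponents(const std::vector<Segment> &Segs,
                    const std::vector<Value> &Vals,
                    const std::vector<int> &PredEnds) {
  std::vector<int> Parent(Vals.size());
  std::iota(Parent.begin(), Parent.end(), 0);
  for (int V = 0; V < static_cast<int>(Vals.size()); ++V) {
    if (Vals[V].IsPHI) {
      for (int End : PredEnds) {
        int U = valueBefore(Segs, End);
        if (U >= 0)
          join(Parent, V, U);
      }
    } else {
      int U = valueBefore(Segs, Vals[V].Def);
      assert(U < static_cast<int>(Vals.size()) && "segment names unknown value");
      if (U >= 0)
        join(Parent, V, U);
    }
  }
  int N = 0;
  for (int V = 0; V < static_cast<int>(Vals.size()); ++V)
    if (findRoot(Parent, V) == V)
      ++N;
  return N;
}

// %vreg77 just after greedy (inferred): value 2 lives up to the redef at 380r.
std::vector<Segment> afterGreedy() {
  return {{194, 208, 0}, {208, 262, 1}, {314, 382, 2}, {382, 464, 3}};
}

// %vreg77 as printed by the verifier after forwarding: value 2 ends at 364r.
std::vector<Segment> afterForwarding() {
  return {{194, 208, 0}, {208, 262, 1}, {314, 366, 2}, {382, 464, 3}};
}

// 0@192r 1@208B-phi 2@312r 3@380r
std::vector<Value> values() {
  return {{194, false}, {208, true}, {314, false}, {382, false}};
}
```

Under these assumptions, `countComponents(afterGreedy(), values(), {208, 464})` yields 1 and the same call on `afterForwarding()` yields 2, which matches the verifier accepting the first form and rejecting the second.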

foo.log.gz (33.9 KB)

That dump seems to be from well before greedy runs, doesn't it?

At first glance the odd thing there is that the operand of fladd_a32_a32_a32 is rewritten from vreg77 to vreg76, but the vreg77 operand of the BUNDLE is not. Maybe you can find out why that is?

- Matthias

That dump seems to be from well before greedy runs, doesn't it?

I'm not sure what you mean. The attached log contains -print-before-all -print-after-all and -debug output starting with the coalescer pass. The verification failure is right after the first pass of MachineCopyPropagation which runs after the greedy allocator.

At first glance the odd thing there is that the operand of fladd_a32_a32_a32 is rewritten from vreg77 to vreg76, but the vreg77 operand of the BUNDLE is not. Maybe you can find out why that is?

Sorry, I should have pointed this out before. The loop over instructions in MachineCopyPropagation only visits the BUNDLE instructions themselves (i.e. it does not visit the instructions inside the BUNDLE), and we don't forward to implicit uses (which all of the BUNDLE operands are marked as), so we currently won't forward a use to a bundled instruction. I believe handling bundles more aggressively can be added as a follow-on enhancement, unless we think not doing so is inherently a problem.
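
As a rough illustration of why the fladd operand is rewritten while the BUNDLE operand is not, here is a small sketch of a forwarding loop with the two restrictions just described. `Operand`, `Instr` and `forwardUses` are hypothetical stand-ins, not the MachineOperand/MachineInstr API or the code in D30751, and the register numbers are just the ones from this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for machine operands and instructions.
struct Operand {
  int Reg;
  bool IsUse;
  bool IsImplicit;
};

struct Instr {
  std::string Name;
  std::vector<Operand> Ops;
  std::vector<Instr> Bundled;  // instructions inside a BUNDLE, if any
};

// Rewrite explicit uses of From to To on top-level instructions only.
// Returns the number of operands rewritten.
int forwardUses(std::vector<Instr> &Block, int From, int To) {
  assert(From != To && "forwarding a register to itself is a no-op");
  int Rewritten = 0;
  for (Instr &MI : Block) {  // never descends into MI.Bundled
    for (Operand &MO : MI.Ops) {
      // Skip defs, implicit uses, and unrelated registers.
      if (!MO.IsUse || MO.IsImplicit || MO.Reg != From)
        continue;
      MO.Reg = To;
      ++Rewritten;
    }
  }
  return Rewritten;
}
```

With a block holding an fladd that explicitly uses vreg77 and a BUNDLE whose vreg77 operand is implicit, `forwardUses(Block, 77, 76)` rewrites only the fladd operand. The BUNDLE header and the instructions inside it keep vreg77.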

The copy propagation seemed to be working on vregs. This was extra confusing as D30751 seems to be currently reverted from trunk so I couldn’t find references to that code.

I would expect that you know the code in D30751 and can look into why only one of the instructions is rewritten.
From all I’ve seen so far the verification code seems to behave as expected.

- Matthias

Sorry, I should have mentioned that as well. This verification error is the last problem keeping me from re-enabling the copy forwarding patch (I can send you my latest rebased version, but I don't think it is relevant to this problem. See below).

I don't think the fact that BUNDLEd instructions aren't re-written has anything to do with the verification problem. Let me try to simplify what I think is going on. Just after greedy regalloc, we end up with some code like this:

...
%vreg1<def> = ...
...
... = %vreg1
...
%vreg1<def> = %vreg1
...

verifyLiveInterval() accepts this code as valid since it sees the second def as part of the same live interval component because ConnectedVNInfoEqClasses::Classify() sees this second def as a "two-addr" redefinition, even though the def and source operands are not tied.

MachineCopyProp (pre-rewrite) runs next and turns this code into:

...
%vreg1<def> = ...
...
... = %vreg1
...
%vreg1<def> = %vreg2
...

verifyLiveInterval() now rejects this code since it sees these two def live ranges as being separate components. My claim is that these two code snippets are equivalent as far as the number of live range components is concerned. Therefore verifyLiveInterval() should have rejected the code just after regalloc greedy (as the FIXME in ConnectedVNInfoEqClasses::Classify hints at), which means the source of this particular problem is in regalloc greedy or before (and not in MachineCopyProp).
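
The equivalence claim can be checked on the two simplified snippets with a small sketch. This is a straight-line toy model with no PHIs or subregisters, and `Inst`, `Rule` and `components` are invented names, not anything in LLVM. Under the loose rule, a redefinition is connected to the previous value whenever the defining instruction also reads the register. Under the tied-only rule, it is connected only if the def is tied to that use.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One instruction in a straight-line toy program (not LLVM's MachineInstr).
struct Inst {
  int Def;                // register defined, or -1 if none
  std::vector<int> Uses;  // registers read
  bool Tied;              // def is tied to a use of the same register
};

enum class Rule { Loose, TiedOnly };

// Number of connected components in the live range of Reg.
int components(const std::vector<Inst> &Code, int Reg, Rule R) {
  int N = 0;
  bool HaveValue = false;
  for (const Inst &I : Code) {
    if (I.Def != Reg)
      continue;
    // The previous value reaches this def only if this instruction reads Reg.
    bool ReadsReg =
        std::find(I.Uses.begin(), I.Uses.end(), Reg) != I.Uses.end();
    bool Connected = HaveValue && ReadsReg && (R == Rule::Loose || I.Tied);
    if (!Connected)
      ++N;
    HaveValue = true;
  }
  assert(N >= 0);
  return N;
}

// %vreg1 = ...;  ... = %vreg1;  %vreg1 = %vreg1 (not tied)
std::vector<Inst> afterGreedySnippet() {
  return {{1, {}, false}, {-1, {1}, false}, {1, {1}, false}};
}

// %vreg1 = ...;  ... = %vreg1;  %vreg1 = %vreg2
std::vector<Inst> afterCopyPropSnippet() {
  return {{1, {}, false}, {-1, {1}, false}, {1, {2}, false}};
}
```

In this model the loose rule gives 1 component for the first snippet and 2 for the second, while the tied-only rule gives 2 for both. That is the sense in which the two snippets are equivalent.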

Ah I see. And I would agree with your interpretation.

- Only tied use/def operands or a subregister def without the undef flag should result in connected liveranges.
- Be careful and measure the compile time impact when switching the implementation in ConnectedVNInfoEqClasses; unfortunately there is currently no way to detect this situation just by looking at the VNInfo.
- I guess the RAGreedy result is indeed wrong then. If I'm reading this correctly, it basically looks like this (when simplified to the problem at hand):

BB2:
   vreg76 = COPY vreg77<kill>

   vreg77 = COPY vreg76
   vreg77 = someop vreg77<kill>
   CondJmp BB2

The `vreg77 = COPY` should have used a different vreg. My guess would be that the liverange splitting code makes similar assumptions to ConnectedVNInfoEqClasses.

- Matthias

I think I'm going to try to work around this issue for now (with a big FIXME comment) by not copy forwarding in these cases so I can get my original patch re-enabled. Then we can look into fixing the above issue, though I don't think I'll be able to look into it for some time.