Tail merging "undef" with a defined register: wrong code

The comment in test/CodeGen/X86/branchfolding-undef.mir states that such merging is legal, however doing so can actually generate wrong code:

Consider this (valid code):

Why do you say the code is wrong?

I can see why the MachineVerifier would complain, but the code seems to do the same thing as it was doing before the transformation, isn’t it?

Yes, immediately after branch folding the code would still behave the same as the original. At the same time, any subsequent optimization may "exploit" the incorrect liveness information to do something bad. If you add -run-pass if-converter, you'll get:

# After If Converter
# Machine code for function fred: IsSSA, NoPHIs, TracksLiveness, NoVRegs

BB#0:
         %R0<def> = L2_ploadruhf_io %P0<undef>, %R0<undef>, 0, %R0<imp-use>
         PS_storerhabs 0, %R0
         PS_jmpret %R31<kill>, %PC<imp-def>

# End machine code for function fred.

*** Bad machine code: Using an undefined physical register ***
- function: fred
- basic block: BB#0 (0x411e8e8)
- instruction: %R0<def> = L2_ploadruhf_io
- operand 4: %R0<imp-use>
LLVM ERROR: Found 1 machine code errors.

-Krzysztof

Yeah, that’s want I was expecting from the verifier.
I’d say we should add an implicit_def (assuming it gets kill at code emission) in the part where it is undef.

I've looked into that and it's not going to be simple, unfortunately.

Here's the original example again:

Hi Krzysztof,

Thanks for digging into this.

I've looked into that and it's not going to be simple, unfortunately.

Here's the original example again:

---
name: fred
tracksRegLiveness: true

body: |
bb.0:
   successors: %bb.1, %bb.2
     J2_jumpt undef %p0, %bb.2, implicit-def %pc
     J2_jump %bb.1, implicit-def %pc

bb.1:
   successors: %bb.3
     %r0 = L2_loadruh_io undef %r0, 0
     PS_storerhabs 0, killed %r0
     J2_jump %bb.3, implicit-def %pc

bb.2:
   successors: %bb.3
     PS_storerhabs 0, undef %r0
     J2_jump %bb.3, implicit-def %pc

bb.3:
     PS_jmpret killed %r31, implicit-def %pc
...

If the merging process produced this intermediate code (I stripped some unimportant pieces):

bb.1:
     %r0 = L2_loadruh_io undef %r0, 0
     J2_jump %bb.tail

bb.2:
     J2_jump %bb.tail

bb.tail:
   liveins: %r0
     PS_storerhabs 0, killed %r0
     J2_jump %bb.3, implicit-def %pc

Then we could insert IMPLICIT_DEF into bb.2 and everything would be fine. This is not what happens, though. In this case, the entire bb.2 will be used as the new tail leaving us without a good point to put the implicit def at.

We should need to put the implicit_def only if the related register is dead (as in not live-out). Thus, for this case we wouldn’t need to do anything, right?

I am not saying it is easy though :).

Cheers,
-Quentin

Hi Krzysztof,

Thanks for digging into this.

I've looked into that and it's not going to be simple, unfortunately.

Here's the original example again:

---
name: fred
tracksRegLiveness: true

body: |
  bb.0:
    successors: %bb.1, %bb.2
      J2_jumpt undef %p0, %bb.2, implicit-def %pc
      J2_jump %bb.1, implicit-def %pc

  bb.1:
    successors: %bb.3
      %r0 = L2_loadruh_io undef %r0, 0
      PS_storerhabs 0, killed %r0
      J2_jump %bb.3, implicit-def %pc

  bb.2:
    successors: %bb.3
      PS_storerhabs 0, undef %r0
      J2_jump %bb.3, implicit-def %pc

  bb.3:
      PS_jmpret killed %r31, implicit-def %pc
...

If the merging process produced this intermediate code (I stripped some unimportant pieces):

  bb.1:
      %r0 = L2_loadruh_io undef %r0, 0
      J2_jump %bb.tail

  bb.2:
      J2_jump %bb.tail

  bb.tail:
    liveins: %r0
      PS_storerhabs 0, killed %r0
      J2_jump %bb.3, implicit-def %pc

Then we could insert IMPLICIT_DEF into bb.2 and everything would be fine. This is not what happens, though. In this case, the entire bb.2 will be used as the new tail leaving us without a good point to put the implicit def at.

We should need to put the implicit_def only if the related register is dead (as in not live-out).

Yes.

Thus, for this case we wouldn’t need to do anything, right?

Why not? This is the same testcase as in the beginning of the thread. In the actual output produced by tail merging, r0 is not live out of bb.0, but is live on entry to one of its successors. Even with the hypothetical code, r0 is live on entry to bb.tail, but not live on exit from bb.2. Both cases need some fixes.

Since I've removed the old quotes for brevity, this is the current output:

body: |
   bb.0:
     successors: %bb.1, %bb.2
       J2_jumpt undef %p0, %bb.2, implicit-def %pc

   bb.1:
     successors: %bb.2
       %r0 = L2_loadruh_io undef %r0, 0

   bb.2:
     liveins: %r0
       PS_storerhabs 0, %r0
       PS_jmpret killed %r31, implicit-def %pc
...

-Krzysztof

Hi Krzysztof,
Thanks for digging into this.

I’ve looked into that and it’s not going to be simple, unfortunately.

Here’s the original example again:


name: fred
tracksRegLiveness: true

body: |
bb.0:
successors: %bb.1, %bb.2
J2_jumpt undef %p0, %bb.2, implicit-def %pc
J2_jump %bb.1, implicit-def %pc

bb.1:
successors: %bb.3
%r0 = L2_loadruh_io undef %r0, 0
PS_storerhabs 0, killed %r0
J2_jump %bb.3, implicit-def %pc

bb.2:
successors: %bb.3
PS_storerhabs 0, undef %r0
J2_jump %bb.3, implicit-def %pc

bb.3:
PS_jmpret killed %r31, implicit-def %pc

If the merging process produced this intermediate code (I stripped some unimportant pieces):

bb.1:
%r0 = L2_loadruh_io undef %r0, 0
J2_jump %bb.tail

bb.2:
J2_jump %bb.tail

bb.tail:
liveins: %r0
PS_storerhabs 0, killed %r0
J2_jump %bb.3, implicit-def %pc

Then we could insert IMPLICIT_DEF into bb.2 and everything would be fine. This is not what happens, though. In this case, the entire bb.2 will be used as the new tail leaving us without a good point to put the implicit def at.

We should need to put the implicit_def only if the related register is dead (as in not live-out).

Yes.

Thus, for this case we wouldn’t need to do anything, right?

Why not?

Oh my bad, I misread the example.
I thought the first example ended up like:
BB1


BB2 (r0=)
/
v
BB3 (=r0)

Thus, we needed a definition in BB1.

And the second example was:

BB1

BB2 (r0=)

v
BB3 (=r0)

So we don’t need a definition.

This is the same testcase as in the beginning of the thread. In the actual output produced by tail merging, r0 is not live out of bb.0, but is live on entry to one of its successors. Even with the hypothetical code, r0 is live on entry to bb.tail, but not live on exit from bb.2. Both cases need some fixes.

I am having a hard time seeing the problem from the example, shouldn’t we just put an implicit_def at the end of bb.0?

Cheers,
-Quentin

That should work actually, even in the general case. I overlooked something when I first said it wouldn't.

-Krzysztof

I am having a hard time seeing the problem from the example, shouldn’t we just put an implicit_def at the end of bb.0?

That should work actually, even in the general case. I overlooked something when I first said it wouldn’t.

Great!

I ran into the LR/PC issue on ARM again, and the patch has some approach to fixing it. I'm wondering what you think.
https://reviews.llvm.org/D36160

-Krzysztof