undefs in phis

Ok, I understand a bit more of what's going on in my phi example.

Coming into DAGtoDAG we have this code:

bb74:
  x = phi(1.0:bb134, %r1450:bb108)
  y = phi(undef:bb134, x:bb108)
[...]

bb108:
  %r1450 = <expr>

After DAGtoDAG we have:

bb134:
  %reg1459 = IMPLICIT_DEF
  %reg1458 = 1.0

bb74:
  %reg1176 = phi(%reg1458:bb134, %reg1253:bb108)
  %reg1177 = phi(%reg1459:bb134, %reg1176:bb108)
[...]

bb108:
  %reg1253 = <expr>

So far so good, though the IMPLICIT_DEF is worrisome. I'm guessing that's
what causes problems later.

After phi elimination we have:

bb134:
  %reg1645 = 1.0

bb74:
  %reg1176 = MOVAPS %reg1645
  %reg1177 = MOVAPS %reg1646
[...]

bb108:
  %reg1645 = <expr>
  %reg1646 = %reg1176

Again, this is all good except that %reg1646 is not defined on the first entry
to bb74. This is actually consistent with the original code which is:

x = 1.0
do
  if (x == 1.0) {
    [...use x...]
  else {
    [...use x and y...]
  }
  y = x
  x = <expr>
end do

So there is a use-before-def in the dataflow sense but it's safe due to the
control flow.

My guess is that regalloc or some other pass after phi elimination gets
confused due to the use-before-def of %reg1646.

My question is, who is responsible for fixing up a situation like this? My
guess is that llvm-gcc takes care of patching holes like this and our
frontend could do the same. Should llvm be able to handle situations like
this or is the result undefined?

                                  -Dave

After phi elimination we have:

bb134:
%reg1645 = 1.0

bb74:
%reg1176 = MOVAPS %reg1645
%reg1177 = MOVAPS %reg1646
[...]

bb108:
%reg1645 = <expr>
%reg1646 = %reg1176

I find it a little strange that the IMPLICIT_DEF disappears. Besides
that, it looks okay up to here.

Should llvm be able to handle situations like
this or is the result undefined?

LLVM should be able to handle the IL in question, I think. Using
undef in the way the given IL is using it is looks legitimate.

-Eli

> After phi elimination we have:
>
> bb134:
> %reg1645 = 1.0
>
> bb74:
> %reg1176 = MOVAPS %reg1645
> %reg1177 = MOVAPS %reg1646
> [...]
>
> bb108:
> %reg1645 = <expr>
> %reg1646 = %reg1176

I find it a little strange that the IMPLICIT_DEF disappears. Besides
that, it looks okay up to here.

I just verified that it does disappear.

> Should llvm be able to handle situations like
> this or is the result undefined?

LLVM should be able to handle the IL in question, I think. Using
undef in the way the given IL is using it is looks legitimate.

Ok, that's good to know. I'll continue tracking this one down.

                                       -Dave

After phi elimination we have:

bb134:
%reg1645 = 1.0

bb74:
%reg1176 = MOVAPS %reg1645
%reg1177 = MOVAPS %reg1646
[...]

bb108:
%reg1645 = <expr>
%reg1646 = %reg1176

I find it a little strange that the IMPLICIT_DEF disappears. Besides
that, it looks okay up to here.

I just verified that it does disappear.

It's intentional. We don't want a live interval defined by an implicit_def. It unnecessarily increases register pressure.

Evan

Ah, I see. But coalescing seems to break. I don't know if it's because
of the eliminated IMPLICIT_DEF, though. Here's what happens. Let's look at
this code, annotated with live range indices:

bb134:
2696 %reg1645<def> = FsMOVAPSrr %reg1458<kill> ; srcLine 0

bb74:
2700 %reg1176<def> = FsMOVAPSrr %reg1645<kill> ; srcLine 0
2704 %reg1177<def> = FsMOVAPSrr %reg1646<kill> ; srcLine 0 *** u before d
2708 %reg1178<def> = FsMOVAPSrr %reg1647<kill> ; srcLine 0 *** u before d
2712 TEST64rr %reg1173, %reg1173, %EFLAGS<imp-def> ; srcLine 30
2716 JLE mbb<file test.f90, bb90,0x3c37ed0>, %EFLAGS<imp-use,kill> ; srcLine 0

bb108:
[...]
4352 %reg1253<def> = MAXSSrr %reg1253, %reg1588<kill> ; srcLine 60
4356 %reg1645<def> = FsMOVAPSrr %reg1253<kill> ; srcLine 0
4360 %reg1646<def> = FsMOVAPSrr %reg1176<kill> ; srcLine 0
4364 %reg1647<def> = FsMOVAPSrr %reg1243<kill> ; srcLine 0
4368 JMP mbb<file test.f90, bb74,0x3c63a60> ; srcLine 0

Coalescing comes along and says this:

2704 %reg1177<def> = FsMOVAPSrr %reg1646<kill> ; srcLine 0
    Inspecting %reg1646,0 = [2700,2706:0)[4362,4372:0) 0@4362-(2706) and
%reg1177,0 = [2706,3712:0)[3768,3878:0) 0@2706-(3878):
    Joined. Result = %reg1177,0 = [2700,3712:0)[3768,3878:0)[4362,4372:0)
0@4362-(3878)

Ok, that seems reasonable. The code now looks like this:

bb134:
2696 %reg1645<def> = FsMOVAPSrr %reg1458<kill> ; srcLine 0

bb74:
2700 %reg1176<def> = FsMOVAPSrr %reg1645<kill> ; srcLine 0
[deleted copy]
2708 %reg1178<def> = FsMOVAPSrr %reg1647<kill> ; srcLine 0 *** u before d
2712 TEST64rr %reg1173, %reg1173, %EFLAGS<imp-def> ; srcLine 30
2716 JLE mbb<file test.f90, bb90,0x3c37ed0>, %EFLAGS<imp-use,kill> ; srcLine 0

bb108:
[...]
4352 %reg1253<def> = MAXSSrr %reg1253, %reg1588<kill> ; srcLine 60
4356 %reg1645<def> = FsMOVAPSrr %reg1253<kill> ; srcLine 0
4360 %reg1177<def> = FsMOVAPSrr %reg1176<kill> ; srcLine 0 *** updated
4364 %reg1647<def> = FsMOVAPSrr %reg1243<kill> ; srcLine 0
4368 JMP mbb<file test.f90, bb74,0x3c63a60> ; srcLine 0

This still looks correct. The coalescer then says:

4360 %reg1177<def> = FsMOVAPSrr %reg1176<kill> ; srcLine 0
    Inspecting %reg1176,0 = [2702,4362:0) 0@2702-(4362) and %reg1177,0 =
[2700,3712:0)[3768,3878:0)[4362,4372:0) 0@4362-(3878):
    Joined. Result = %reg1177,0 = [2700,4372:0) 0@2702-(4362)

Eh? How can it coalesce these two? Doesn't %reg1176:[2702,4362:0) overlap
%reg1177:[2700,3712:0)?

I will look into this further. Just wanted to check in and make sure.

                                           -Dave

Ok, the Coalescer thinks %reg1177 value number 0 is defined by a copy from
%reg1176. I guess that could be considered correct because %reg1177 is
defined by a copy from %reg1176 in bb108. It thus considers the intervals to
not interfere.

But this can't be right. I think the problem is that there should be two
value numbers for %reg1177. We already have VN 0 defined from %reg1176.
What coalescing is missing is that %reg1177 is ALSO defined by an implicit def
from bb134. That's the value number we're missing and is why coalescing
incorrectly removed the copy.

So it appears we can't get rid of the IMPLICIT DEF. Evan, do you agree?

                                            -Dave

This still looks correct. The coalescer then says:

4360 %reg1177<def> = FsMOVAPSrr %reg1176<kill> ; srcLine 0
    Inspecting %reg1176,0 = [2702,4362:0) 0@2702-(4362) and %reg1177,0 =
[2700,3712:0)[3768,3878:0)[4362,4372:0) 0@4362-(3878):
    Joined. Result = %reg1177,0 = [2700,4372:0) 0@2702-(4362)

Eh? How can it coalesce these two? Doesn't %reg1176:[2702,4362:0) overlap
%reg1177:[2700,3712:0)?

I will look into this further. Just wanted to check in and make sure.

Ok, the Coalescer thinks %reg1177 value number 0 is defined by a copy from
%reg1176. I guess that could be considered correct because %reg1177 is
defined by a copy from %reg1176 in bb108. It thus considers the intervals to
not interfere.

But this can't be right. I think the problem is that there should be two
value numbers for %reg1177. We already have VN 0 defined from %reg1176.
What coalescing is missing is that %reg1177 is ALSO defined by an implicit def
from bb134. That's the value number we're missing and is why coalescing
incorrectly removed the copy.

I don't have the whole context to understand why you think this is a bug. An implicit_def doesn't actually define any value. So we don't care if a live interval overlaps live ranges defined by an implicit_def.

That said, the way we models undef in machine instruction has always bugged me. I thought about adding a MachineOperand type to represent undef. Then we don't have to muddle the semantics of a "def". To me, that's a cleaner representation, but it will require work.

Evan

I don't have the whole context to understand why you think this is a
bug. An implicit_def doesn't actually define any value. So we don't
care if a live interval overlaps live ranges defined by an implicit_def.

It's a bug because the coalerscer does illegal coaescing.

Our last episode left us here:

bb134:
2696 %reg1645<def> = FsMOVAPSrr %reg1458<kill> ; srcLine 0

bb74:
2700 %reg1176<def> = FsMOVAPSrr %reg1645<kill> ; srcLine 0
[deleted copy]
2708 %reg1178<def> = FsMOVAPSrr %reg1647<kill> ; srcLine 0 *** u
before d
2712 TEST64rr %reg1173, %reg1173, %EFLAGS<imp-def> ; srcLine 30
2716 JLE mbb<file test.f90, bb90,0x3c37ed0>, %EFLAGS<imp-use,kill> ;
srcLine
0

bb108:
[...]
4352 %reg1253<def> = MAXSSrr %reg1253, %reg1588<kill> ; srcLine 60
4356 %reg1645<def> = FsMOVAPSrr %reg1253<kill> ; srcLine 0
4360 %reg1177<def> = FsMOVAPSrr %reg1176<kill> ; srcLine 0 ***
updated
4364 %reg1647<def> = FsMOVAPSrr %reg1243<kill> ; srcLine 0
4368 JMP mbb<file test.f90, bb74,0x3c63a60> ; srcLine 0

This still looks correct. The coalescer then says:

4360 %reg1177<def> = FsMOVAPSrr %reg1176<kill> ; srcLine 0
                Inspecting %reg1176,0 = [2702,4362:0) 0@2702-(4362) and
%reg1177,0 =
[2700,3712:0)[3768,3878:0)[4362,4372:0) 0@4362-(3878):
                Joined. Result = %reg1177,0 = [2700,4372:0) 0@2702-(4362)

Now let's look at the resulting code:

bb134:
2696 %reg1645<def> = FsMOVAPSrr %reg1458<kill> ; srcLine 0

bb74:
2700 %reg1177<def> = FsMOVAPSrr %reg1645<kill> ; srcLine 0 *** u
[deleted copy]
2708 %reg1178<def> = FsMOVAPSrr %reg1647<kill> ; srcLine 0 *** u
before d
2712 TEST64rr %reg1173, %reg1173, %EFLAGS<imp-def> ; srcLine 30
2716 JLE mbb<file test.f90, bb90,0x3c37ed0>, %EFLAGS<imp-use,kill> ;
srcLine
0

bb108:
[...]
4352 %reg1253<def> = MAXSSrr %reg1253, %reg1588<kill> ; srcLine 60
4356 %reg1645<def> = FsMOVAPSrr %reg1253<kill> ; srcLine 0
[deleted copy]
4364 %reg1647<def> = FsMOVAPSrr %reg1243<kill> ; srcLine 0
4368 JMP mbb<file test.f90, bb74,0x3c63a60> ; srcLine 0

The very first instruction in bb74 is wrong. The coalescer has said that y
always has the same value of x and that's incorrect. y is always one value
"behind" x in the original source.

The coalescer thinks it can do this because %reg1177 only has one value number
and that VN is marked as a copy from %reg1176. What I'm saying is that while
%reg1177 really is copied from %reg1176, it also has some initial value
("undef") coming into the loop. That value is not captured by the live
interval information. It's because of this value that we cannot coalesce
%reg1177 and %reg1176. It's because of this value that %reg1177 is always one
value "behind" %reg1176.

Now, if there's some other way to tell the coalescer that the coalescing is
illegal, that's fine. I don't care about the undef value number itself. I
care about the coalescer behaving itself. :slight_smile: I just don't know how to tell
the coalescer not to coalesce this without disabling all coalescings of
interfering live intervals, even those that are just copies from the source
register.

Can you think of another way to fix this that's quick and easy?

That said, the way we models undef in machine instruction has always
bugged me. I thought about adding a MachineOperand type to represent
undef. Then we don't have to muddle the semantics of a "def". To me,
that's a cleaner representation, but it will require work.

Long-term I don't have an opinion on what happens here. But right now I
need to fix this.

                                                 -Dave

I don't have the whole context to understand why you think this is a
bug. An implicit_def doesn't actually define any value. So we don't
care if a live interval overlaps live ranges defined by an implicit_def.

It's a bug because the coalerscer does illegal coaescing.

Our last episode left us here:

bb134:
2696 %reg1645<def> = FsMOVAPSrr %reg1458<kill> ; srcLine 0

bb74:
2700 %reg1176<def> = FsMOVAPSrr %reg1645<kill> ; srcLine 0
[deleted copy]
2708 %reg1178<def> = FsMOVAPSrr %reg1647<kill> ; srcLine 0 *** u
before d
2712 TEST64rr %reg1173, %reg1173, %EFLAGS<imp-def> ; srcLine 30
2716 JLE mbb<file test.f90, bb90,0x3c37ed0>, %EFLAGS<imp-use,kill> ;
srcLine
0

bb108:
[...]
4352 %reg1253<def> = MAXSSrr %reg1253, %reg1588<kill> ; srcLine 60
4356 %reg1645<def> = FsMOVAPSrr %reg1253<kill> ; srcLine 0
4360 %reg1177<def> = FsMOVAPSrr %reg1176<kill> ; srcLine 0 ***
updated
4364 %reg1647<def> = FsMOVAPSrr %reg1243<kill> ; srcLine 0
4368 JMP mbb<file test.f90, bb74,0x3c63a60> ; srcLine 0

This still looks correct. The coalescer then says:

4360 %reg1177<def> = FsMOVAPSrr %reg1176<kill> ; srcLine 0
               Inspecting %reg1176,0 = [2702,4362:0) 0@2702-(4362) and
%reg1177,0 =
[2700,3712:0)[3768,3878:0)[4362,4372:0) 0@4362-(3878):
               Joined. Result = %reg1177,0 = [2700,4372:0) 0@2702-(4362)

Now let's look at the resulting code:

bb134:
2696 %reg1645<def> = FsMOVAPSrr %reg1458<kill> ; srcLine 0

bb74:
2700 %reg1177<def> = FsMOVAPSrr %reg1645<kill> ; srcLine 0 *** u
[deleted copy]
2708 %reg1178<def> = FsMOVAPSrr %reg1647<kill> ; srcLine 0 *** u
before d
2712 TEST64rr %reg1173, %reg1173, %EFLAGS<imp-def> ; srcLine 30
2716 JLE mbb<file test.f90, bb90,0x3c37ed0>, %EFLAGS<imp-use,kill> ;
srcLine
0

bb108:
[...]
4352 %reg1253<def> = MAXSSrr %reg1253, %reg1588<kill> ; srcLine 60
4356 %reg1645<def> = FsMOVAPSrr %reg1253<kill> ; srcLine 0
[deleted copy]
4364 %reg1647<def> = FsMOVAPSrr %reg1243<kill> ; srcLine 0
4368 JMP mbb<file test.f90, bb74,0x3c63a60> ; srcLine 0

The very first instruction in bb74 is wrong. The coalescer has said that y
always has the same value of x and that's incorrect. y is always one value
"behind" x in the original source.

The coalescer thinks it can do this because %reg1177 only has one value number
and that VN is marked as a copy from %reg1176. What I'm saying is that while
%reg1177 really is copied from %reg1176, it also has some initial value
("undef") coming into the loop. That value is not captured by the live
interval information. It's because of this value that we cannot coalesce
%reg1177 and %reg1176. It's because of this value that %reg1177 is always one
value "behind" %reg1176.

I am sorry I don't really follow it. Is this what you are describing?

%v1177 = undef
...
loop:
...
%v1176 = op ...
                 = %v1177
%v1177 = %v1176
jmp loop

Why is not safe to coalesce the 2 registers?

Evan

Not quite. The original code is:

%v1177 = undef
%v1645 = ...
loop:
%v1176 = %v1645
...
  = %v1176
  = %v1177
%v1645 = op ...
%v1177 = %v1176
jmp loop

We can't coalesce %v1177 and %v1176 legally. But we do.

                                              -Dave

I am sorry I don't really follow it. Is this what you are describing?

%v1177 = undef
...
loop:
...
%v1176 = op ...
                = %v1177
%v1177 = %v1176
jmp loop

Why is not safe to coalesce the 2 registers?

Not quite. The original code is:

%v1177 = undef
%v1645 = ...
loop:
%v1176 = %v1645
...
= %v1176
= %v1177
%v1645 = op ...
%v1177 = %v1176
jmp loop

We can't coalesce %v1177 and %v1176 legally. But we do.

Seriously, why not? In the first iteration, it's totally legal for v1177 has the same value as v1176. It's defined by an undef, it's allowed to have contain any value.

Evan

Think about what will happen the 2nd iteration. %v1177 will have the value of
%v1645 which is wrong. This is because %v1176 in bb74 will be replaced with
%v1177. That's incorrect.

                                              -Dave

I am sorry I don't really follow it. Is this what you are describing?

%v1177 = undef
...
loop:
...
%v1176 = op ...
               = %v1177
%v1177 = %v1176
jmp loop

Why is not safe to coalesce the 2 registers?

Not quite. The original code is:

%v1177 = undef
%v1645 = ...
loop:
%v1176 = %v1645
...
= %v1176
= %v1177
%v1645 = op ...
%v1177 = %v1176
jmp loop

We can't coalesce %v1177 and %v1176 legally. But we do.

Seriously, why not? In the first iteration, it's totally legal for
v1177 has the same value as v1176. It's defined by an undef, it's
allowed to have contain any value.

Think about what will happen the 2nd iteration. %v1177 will have the value of
%v1645 which is wrong. This is because %v1176 in bb74 will be replaced with
%v1177. That's incorrect.

Ok, right. The trick to fixing is to make sure the valno of the def of v1177 hasPHIKill to true and make sure the coalescer checks it.

Evan

I am sorry I don't really follow it. Is this what you are
describing?

%v1177 = undef
...
loop:
...
%v1176 = op ...
              = %v1177
%v1177 = %v1176
jmp loop

Why is not safe to coalesce the 2 registers?

Not quite. The original code is:

%v1177 = undef
%v1645 = ...
loop:
%v1176 = %v1645
...
= %v1176
= %v1177
%v1645 = op ...
%v1177 = %v1176
jmp loop

We can't coalesce %v1177 and %v1176 legally. But we do.

Seriously, why not? In the first iteration, it's totally legal for
v1177 has the same value as v1176. It's defined by an undef, it's
allowed to have contain any value.

Think about what will happen the 2nd iteration. %v1177 will have
the value of
%v1645 which is wrong. This is because %v1176 in bb74 will be
replaced with
%v1177. That's incorrect.

Ok, right. The trick to fixing is to make sure the valno of the def of
v1177 hasPHIKill to true and make sure the coalescer checks it.

Actually liveintervals can construct a v1177 live range starting from the beginning mbb with a val# of unknown def.

Evan

>> Think about what will happen the 2nd iteration. %v1177 will have
>> the value of
>> %v1645 which is wrong. This is because %v1176 in bb74 will be
>> replaced with
>> %v1177. That's incorrect.
>
> Ok, right. The trick to fixing is to make sure the valno of the def of
> v1177 hasPHIKill to true and make sure the coalescer checks it.

What does hasPHIKill mean, what are the consequences of using it and how do I
know when I can set it? I assume this would have been set by someone. Or is
that part of the bug?

Actually liveintervals can construct a v1177 live range starting from
the beginning mbb with a val# of unknown def.

What's "the beginning mbb?" The start of the bb containing the phi (bb74 in
this case)? Again, how would I instruct liveintervals to do this? The
IMPLICIT_DEF is already gone by that point.

                                                   -Dave

Think about what will happen the 2nd iteration. %v1177 will have

the value of

%v1645 which is wrong. This is because %v1176 in bb74 will be

replaced with

%v1177. That’s incorrect.

Ok, right. The trick to fixing is to make sure the valno of the def of

v1177 hasPHIKill to true and make sure the coalescer checks it.

What does hasPHIKill mean, what are the consequences of using it and how do I
know when I can set it? I assume this would have been set by someone. Or is
that part of the bug?

hasPHIKill just means it has a phi use so it’s not possible to determine where the value is killed. Look for LiveIntervalAnalysis.cpp.

Actually liveintervals can construct a v1177 live range starting from

the beginning mbb with a val# of unknown def.

What’s “the beginning mbb?” The start of the bb containing the phi (bb74 in
this case)? Again, how would I instruct liveintervals to do this? The
IMPLICIT_DEF is already gone by that point.

%v1645 = …
loop:
%v1176 = %v1645

= %v1176
= %v1177
%v1645 = op …
%v1177 = %v1176
jmp loop

When the live interval for v1177 is created, the code should recognize v1177 is live-in to the MBB. So even though it’s not clear where it’s defined, it’s still clear it’s live-in to the MBB.

Evan

hasPHIKill just means it has a phi use so it's not possible to
determine where the value is killed. Look for LiveIntervalAnalysis.cpp.

Ok.

%v1645 = ...
loop:
%v1176 = %v1645
...
= %v1176
= %v1177
%v1645 = op ...
%v1177 = %v1176
jmp loop

When the live interval for v1177 is created, the code should recognize
v1177 is live-in to the MBB. So even though it's not clear where it's
defined, it's still clear it's live-in to the MBB.

Yeah, ok, that should work. Thanks!

                                         -Dave