Copy Instructions?

There is no register-to-register copy instructions in LLVM. I've found a bug
in mem2reg (LLVM 2.3 but it doesn't appear to be fixed in trunk either) and a
copy instruction seems to be the easiest way to fix it.

Consider this code coming into the RenamePass phase of mem2reg.
"*" in the phi means we haven't added that incoming value yet.

          // x.0 = ...
          // y.0 = ...
          //
          // do
          // x = phi(x.0, *)
          // y = phi(y.0, *)
          // [...]
          // store x -> y
          // store expr -> x
          // end do
          //
          // We are updating both phis to add a value from a back edge
          // When we saw the first store we set IncomingVals[y] to x.
          // When we see the second store we need to create a
          // temporary t = x before the second store and set
          // IncomingVals[y] to t. Otherwise y's phi will pick up the
          // new value of x (expr) rather than the old, proper one.

This bug triggers on a code we have here. I've been unable to successfully
bugpoint it as bugpoint croaks with a bad_alloc exception. We're in the midst
of upgrading to 2.4 so I will re-try bugpoint then. The above pseudocode
represents the situation adequately.

How do I go about creating the copy t = x? I don't want to use an add with a
zero constant because it may not be optimized in all circustations (floating
point, for example).

In this particular example another solution would be to reorder the phis but I
don't think that will work in the general case.

                                                    -Dave

Actually, that wouldn't work at all. Never mind. :slight_smile:

                                     -Dave

How do I go about creating the copy t = x? I don't want to use an add with a
zero constant because it may not be optimized in all circustations (floating
point, for example).

You can use a no-op bitcast for scalars, but there isn't any reliable
way to do it for all first-class values. That said, I don't quite
follow the issue. This is SSA, so the only way a value can change is
if you change the code.

I'm not really following what the issue is in this testcase, though,
so I could be missing something. The way you're describing it,
mem2reg appears to be working as intended; the correct completion is
in fact as follows:
         // x = phi(x.0, expr)
         // y = phi(y.0, x)

In this particular example another solution would be to reorder the phis but I
don't think that will work in the general case.

PHI nodes don't have an ordering, at least conceptually; they're not
real instructions.

-Eli

I dont know if this helps but this sort of an issue apparently crops up during the un-SSA phase, if copy propagation is done on SSA’ed code. The exact problem is mentioned in (the Lost Copy problem) “Practical Improvements to the Construction and Destruction of Static Single Assignment Form”. Since there is no explicit assignment (copy) instruction in LLVM, I dont know if this is a problem here.

regards,
Prakash

You can use a no-op bitcast for scalars, but there isn't any reliable
way to do it for all first-class values.

Guh.

That said, I don't quite follow the issue. This is SSA, so the only way a
value can change is if you change the code.

This isn't (yet) SSA. This is mem2reg turning things into SSA.

I'm not really following what the issue is in this testcase, though,
so I could be missing something. The way you're describing it,
mem2reg appears to be working as intended; the correct completion is
in fact as follows:
         // x = phi(x.0, expr)
         // y = phi(y.0, x)

Yes, that's what mem2reg ends up with.

> In this particular example another solution would be to reorder the phis
> but I don't think that will work in the general case.

PHI nodes don't have an ordering, at least conceptually; they're not
real instructions.

They don't have an ordering? So what does your above code mean? I would have
expected it to mean that y is either the value of y.0 or the value from the
phi immediate preceeding it. Is that not correct?

If my understanding is correct then the code is wrong. The original code
looks something like this:

x = 0
y = 0
do
  [...use x and y...]
  y = x
  x = expr
end do

So y should get the previous value of x, not the value updated at the end of
the loop body.

If there truly is no ordering then perhaps Prakash is correct and this is an
out-of-SSA problem. I'd appreciate your opinion on this. I read the Briggs
SPE paper a long time ago. Time to dust it off.

                                                  -Dave

Eli is correct in that, in theory, PHI nodes in SSA form are unordered, i.e. all PHIs in a given basic block are evaluated simultaneously. In that case, the correct code would be:

x.0 = 0
y.0 = 0
do
  x = phi(x.0, x.1)
  y = phi(y.0, x)
  x.1 = ...
  ...
end do

LLVM does not implement this simultaneous evaluation for practical reasons. Most situations can be resolved by reordering the PHI nodes, though the swap copy still remains:

x.0 = 0
y.0 = 0
do
  x = phi(x.0, y)
  y = phi(y.0, x) // NOT CORRECT WITH ORDERED PHIS!
  ...
end do

In this case, you can resolve it by inserting an extra PHI node:

x.0 = 0
y.0 = 0
do
  foo = phi(undef, x)
  x = phi(x.0, y)
  y = phi(y.0, foo) // CORRECT WITH ORDERED PHIS!
  ...
end do

--Owen

Why is this a problem? All phis execute "atomically". What problem are you seeing in practice?

-Chris

I'm pretty sure we don't actually implement atomic execution of phis. I recall having the same basic issue a long time ago, and being informed of this.

--Owen

>> You can use a no-op bitcast for scalars, but there isn't any reliable
>> way to do it for all first-class values.
>
> Guh.
>
>> That said, I don't quite follow the issue. This is SSA, so the
>> only way a
>> value can change is if you change the code.
>
> This isn't (yet) SSA. This is mem2reg turning things into SSA.
>
>> I'm not really following what the issue is in this testcase, though,
>> so I could be missing something. The way you're describing it,
>> mem2reg appears to be working as intended; the correct completion is
>> in fact as follows:
>> // x = phi(x.0, expr)
>> // y = phi(y.0, x)
>
> Yes, that's what mem2reg ends up with.

Why is this a problem? All phis execute "atomically". What problem
are you seeing in practice?

I'm getting incorrect answers.

How can a set of phis with a dependence execute simultaenously?
There is only one definition of x and it has to happen before the use of
x in the phi defining y, doesn't it?

Owen says:

In this case, you can resolve it by inserting an extra PHI node:

x.0 = 0
y.0 = 0
do
       foo = phi(undef, x)
       x = phi(x.0, y)
       y = phi(y.0, foo) // CORRECT WITH ORDERED PHIS!
       ...
end do

Again, how can simultaenous execution of statements with a dependence have any
meaning? For practical reasons, as Owen says, LLVM does not treat phis as
executing simultaneously. So in this case wouldn't there be a use-before-def
error on x? I should think Verifier would choke on this code.

                                                 -Dave

How can a set of phis with a dependence execute simultaenously?
There is only one definition of x and it has to happen before the use of
x in the phi defining y, doesn't it?

The "normal" answer would be that they execute atomically only at the IR level. Part of the out-of-SSA translation that happens in the backend is to insert sufficient copies to to hide the non-atomic nature of the execution.

In this case, you can resolve it by inserting an extra PHI node:

x.0 = 0
y.0 = 0
do
      foo = phi(undef, x)
      x = phi(x.0, y)
      y = phi(y.0, foo) // CORRECT WITH ORDERED PHIS!
      ...
end do

Again, how can simultaenous execution of statements with a dependence have any
meaning? For practical reasons, as Owen says, LLVM does not treat phis as
executing simultaneously. So in this case wouldn't there be a use-before-def
error on x? I should think Verifier would choke on this code.

I have no idea if the Verifier actually accepts this, but it is semantically valid because of the conditional execution involved in the phi. The value of x will never be read before it has been def'd.

--Owen

What? Of course we do. This is why phi elimination currently lowers each phi into 1+numpreds copies.

-Chris

Why is this a problem? All phis execute "atomically". What problem
are you seeing in practice?

I'm getting incorrect answers.

How can a set of phis with a dependence execute simultaenously?
There is only one definition of x and it has to happen before the use of
x in the phi defining y, doesn't it?

PHIs can't depend on each other.

Again, how can simultaenous execution of statements with a dependence have any
meaning?

There is no such thing.

For practical reasons, as Owen says, LLVM does not treat phis as
executing simultaneously.

He's wrong.

-Chris

So if this is true then it seems that you wouldn't need the extra phi
anyway, since y = phi(y.0, x) would be semantically valid.

Ok, so I can understand how the SSA construction can be considered
valid. All right, I'll have to go and see if out-of-SSA does the right thing
with this.

                                            -Dave

Are we absolutely sure about this? There are a lot of algorithms that walk
basic blocks.

                                           -Dave

So just to be clear, this means that all phi uses are considered to execute
before any phi derfs?

                                           -Dave

yes, the semantics are that all phis in a block read their inputs, after all the "reads" are done they all "store" to their results.

-Chris