PHI nodes in machine code

Could anybody quickly explain why PHI nodes instructions are necessary in
machine code? And why the code in LiveVariables.cpp which looks at those PHI
nodes (line 249 and below) is necessary.

The reason I'm asking is that I try to support 64-bit comparison and I do it
by generating code like:

          // if high1 cond high2: goto operand0
          // if high1 reverse_cond high2: goto operand1
          // if low cond high2: goto operand0
          // goto operand1

but this means that operand0 and operand1 (the successor basic blocks)
suddenly get more predecessor than recorded in phi nodes, and
LiveVariables.cpp asserts on that.

Of course, I can add another two basic block which will set some register to 0
or 1 and branch based on the value of that register, but maybe the above
approach can work as well.

Thanks in advance,
Volodya

PHI nodes within machine code were originally used by the Sparc back-end but they turned out not to be necessary. Instead, LLVM phis are lowered to copy instructions in the machine code (I believe this happens just after instruction selection). As far as I know, the machine PHI nodes are not used by the x86 back-end and you shouldn't need them if you insert the right copies.

--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.cs.uiuc.edu/

Could anybody quickly explain why PHI nodes instructions are necessary
in machine code? And why the code in LiveVariables.cpp which looks at
those PHI nodes (line 249 and below) is necessary.

LLVM Machine code is in SSA.

Let's say you want to do

  r = a cond b

But doing this:

  if (a cond b) then
    r = 1
  else
    r = 0

gets you two definitions of r. So we have machine PHI nodes merge the
two possible values into one for result of r. These phis get removed
after register allocation. So you would write code like:

  if (a cond b) then
    r1 = 1
  else
    r2 = 0
  
  r = phi(r1, r2)

The reason I'm asking is that I try to support 64-bit comparison and I do it
by generating code like:

          // if high1 cond high2: goto operand0
          // if high1 reverse_cond high2: goto operand1

the second if should just be an unconditional branch to operand1:
clearly, if (high1 cond high2) is false, the reverse condition is true.

          // if low cond high2: goto operand0
          // goto operand1

These would have to go into different blocks, as above you would have an
unconditional branch, which then changes your branches significantly...

but this means that operand0 and operand1 (the successor basic blocks)
suddenly get more predecessor than recorded in phi nodes, and
LiveVariables.cpp asserts on that.

Naturally, as PHI nodes need to have as many entries as there are
predecessors.

Of course, I can add another two basic block which will set some
register to 0 or 1 and branch based on the value of that register, but
maybe the above approach can work as well.

The above approach should work.

PHI nodes within machine code were originally used by the Sparc
back-end but they turned out not to be necessary.

Actually, they are currently used in non-SparcV9 backends (see below).

Instead, LLVM phis are lowered to copy instructions in the machine
code (I believe this happens just after instruction selection).

After instruction selection, code generators do not consult LLVM code,
so they cannot lower from LLVM phis anymore (plus, there could be more
Machine Basic Blocks than LLVM basic blocks, so there would be a need
for more copies than the LLVM PHI node specifies.

For example, to implement the 'select' instruction, it is necessary to
move one of two values into a register, or similarly when we want to
evaluate "r = a cond b". Since the machine code is in SSA, we need to
use the Machine code PHI nodes to specify this to the target-independent
register allocator so that it can coallesce registers appropriately, if
possible.

The SparcV9 backend does not implement the 'select' instruction
directly, it is lowered by the LowerSelect pass before V9 instruction
selector. Furthermore, V9 has a conditional move instruction which
some other architectures lack, so they need the MachineCode PHI nodes to
work with the target-independent register allocator.

As far as I know, the machine PHI nodes are not used by the x86
back-end and you shouldn't need them if you insert the right copies.

The x86 currently uses PHI nodes because it uses the target-independent
register allocator. In contrast, the V9 backend inserts copies
manually, either via the "PreSelection" pass which it uses or in the
instruction selector itself, as I understand it.

Misha's right. In the target-independent code generator, the instruction
selector is required to translate LLVM PHI nodes into machine code PHI
nodes, and is allowed to create any additional PHI nodes that it needs.
Misha points out that setcc and select instructions are two common cases
where machine code phi's may be needed where LLVM PHI's don't exist.
Volodya also needs them for 64-bit integer compares (FYI, the X86 backend
has similar needs, but works around them by using conditional moves).

In general, the machine code CFG is more complex and may be slightly
different than the LLVM CFG. It is the goal of the code generator to make
it so that NO LLVM state needs to be consulted for code generation after
instruction selection. We're not quite there yet, but are very close.

Specifically w.r.t. machine code PHI nodes, they are created by the
instruction selector, possibly manipulated by machine code optimizations,
then eliminated by the register allocator. SSA makes it really trivial to
sparsely compute live-variable information, for example, which is used by
all of the target-independent LLVM register allocators.

-Chris

Misha Brukman wrote:

LLVM Machine code is in SSA.

This explains quite a lot. I though it's possible to just reduce convert phis
into copy instructions in predecessors -- all of which will have the same
destination register.

gets you two definitions of r. So we have machine PHI nodes merge the
two possible values into one for result of r. These phis get removed
after register allocation. So you would write code like:

  if (a cond b) then
    r1 = 1
  else
    r2 = 0

  r = phi(r1, r2)

Ok, I see.

> The reason I'm asking is that I try to support 64-bit comparison and I do
> it by generating code like:
>
> // if high1 cond high2: goto operand0
> // if high1 reverse_cond high2: goto operand1

the second if should just be an unconditional branch to operand1:
clearly, if (high1 cond high2) is false, the reverse condition is true.

Actually, you've found a bug: it should be swapped_cond, not reverse_cond.

> but this means that operand0 and operand1 (the successor basic blocks)
> suddenly get more predecessor than recorded in phi nodes, and
> LiveVariables.cpp asserts on that.

Naturally, as PHI nodes need to have as many entries as there are
predecessors.

Ehmm... this means the 'SelectPHINodes' function I've copy-pasted from X86
backend needs more work... gotta do that now.

- Volodya

There are algorithms for eliminating PHI nodes, but they aren't quite so
simple. Consider what happens when you have phi nodes in a block like
this:

   Loop:
      A = phi (B, Loop), (0, OutOfLoop)
      B = phi (A, Loop), (1, OutOfLoop)

      ...
      br Loop

No matter how you insert copies in the predecessors, you can't maintain
the semantics of the phis (which are simultaneous assignment). This is
obviously possible to handle, it's just not entirely trivial. FWIW, this
is known as the "swap problem".

More importantly, SSA is useful for the code generator. It makes live
variable analysis almost trivial for example, and is a natural
representation to use for other reasons: you have to have an infinite # of
regs before register allocation, why not make them SSA?

-Chris

Chris Lattner wrote:

> Misha Brukman wrote:
> > LLVM Machine code is in SSA.
>
> This explains quite a lot. I though it's possible to just reduce convert
> phis into copy instructions in predecessors -- all of which will have the
> same destination register.

There are algorithms for eliminating PHI nodes, but they aren't quite so
simple. Consider what happens when you have phi nodes in a block like
this:

   Loop:
      A = phi (B, Loop), (0, OutOfLoop)
      B = phi (A, Loop), (1, OutOfLoop)

      ...
      br Loop

No matter how you insert copies in the predecessors, you can't maintain
the semantics of the phis (which are simultaneous assignment). This is
obviously possible to handle, it's just not entirely trivial. FWIW, this
is known as the "swap problem".

Thanks for explaning!

More importantly, SSA is useful for the code generator. It makes live
variable analysis almost trivial for example, and is a natural
representation to use for other reasons: you have to have an infinite # of
regs before register allocation, why not make them SSA?

Yes, this is reasonable thing.

- Volodya