Inferring dependencies in phi instructions

I am trying to infer data dependencies using LLVM's def-use chains and
I am having trouble dealing with 'phi' instructions. Specifically,

If I am given the code snippet:

int foo() {
  int y = 1;
  int x;
  if (y) {
    x = 2;
  } else {
    x = 3;
  }
  return x;
}

Here, x has a data dependence on y (not control because x is assigned
in both halves), but LLVM expresses 'x' as: %x.0 = phi i32 [ 2, %2 ],
[ 3, %3 ] using phi-nodes, omitting the dependence on y.

If I write the function as:

int foo() {
  int y = 1;
  int x = y ? 2 : 3;
  return x;
},

the dependencies work out alright because LLVM now uses a select
instruction, where the dependence on y is explicit:

%y = select i1 %x, i32 2, i32 3

In general, I don't think I can convert phi nodes into select
statements (because a phi node can refer to values from more than two
paths in general, while select statements permit only two options, one
each for a true and a false branch). Any thoughts on how this is
handled in practice would be very helpful.

Anirudh

Hi Anirudh,

'x' has a control dependency on 'y' because the value assigned to 'x'
depends on a path selected. This dependency can be converted into a data
dependency by means of a 'select' instruction because the control flow is
simple.
What is the problem with using phi-functions in DFG? Yes they require more
work to find out what they depend on.

Kind regards,
Evgeny Astigeevich

Hi Anirudh,

‘x’ has a control dependency on ‘y’ because the value assigned to ‘x’
depends on a path selected. This dependency can be converted into a data
dependency by means of a ‘select’ instruction because the control flow is
simple.
What is the problem with using phi-functions in DFG? Yes they require more
work to find out what they depend on.

Thank you for your response. Going by your definition, x is control dependent on y. To extract this control dependency, do I need to maintain path conditions for each basic block or can I do something simpler?

Also, I feel like this should be a recurring problem. Could you point me to any code example that identifies all dependencies (control and data) for phi instructions?

Hi Anirudh,

'x' has a control dependency on 'y' because the value assigned to 'x'
depends on a path selected. This dependency can be converted into a data
dependency by means of a 'select' instruction because the control flow is
simple.

Just an FYI, there is an optimization called "If-Conversion" which transforms control dependencies to data dependencies; it is discussed in the Allen and Kennedy book ("Optimizing Compilers for Modern Architectures" in Chapter 6 or 7, I think).

If you need to find control dependencies, you can find them easily using LLVM's ReverseDominanceFrontier pass.

Regards,

John Criswell

Hi Anirudh,

'x' has a control dependency on 'y' because the value assigned to 'x'
depends on a path selected. This dependency can be converted into a data
dependency by means of a 'select' instruction because the control flow is
simple.

Just an FYI, there is an optimization called "If-Conversion" which
transforms control dependencies to data dependencies; it is discussed in the
Allen and Kennedy book ("Optimizing Compilers for Modern Architectures" in
Chapter 6 or 7, I think).

I tried this, but how do I express the resulting predicated
instructions in llvm's IR? For instance,

X = a + b

becomes

X = if (path_condition) (a + b)

I couldn't find an appropriate instruction type for predicated instructions.

If you need to find control dependencies, you can find them easily using
LLVM's ReverseDominanceFrontier pass.

Ok, I didn't know about this and implemented the cdg on my own.
However, regardless of how the cdg is implemented, I don't think it
captures control dependencies in phi instructions?

I believe you’re looking for the “select” IR instruction.

—escha

Hi Anirudh,

'x' has a control dependency on 'y' because the value assigned to 'x'
depends on a path selected. This dependency can be converted into a data
dependency by means of a 'select' instruction because the control flow is
simple.

Just an FYI, there is an optimization called "If-Conversion" which
transforms control dependencies to data dependencies; it is discussed in the
Allen and Kennedy book ("Optimizing Compilers for Modern Architectures" in
Chapter 6 or 7, I think).

I tried this, but how do I express the resulting predicated
instructions in llvm's IR? For instance,

X = a + b

becomes

X = if (path_condition) (a + b)

I couldn't find an appropriate instruction type for predicated instructions.

The "select" instruction you mentioned earlier is what you want.

If you need to find control dependencies, you can find them easily using
LLVM's ReverseDominanceFrontier pass.

Ok, I didn't know about this and implemented the cdg on my own.
However, regardless of how the cdg is implemented, I don't think it
captures control dependencies in phi instructions?

Control dependence is usually computed on basic block granularity (e.g., basic block B is control-dependent on a conditional branch at the end of basic block A). Given that basic block B is control-dependent on basic block A, then all the phi-nodes at the beginning of basic block B are control-dependent on the conditional branch at the end of basic block A.

If you're not familiar with the DominanceFrontier algorithm for computing control-dependence, I recommend you read it. You can find it described in the Allen/Kennedy book as well as the paper "Efficiently Computing Static Single Assignment Form and the Control-Dependence Graph."

Regards,

John Criswell

Hi Anirudh,

'x' has a control dependency on 'y' because the value assigned to 'x'
depends on a path selected. This dependency can be converted into a data
dependency by means of a 'select' instruction because the control flow is
simple.

Just an FYI, there is an optimization called "If-Conversion" which
transforms control dependencies to data dependencies; it is discussed in the
Allen and Kennedy book ("Optimizing Compilers for Modern Architectures" in
Chapter 6 or 7, I think).

I tried this, but how do I express the resulting predicated
instructions in llvm's IR? For instance,

X = a + b

becomes

X = if (path_condition) (a + b)

I couldn't find an appropriate instruction type for predicated instructions.

The "select" instruction you mentioned earlier is what you want.

Do you mean something like this (because select instruction by itself
doesn't allow assignments to arbitrary expressions)?

For instance, if I was converting:

X = a + b (where X is an 8-bit integer)
into
if (path_condition) X = a + b

I would do:

path_condition = some boolean operation in the LLVM IR
tmp = a + b
X = select i1 path_condition, i8 tmp, i8 undef

Hi Anirudh,

I hope these lecture slides about SSA and the dominance frontier will help you with SSA and control flow analysis:

http://www.seas.harvard.edu/courses/cs252/2011sp/slides/Lec04-SSA.pdf

Unfortunately a use of DominanceFrontierBase is deprecated in LLVM.

Thank you for your response. Going by your definition, x is control dependent on y.

To extract this control dependency, do I need to maintain path conditions for each basic block or can I do something simpler?

Also, I feel like this should be a recurring problem. Could you point me to any code example that identifies all dependencies (control and data) for phi instructions?

You won’t have any of this problems if you build dominance frontiers in the reverse CFG (ReverseDominanceFrontiers).

Then for a phi-function you get its basic block A and for the basic block A you get RDF(A). Your phi-function depends on conditions in basic blocks from RDF(A).

Kind regards,

Evgeny Astigeevich

You actually don't even need reverse dominance frontiers, you can do
it with a post-dominator tree.

Thanks for all these responses; they clarify quite a bit. I am still
confused though. I don't see how a ReverseDominanceFrontier (or any
other method to compute the CDG) helps me solve my problem. Here's a
minimal working example:

define i32 @foo(i32 %a, i32 %b) #0 {
  %1 = icmp sgt i32 %a, %b
  br i1 %1, label %2, label %3

; <label>:2 ; preds = %0
  br label %4

; <label>:3 ; preds = %0
  br label %4

; <label>:4 ; preds = %3, %2
  %r.0 = phi i32 [ %a, %2 ], [ %b, %3 ]
  ret i32 %r.0
}

Here the basic block with label 4 isn't control dependent on basic
blocks 0, 2, or 3 by the definition of control dependence (from
Appel's book: "We say that a node y is control-dependent on x if from
x we can branch to u or v; from u there is a path to exit that avoids
y, and from v every path to exit hits y").

Because control _always_ flows from any of basic blocks 0, 2, or 3 to
4, 4 isn't control dependent on either of 0, 2, or 3. This in turn,
means that the phi node within basic block 4 isn't control dependent
on basic blocks 0, 2, or 3.

In this particular case, I know that using the simplifycfg pass solves
my problem by converting phi nodes into selects. I would like to know
how to handle the general case.

Thanks in advance for any advice you may have.
Anirudh

Hi Anirudh,

I agree with Daniel. You can use the post-dominator tree.
In LLVM you can get it as follows:

getAnalysis<PostDominatorTree>()

Of course you must specify it in dependencies of your pass:

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    ... // some code
    AU.addRequired<PostDominatorTree>();
   ... // some other code
  }

INITIALIZE_PASS_DEPENDENCY(PostDominatorTree)

To find all control dependencies the following simple algorithm can be used:

1. Consider all edges <X, Y> such that Y does not post-dominate X (X has more than one successor).
2. Traverse the post-dominator tree bottom-up:

B = Y
while (B != parent of X in PDT)
      B is control dependent on X
      B = parent of B in PDT

You can use the following algorithm to find all conditions a phi-function depends on:
1. For each incoming basic block, get all blocks it is control dependent on. Add the incoming block to the set if the parent of the phi-function is control dependent on it.
2. For each found block, get its terminator and find a condition the terminator uses.

Let's consider your example:

CFG:

          ---- Entry----
         > >
         v v
         2 3
         > >
          ---> 4 <----

PDT:

            ----4-----
          > > >
          v v v
          2 3 Entry

Edges of interest in CFG: <Entry, 2>, <Entry, 3> (<2, 4> and <3, 4> are not because 4 post-dominates 2 and 3.)
The parent of Entry in PDT is 4.
We have:
  - 2 is control dependent on Entry
  - 3 is control dependent on Entry
  - 4 is not control dependent on any basic block

The phi-function in 4 has incoming blocks: 2 and 3.
2 and 3 are control dependent on Entry. The terminator of entry uses %1. So the phi-function depends on %1. No other conditions are added as 2 and 3 have unconditional jumps.

Hope this gives you more understanding of the topic.

Kind regards,
Evgeny Astigeevich

Hi Anirudh,

I agree with Daniel. You can use the post-dominator tree.
In LLVM you can get it as follows:

getAnalysis<PostDominatorTree>()

Of course you must specify it in dependencies of your pass:

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    ... // some code
    AU.addRequired<PostDominatorTree>();
   ... // some other code
  }

INITIALIZE_PASS_DEPENDENCY(PostDominatorTree)

To find all control dependencies the following simple algorithm can be used:

1. Consider all edges <X, Y> such that Y does not post-dominate X (X has more than one successor).
2. Traverse the post-dominator tree bottom-up:

B = Y
while (B != parent of X in PDT)
      B is control dependent on X
      B = parent of B in PDT

You can use the following algorithm to find all conditions a phi-function depends on:
1. For each incoming basic block, get all blocks it is control dependent on. Add the incoming block to the set if the parent of the phi-function is control dependent on it.
2. For each found block, get its terminator and find a condition the terminator uses.

Let's consider your example:

CFG:

          ---- Entry----
         > >
         v v
         2 3
         > >
          ---> 4 <----

PDT:

            ----4-----
          > > >
          v v v
          2 3 Entry

Edges of interest in CFG: <Entry, 2>, <Entry, 3> (<2, 4> and <3, 4> are not because 4 post-dominates 2 and 3.)
The parent of Entry in PDT is 4.
We have:
  - 2 is control dependent on Entry
  - 3 is control dependent on Entry
  - 4 is not control dependent on any basic block

The phi-function in 4 has incoming blocks: 2 and 3.
2 and 3 are control dependent on Entry. The terminator of entry uses %1. So the phi-function depends on %1. No other conditions are added as 2 and 3 have unconditional jumps.

Hope this gives you more understanding of the topic.

Thank you for spelling this out in so much detail. This makes much
more sense to me now. I 'll try using this approach when constructing
the control dependence graph.

Anirudh