Finding label of basic block where a conditional branch merges

Hi,

Is there any standard way to figure out the label of the basic block where a conditional branch merges?

If we have a branch statement without an else part this is straightforward:

br i1 %cmp, label %if.then, label %if.end, !dbg !24

We only need to check whether the operand names contain "end". However, things get more complex when there is an else branch, and in particular when we have nested branches.

Take for example the following C code:

if (x == NULL || y == NULL) {
do {
// ...
} while (0);
}

This yields the following IR:

br i1 %cmp, label %if.then, label %lor.lhs.false, !dbg !31

lor.lhs.false: ; preds = %entry
%1 = load %struct.s1*, %struct.s1** %s1.addr, align 8, !dbg !32
%cmp1 = icmp eq %struct.s1* %1, null, !dbg !33
br i1 %cmp1, label %if.then, label %if.end, !dbg !34

if.then: ; preds = %lor.lhs.false, %entry
br label %do.body, !dbg !35, !llvm.loop !37

do.body: ; preds = %if.then
br label %do.end, !dbg !39

do.end: ; preds = %do.body
br label %if.end, !dbg !41

if.end: ; preds = %do.end, %lor.lhs.false
call void @llvm.dbg.declare(metadata i32* %a, metadata !42, metadata !24), !dbg !43

The question now is how to obtain "if.end" given br i1 %cmp, label %if.then, label %lor.lhs.false, !dbg !31.

My current algorithm works for many cases but fails here. I push the branch instruction onto a stack and then always follow the leftmost label of the next terminator instruction to move forward. Whenever I encounter a basic block whose label contains "end", I pop from the stack; whenever I encounter a conditional branch, I push. When the stack is empty, I have reached the correct basic block. For the example above, however, this yields "do.end", which is not correct.

I am wondering whether there is any known graph algorithm I could use to solve the problem or even better something that is already implemented within LLVM?

--Sebastian

Sebastian Roland via llvm-dev <llvm-dev@lists.llvm.org> writes:

> Is there any standard way to figure out the label of the basic block
> where a conditional branch merges?

What do you need to use this block for? The immediate post-dominator
will be the block where control essentially re-converges. It is the
"nearest" block guaranteed to execute if the block containing the branch
executes.

See PostDominatorTreeAnalysis.
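To make the immediate post-dominator concrete, here is a minimal sketch in C (not LLVM's implementation) that computes post-dominator sets with the textbook iterative data-flow equations on a toy CFG and then picks out the immediate post-dominator. The bitmask CFG encoding, the 32-block limit and the names `compute_postdom` and `immediate_postdom` are hypothetical, and the sketch assumes every block can reach an exit. Inside an actual pass you would query LLVM's PostDominatorTree instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy CFG encoding (not an LLVM API): blocks are numbered
 * 0..n-1 with n <= 32, succ[b] is a bitmask of b's successors, and a block
 * with no successors is an exit. Assumes every block can reach an exit. */

/* Fills pdom[b] with the set of blocks that post-dominate b (b included),
 * iterating pdom(b) = {b} | (intersection of pdom(s) over successors s)
 * until nothing changes; exits are fixed at pdom(e) = {e}. */
static void compute_postdom(int n, const uint32_t succ[], uint32_t pdom[]) {
    assert(n > 0 && n <= 32);
    const uint32_t all = (n == 32) ? UINT32_MAX : ((1u << n) - 1u);
    for (int b = 0; b < n; b++)
        pdom[b] = succ[b] ? all : (1u << b);
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = n - 1; b >= 0; b--) {
            if (succ[b] == 0)
                continue;                      /* exit: already final */
            uint32_t meet = all;
            for (int s = 0; s < n; s++)
                if (succ[b] & (1u << s))
                    meet &= pdom[s];
            uint32_t next = meet | (1u << b);
            if (next != pdom[b]) {
                pdom[b] = next;
                changed = 1;
            }
        }
    }
}

/* Returns the immediate post-dominator of b: the strict post-dominator d
 * whose own post-dominator set is exactly b's strict post-dominators.
 * Returns -1 if there is none (b is an exit, or b's paths end in
 * different exits). */
static int immediate_postdom(int n, const uint32_t pdom[], int b) {
    assert(b >= 0 && b < n);
    const uint32_t strict = pdom[b] & ~(1u << b);
    for (int d = 0; d < n; d++)
        if ((strict & (1u << d)) && pdom[d] == strict)
            return d;
    return -1;
}
```

Numbering the blocks of the example as entry = 0, lor.lhs.false = 1, if.then = 2, do.body = 3, do.end = 4 and if.end = 5 (treating if.end as the exit of the fragment), the immediate post-dominator of the entry block comes out as 5, i.e. if.end, whereas the stack heuristic stops at do.end.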

Depending on where you are in the pass pipeline, the CFG may bear
little resemblance to the high-level control flow.

> If we have a branch statement without an else part this is straightforward:
>
> br i1 %cmp, label %if.then, label %if.end, !dbg !24
>
> We only need to check whether the operand names contain
> "end".

This seems extremely brittle. Post-dominators is the compiler-theory-y
way to determine this.

> However, things get more complex when there is an else branch,
> and in particular when we have nested branches.

> The question now is how to obtain "if.end" given br i1 %cmp, label
> %if.then, label %lor.lhs.false, !dbg !31.

if.end post-dominates the block containing the branch in question and it
is also the immediate post-dominator (it does not post-dominate any
other block that post-dominates the block containing the branch).

> I am wondering whether there is any known graph algorithm I could use
> to solve the problem, or even better, something that is already
> implemented within LLVM?

Post-dominators. :-)

                                 -David

The answer to this question depends a lot on what you mean by "where a conditional branch merges."

The immediate post-dominator of a basic block is the first block through which all possible paths from that basic block must pass. When the basic block in question is the immediate dominator (or indeed, just any dominator) of its immediate post-dominator, you can say that it is the head of an if statement and the immediate post-dominator is the tail of the same if statement.

However, when gotos or goto-like statements (break, continue, return, throw, etc.) are involved, the tail of an if statement may no longer be a post-dominator. Note that such control flow can also be introduced by optimizations such as jump threading.

if (...) {
  return true;
} else {
  ...
}
// This node is no longer a postdominator because of the return.
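One way to see the effect of the early return is to test post-dominance directly: Y post-dominates X exactly when no exit can be reached from X without passing through Y. The sketch below is a hypothetical helper on a toy bitmask CFG (not an LLVM API) and assumes every block can reach an exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy CFG: blocks 0..n-1 (n <= 32), succ[b] is a bitmask of
 * b's successors, and blocks without successors are exits.
 * Returns true if y post-dominates x, i.e. a DFS from x that is not
 * allowed to enter y never reaches an exit. */
static bool postdominates(int n, const uint32_t succ[], int y, int x) {
    assert(n > 0 && n <= 32 && x >= 0 && x < n && y >= 0 && y < n);
    if (x == y)
        return true;
    uint32_t seen = (1u << x) | (1u << y);   /* y acts as a wall */
    int stack[32], top = 0;
    stack[top++] = x;
    while (top > 0) {
        int b = stack[--top];
        if (succ[b] == 0)
            return false;                    /* reached an exit avoiding y */
        for (int s = 0; s < n; s++) {
            if ((succ[b] & (1u << s)) && !(seen & (1u << s))) {
                seen |= 1u << s;             /* each block is pushed once */
                stack[top++] = s;
            }
        }
    }
    return true;
}
```

Modeling the snippet as if-head = 0, then-branch with the return = 1, else-branch = 2, the code after the if = 3 and a shared return block = 4, the block after the if does not post-dominate the head (the path 0 -> 1 -> 4 avoids it), while the return block does.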

The question here is what you want to do. If your goal is akin to decompilation, where you want to "ignore" these branches for the purpose of producing simple graphs, then you are looking at a heuristic-filled approach with no obvious answers. If your goal is instead mere correctness (you want to undo something you did before the if statement), then you're going to have to adapt your algorithm to account for the possibility of these kinds of CFGs.

Also note that LLVM does not guarantee the existence of value names--clang, in release builds, does not generate any value names whatsoever, for blocks or values. Any approach relying on clang generating these names is not going to work in such scenarios.

Joshua, David

Much appreciated, thanks for your quick help!

What I am actually doing is statically tracing values. If one of the traced values is part of a condition (e.g. in an if statement), all instructions in the then and else parts are also traced. However, the automatic tracing of instructions needs to stop when I hit the first instruction after the if statement (the same applies to switch).

Dominators seem to be a good starting point!

Sebastian

Sebastian Roland via llvm-dev <llvm-dev@lists.llvm.org> writes:

> Dominators seem to be a good starting point!

It sounds like you're looking for a control dependence analysis (which
Instructions/BasicBlocks are dependent on the outcome of some branch and
thus on the value input into the branch). That can be computed from the
"post-dominance frontier" of the graph
(https://en.wikipedia.org/wiki/Data_dependency#Control_Dependency). I
don't think LLVM has a post-dominance frontier built-in, but it has a
regular ol' DominanceFrontier. It should be possible to adapt that to
do what you want.
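A minimal sketch of one standard way to get the blocks that are control-dependent on a given branch block directly from the post-dominator tree, without building a frontier: for each CFG edge A -> S, walk from S up the post-dominator tree until reaching ipdom(A), marking every block visited. If S is ipdom(A) itself, nothing is marked. The toy bitmask CFG, the function name and the precomputed `ipdom` input array are hypothetical, not LLVM APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy CFG: blocks 0..n-1 (n <= 32), succ[b] is a bitmask of
 * b's successors, ipdom[b] is b's immediate post-dominator (-1 if none).
 * Returns the set of blocks control-dependent on block a: for each edge
 * a -> s, every block on the post-dominator-tree path from s up to, but
 * excluding, ipdom[a]. */
static uint32_t control_dependents(int n, const uint32_t succ[],
                                   const int ipdom[], int a) {
    assert(n > 0 && n <= 32 && a >= 0 && a < n);
    uint32_t deps = 0;
    for (int s = 0; s < n; s++) {
        if (!(succ[a] & (1u << s)))
            continue;
        /* If s == ipdom[a], s post-dominates a and the walk adds nothing. */
        for (int r = s; r != -1 && r != ipdom[a]; r = ipdom[r])
            deps |= 1u << r;
    }
    return deps;
}
```

With the example numbered entry = 0, lor.lhs.false = 1, if.then = 2, do.body = 3, do.end = 4, if.end = 5 and immediate post-dominators {5, 5, 3, 4, 5, -1}, the blocks control-dependent on entry are lor.lhs.false, if.then, do.body and do.end; if.end is not among them.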

Good luck!

                                -David

If I'm understanding your question correctly, you want every value whose execution or non-execution was controlled by a condition. This isn't quite standard control dependence, but you would want a variant of an algorithm that computes the set of basic blocks that are control-dependent on a value. (Contrary to what David Greene suggests, the postdominance frontier is not the best way to compute that property: it gives the set of blocks a basic block is control-dependent on, which is the inverse of the desired relation.) In standard control dependence, a basic block Y is control-dependent on X if Y postdominates a successor of X but does not strictly postdominate X itself. You also want to consider basic blocks that don't necessarily postdominate a successor; essentially, you're looking for nodes that lie on a path from the source basic block to its immediate postdominator.

Honestly, the best algorithm to use here would be a standard DFS traversal starting from the node with the condition, cutting the traversal off when it reaches the immediate postdominator.
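A minimal sketch of that traversal, on a hypothetical toy bitmask CFG rather than LLVM's BasicBlock API: an iterative DFS from the branch block that never steps into the stop block. The stop block is assumed to be the immediate post-dominator, computed separately (in LLVM, from the PostDominatorTree); the function name is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy CFG: blocks 0..n-1 (n <= 32), succ[b] is a bitmask of
 * b's successors. Returns the set of blocks reachable from start without
 * entering stop (assumed to be start's immediate post-dominator; pass -1
 * to traverse everything). start is included, stop is not. */
static uint32_t blocks_until_ipdom(int n, const uint32_t succ[],
                                   int start, int stop) {
    assert(n > 0 && n <= 32 && start >= 0 && start < n);
    assert(stop >= -1 && stop < n);
    uint32_t seen = 1u << start;
    int stack[32], top = 0;
    stack[top++] = start;
    while (top > 0) {
        int b = stack[--top];
        for (int s = 0; s < n; s++) {
            /* Skip non-successors, the merge point and visited blocks. */
            if (!(succ[b] & (1u << s)) || s == stop || (seen & (1u << s)))
                continue;
            seen |= 1u << s;                 /* each block is pushed once */
            stack[top++] = s;
        }
    }
    return seen;
}
```

For the example, starting at entry (0) and stopping at if.end (5) visits entry, lor.lhs.false, if.then, do.body and do.end. With break, continue or return inside the if, the immediate post-dominator moves further out, so the traversal also picks up code after the if statement, as Joshua's goto caveat suggests.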