Generic Binary Condition Algorithm (Control Flow)

Hi, I am failing to implement an algorithm for a generic branching algorithm (control flow) since over 2 years now. Even ChatGPT can not solve this problem. This seems to be an unsolveable task I guess. So maybe one can give support here.

Let’s assume you have an CSharpSyntaxTree AST (managed C++). The condition for an “IfStatementSyntax” or “ForStatementSyntax“ is represented here by nested “BinaryExpressionSyntax” and “ParenthesizedExpressionSyntax“ objects. Really easy so far. But now you want to translate this into an Clang AST. So you need something like an recursive algorithm to resolve this AST into branches/phi nodes.

So one could think. Perfect, let’s use a CSharpSyntayWalker < llvm::Value* > to walk this AST recursively building the clang AST. So you call the walker’s Visit function which will call the e.g. VisitIfStatement or VisitForStatement functions. These will call for their “Condition“ property the VisitBinaryExpression function what will resolve recursively the whole condition (Left/Right properties). Whohoo.

But like I said this seems to be a highly complex task that for a normal human is not possible to solve. Let’s consider the following condition only:

(((i < 5) && (j < 5)) || ((k < 5) && (l < 5)))

The IR Code for this is something like:

define dso_local noundef i32 @main() #2 {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
%3 = alloca i32, align 4
%4 = alloca i32, align 4
%5 = alloca i32, align 4
%6 = alloca i32, align 4
%7 = alloca i32, align 4
%8 = alloca i32, align 4
%9 = alloca i32, align 4
store i32 0, ptr %1, align 4
store i32 -5, ptr %2, align 4
store i32 -5, ptr %3, align 4
store i32 -5, ptr %4, align 4
store i32 -5, ptr %5, align 4
store i32 -5, ptr %6, align 4
store i32 -5, ptr %7, align 4
store i32 -5, ptr %8, align 4
store i32 -5, ptr %9, align 4
br label %10

10: ; preds = %35, %0
%11 = load i32, ptr %2, align 4
%12 = icmp slt i32 %11, 5
br i1 %12, label %13, label %16

13: ; preds = %10
%14 = load i32, ptr %3, align 4
%15 = icmp slt i32 %14, 5
br i1 %15, label %24, label %16

16: ; preds = %13, %10
%17 = load i32, ptr %4, align 4
%18 = icmp slt i32 %17, 5
br i1 %18, label %19, label %22

19: ; preds = %16
%20 = load i32, ptr %5, align 4
%21 = icmp slt i32 %20, 5
br label %22

22: ; preds = %19, %16
%23 = phi i1 [ false, %16 ], [ %21, %19 ]
br label %24

24: ; preds = %22, %13
%25 = phi i1 [ true, %13 ], [ %23, %22 ]
br i1 %25, label %26, label %52

26: ; preds = %24

ret i32 0

}

I understand this IR Code, but like I said I can not figure out an generic algorithm that can reproduce this IR Code that also works for other conditions. What I understand is, that you need PHINodes only I you want or need a value at the end. So an if most likely needs no PHINodes. A for loop needs it for the counter. And it seems to need one PHINode per condition level.

E.g. the condition

((i < 5) || ((j < 5) || ((k < 5) || (l < 5))))

has three PHINodes. One per nested parenthesis.

If you double it

(i < 5) || ((j < 5) || ((k < 5) || (l < 5)))) || ((m < 5) || ((n < 5) || ((o < 5) || (p < 5))))

now it has 4 PHINodes. Very confusing.

Maybe one will lead me to a working algorithm or some pseudocode I could get an idea from?

the technique u r looking for is called branch-oriented (or destination-driven) code generation. The main thing is don’t return i1 values recursively, pass target blocks down

void EmitBranchOnBoolExpr(Expr *E, BasicBlock *TrueBB, BasicBlock *FalseBB) {
  if (auto *BO = dyn_cast<BinaryOperator>(E)) {
    if (BO->isLogicalAnd()) {
      BasicBlock *RHSBB = CreateBB("and.rhs");
      EmitBranchOnBoolExpr(BO->getLHS(), RHSBB, FalseBB);
      Builder.SetInsertPoint(RHSBB);
      EmitBranchOnBoolExpr(BO->getRHS(), TrueBB, FalseBB);
      return;
    }
    if (BO->isLogicalOr()) {
      BasicBlock *RHSBB = CreateBB("or.rhs");
      EmitBranchOnBoolExpr(BO->getLHS(), TrueBB, RHSBB);
      Builder.SetInsertPoint(RHSBB);
      EmitBranchOnBoolExpr(BO->getRHS(), TrueBB, FalseBB);
      return;
    }
  }
  Value *Cond = EmitScalarBool(E);
  Builder.CreateCondBr(Cond, TrueBB, FalseBB);
}

For if/while/for conditions, pass the then/else (or body/exit) blocks directly, no PHIs needed.

PHIs only appear when you need the boolean as a value (stored, returned, passed). In that case, wrap the above: create true/false/merge blocks, branch via EmitBranchOnBoolExpr, then add a PHI at merge with true/false incoming from each path.

ur “1 PHI per nesting level” observation is just a side effect of materializing intermediate i1 values, with the branch-form approach, you avoid that entirely

actually Clang does exactly this, check CodeGenFunction::EmitBranchOnBoolExpr for a good reference reference :wink:

Thanks for your response. Actually ChatGPT recommended a quite similar Emit function also. But the result always was structured different from what clang had produced. Most likely also correct, but I wanted to be consistent with the results of clang. Otherwise it would become difficult later to compare if my logics are correct. I will try again then.