We design IRs so that they are easy for the compiler to reason about; being easy for a human to write is a secondary concern. FWIW, this is why we have IRs in the first place: it’s hard for an optimizer to reason about C.
On a less meta point, in MLIR, we can have control flow either structured with regions or as a CFG. The naive way of implementing “break” is doing “goto label-after-the-loop”. This is totally fine if the IR is already a CFG, but it makes life significantly harder if it is mixed with regions. One can then no longer consider a region in isolation but rather needs to keep track of all goto targets. That is totally possible using common CFG approaches, but at that point we would just be adding the unnecessary complexity of dealing with regions on top of those approaches. I see very little value in mixing the two (although I can be convinced otherwise if there are transformations we can do on the mixed form that we wouldn’t be able to do otherwise).
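To make the “goto label-after-the-loop” point concrete, here is a rough C++ sketch (not MLIR; the function names are made up for illustration). Both functions compute the same thing, but in the goto version the jump target lives outside the loop body, so the body can no longer be understood without looking at its surroundings.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Structured form: the loop body is self-contained, and "break" implicitly
// refers to the innermost enclosing loop.
std::size_t firstNegativeWithBreak(const std::vector<int> &values) {
  std::size_t index = 0;
  while (index < values.size()) {
    if (values[index] < 0)
      break;
    ++index;
  }
  return index;
}

// Naive lowering of "break": jump to a label placed after the loop.
// The body now names a target defined outside of it.
std::size_t firstNegativeWithGoto(const std::vector<int> &values) {
  std::size_t index = 0;
  while (index < values.size()) {
    if (values[index] < 0)
      goto after_loop;
    ++index;
  }
after_loop:
  return index;
}

// Small helper (hypothetical) that checks the two forms agree on one input.
void checkSameResult(const std::vector<int> &values) {
  assert(firstNegativeWithBreak(values) == firstNegativeWithGoto(values));
}
```

In plain C++ this is harmless because the whole function is effectively one CFG. The trouble described above only appears once the loop body is a region that analyses want to treat in isolation.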
Furthermore, MLIR needs to reason about opaque operations that it knows nothing about but that have regions and control flow, so we need at least some general rules on how control is transferred, and we encoded those rules in regions.
You can express that perfectly well in numerous ways. Specific choices depend on what you need to do with it. Chris was mentioning an AST-like approach where you can literally have
ast.while (%condition1) {
  ast.if (%condition2) {
    ast.if (%condition3) {
      ast.break
      ast.implicit_terminator
    }
    ast.implicit_terminator
  }
  ast.implicit_terminator
}
as long as you don’t need MLIR to understand the control flow. In this case, the semantics of the “ast” dialect are to declare that an AST node exists: operations are executed in normal sequential order (there’s actually only one control flow path through the declaration) and as a result you get a declaration of the AST. This AST is transformable and can also be “code generated” through dialect conversions to operations that have understandable-to-MLIR control flow. Which operations specifically depends again on what you want to do with them.
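As a rough illustration of “declaring” an AST rather than executing it, here is a small C++ sketch. None of this is MLIR API: `Node`, `Kind`, `makeNode`, `append`, `buildExample` and `mayBreakEnclosingLoop` are hypothetical names. The nodes are plain data built in sequential order, and the helper answers the kind of question a later transformation would ask before lowering, namely whether a statement can break out of the loop that encloses it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node kinds mirroring the "ast" operations in the snippet above.
enum class Kind { While, If, Break, Call };

struct Node {
  Kind kind;
  std::string label;                        // condition or callee name
  std::vector<std::unique_ptr<Node>> body;  // nested statements ("region")
};

std::unique_ptr<Node> makeNode(Kind kind, std::string label = "") {
  auto node = std::make_unique<Node>();
  node->kind = kind;
  node->label = std::move(label);
  return node;
}

// Appends `child` to `parent` and returns a non-owning pointer to it.
Node *append(Node &parent, std::unique_ptr<Node> child) {
  assert(child && "child must not be null");
  parent.body.push_back(std::move(child));
  return parent.body.back().get();
}

// Returns true if `node` contains a break that targets the loop enclosing
// `node`. A nested loop captures its own breaks, so we do not look inside it.
bool mayBreakEnclosingLoop(const Node &node) {
  if (node.kind == Kind::Break)
    return true;
  if (node.kind == Kind::While)
    return false;
  for (const auto &child : node.body)
    if (mayBreakEnclosingLoop(*child))
      return true;
  return false;
}

// Builds the while/if/if/break nest from the snippet above.
std::unique_ptr<Node> buildExample() {
  auto loop = makeNode(Kind::While, "condition1");
  Node *outerIf = append(*loop, makeNode(Kind::If, "condition2"));
  Node *innerIf = append(*outerIf, makeNode(Kind::If, "condition3"));
  append(*innerIf, makeNode(Kind::Break));
  return loop;
}
```

Building the tree executes nothing from the program being described. The control flow only becomes meaningful once something interprets or lowers the nodes, which is the role dialect conversion plays for the “ast” dialect.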
For all practical purposes, one can rewrite the following (I discarded your example because loop exit conditions can be trivially folded into “while”)
while (cond1) {
  f1();
  if (cond2) {
    f2();
    if (cond3) {
      f3();
      break;
    }
    f4();
  }
  f5();
}
into
bool mustBreak = false;
while (cond1 && !mustBreak) {
  f1();
  if (cond2) {
    f2();
    if (cond3) {
      f3();
      mustBreak = true;
    }
    if (!mustBreak) f4();
  }
  if (!mustBreak) f5();
}
by predicating every statement potentially executed after at least one break on the absence of break. This can be, for example, a rewrite on the hypothetical AST dialect. It can then be directly represented with SCF through memory:
%true = constant true
%false = constant false
%c0 = constant 0 : index
// One-element buffer holding the "mustBreak" flag.
%mustBreak = alloca : memref<1xi1>
store %false, %mustBreak[%c0] : memref<1xi1>
scf.while {
  %cond1 = ...
  %break = load %mustBreak[%c0] : memref<1xi1>
  // Keep iterating only while cond1 holds and no break was requested.
  %notBreak = xor %break, %true : i1
  %continue = and %cond1, %notBreak : i1
  scf.condition %continue
} do {
  call @f1()
  %cond2 = ...
  scf.if (%cond2) {
    call @f2()
    %cond3 = ...
    scf.if (%cond3) {
      call @f3()
      // "break": record the request instead of jumping.
      store %true, %mustBreak[%c0] : memref<1xi1>
    }
    %break = load %mustBreak[%c0] : memref<1xi1>
    scf.if (%break) {
    } else {
      call @f4()
    }
  }
  %break = load %mustBreak[%c0] : memref<1xi1>
  scf.if (%break) {
  } else {
    call @f5()
  }
}
which can be further rewritten to use loop-carried values with some variant of mem2reg:
%true = constant true
%false = constant false
scf.while (%mustBreak = %false : i1) {
  %cond1 = ...
  // Keep iterating only while cond1 holds and no break was requested.
  %notBreak = xor %mustBreak, %true : i1
  %continue = and %cond1, %notBreak : i1
  scf.condition %continue, %mustBreak : i1
} do {
^bb0(%arg0: i1):
  call @f1()
  %cond2 = ...
  // %outer is true if a break was requested anywhere in this iteration.
  %outer = scf.if (%cond2) -> i1 {
    call @f2()
    %cond3 = ...
    %inner = scf.if (%cond3) -> i1 {
      call @f3()
      scf.yield %true : i1
    } else {
      scf.yield %false : i1
    }
    scf.if (%inner) {
    } else {
      call @f4()
    }
    scf.yield %inner : i1
  } else {
    scf.yield %false : i1
  }
  scf.if (%outer) {
  } else {
    call @f5()
  }
  scf.yield %outer : i1
}
This doesn’t break our SCF analyses and transformations, which rely on there being a single block and static control flow, but gives you the desired expressivity on another level. You can also go to CFG early by defining appropriate operations, or extend my previous sketch by making break_if also “breakable” in a way that it communicates the break request up to the first enclosing loop.
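For anyone who wants to convince themselves that the break-to-flag rewrite above preserves behavior, here is a small C++ sketch that runs three variants side by side: the original loop with `break`, the `mustBreak`-predicated loop, and a version where the body returns the new flag value, loosely mirroring the loop-carried SCF form. Everything here is an assumption made for the test: `Inputs`, `bit`, the trip counter, and using trace strings in place of `f1()` to `f5()`. The conditions are kept side-effect free, since the rewritten loop condition still evaluates `cond1` after a break was requested.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for cond1/cond2/cond3: cond1 is "trip < maxTrips",
// cond2/cond3 read bit `trip` of a mask. f1..f5 append a digit to a trace.
struct Inputs {
  int maxTrips;
  int cond2Mask;
  int cond3Mask;
};

bool bit(int mask, int trip) { return ((mask >> trip) & 1) != 0; }

// Original form with "break".
std::string withBreak(const Inputs &in) {
  std::string trace;
  int trip = 0;
  while (trip < in.maxTrips) {
    trace += "1";
    if (bit(in.cond2Mask, trip)) {
      trace += "2";
      if (bit(in.cond3Mask, trip)) {
        trace += "3";
        break;
      }
      trace += "4";
    }
    trace += "5";
    ++trip;
  }
  return trace;
}

// Rewritten form: everything reachable after a break is predicated.
std::string withFlag(const Inputs &in) {
  std::string trace;
  int trip = 0;
  bool mustBreak = false;
  while (trip < in.maxTrips && !mustBreak) {
    trace += "1";
    if (bit(in.cond2Mask, trip)) {
      trace += "2";
      if (bit(in.cond3Mask, trip)) {
        trace += "3";
        mustBreak = true;
      }
      if (!mustBreak) trace += "4";
    }
    if (!mustBreak) trace += "5";
    if (!mustBreak) ++trip;
  }
  return trace;
}

// Loop-carried flavor: the body returns the new flag instead of mutating it.
bool bodyCarried(const Inputs &in, int trip, std::string &trace) {
  trace += "1";
  bool outer = false;
  if (bit(in.cond2Mask, trip)) {
    trace += "2";
    bool inner = false;
    if (bit(in.cond3Mask, trip)) {
      trace += "3";
      inner = true;
    }
    if (!inner) trace += "4";
    outer = inner;
  }
  if (!outer) trace += "5";
  return outer;
}

std::string withCarried(const Inputs &in) {
  std::string trace;
  int trip = 0;
  bool mustBreak = false;
  while (trip < in.maxTrips && !mustBreak) {
    mustBreak = bodyCarried(in, trip, trace);
    if (!mustBreak) ++trip;
  }
  return trace;
}

// Exhaustively compares the three variants on all inputs with up to 3 trips.
void checkAllSmallInputs() {
  for (int trips = 0; trips <= 3; ++trips)
    for (int m2 = 0; m2 < 8; ++m2)
      for (int m3 = 0; m3 < 8; ++m3) {
        Inputs in{trips, m2, m3};
        assert(withBreak(in) == withFlag(in));
        assert(withBreak(in) == withCarried(in));
      }
}
```

This only checks small inputs and is not a proof, but it does cover both the paths where the break fires and the paths where it never does.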