SCF dialect vs CFG


I was reading through this topic: Dynamic control flow (break-like operation)
as I was trying to figure out how to implement general control-flow constructs in my language.
In particular, we would want to support break, continue as well as early return operations.

Based on what I read in the topic above one option would be to define an AST-like dialect which I could then lower into the SCF dialect. Did I get that right ?

My question is this: is it worth the effort to lower my AST (or even an AST-like dialect) into a CFG representation (by “CFG representation” I mean a list of inter-connected basic blocks within the enclosing function’s body region) or would it be better to rely on the SCF dialect to do that for me and find a way to transform my AST-like dialect Ops into SCF Ops instead ?
What was the motivation behind the SCF dialect ? Was it designed to be general enough to handle break/continue/return statements ?

The SCF approach seems feasible and there is also an example in the topic above which I was inspired by: Dynamic control flow (break-like operation) - #16 by ftynse
However, the explicit CFG representation seems more general and powerful especially if one introduces labeled blocks and labeled “breaks” etc. (some languages have that).

Would appreciate any additional insight on this matter.

Thank you in advance!


Yes, this is an option.

It depends on what you want to do with it. MLIR is not dogmatic one way or another: you may choose to go through the SCF dialect or not to. There are some transformations that are possible at the SCF loop level, such as fusion and tiling, given legality constraints and profitability heuristics (even higher levels of the stack such as Linalg or Affine implement the constraints). Maintaining explicit loop structure has been valuable for some high-performance compilers. There are representational requirements that come with it. Ultimately, it’s up to you to decide whether SCF, or indeed any other dialect in MLIR, has a sufficient value proposition for your project.

Originally, it was developed as a common lowering stage between Affine and Linalg. Both convert to SCF loops instead of targeting branch-based CFG as the latter is tricky to get right in the dialect conversion infrastructure.

No, it was not designed to handle language-level features such as break and continue, and has little intention of supporting these natively. It was designed so that it is easy for the compiler to reason about loops. return is a slightly different issue because it relates to functions rather than “structured control flow” such as loops, but the overall rationale is the same. As Chris mentioned in the thread you linked, SSA needs to know the CFG edges, which doesn’t seem trivial, if at all possible, with break or continue. Without this, we would basically lose most of our analyses.

Note that it is possible to model break with a language-level transformation:

do {
  if (condition2()) break;
} while (condition1());

converts to

bool c;
do {
  c = condition2();
  if (!c) two();
} while (condition1() && !c);

while loops need slightly more care if condition2() isn’t pure in functional sense, but scf.while makes it easy to handle those as it is a mix of while and do-while. We have successfully done so for break and continue coming from C in Polygeist.

In my experience, the more general some representation is, the harder it is for the compiler to reason about it and perform meaningful transformations. So I would agree that CFG is more general, but may be less powerful for the purposes of writing an optimizing compiler.

1 Like