[RFC] Region-based control-flow with early exits in MLIR

Region-based control-flow with early exits

This RFC proposes the additional of a new region-based control-flow paradigm to MLIR, but one that enables early exits via operations like break or continue in contrast with SCF.

Background

MLIR supports two control-flow “paradigms”: structured region-based control flow and control flow graphs. Control flow graphs are a general representation of control flow that is very low-level, whereas structured control flow leverages MLIR regions to provide a very high-level abstraction over control flow.

SCF is a powerful representation because the semantics of SCF (single-entry single exit-regions) mean control flow transformations that are complex on CFGs are significantly easier to implement and local, such as loop unrolling, fusion, etc. SCF has a lot of benefits as a control-flow representation, because it is by definition reducible control flow, it’s easy to write flow-sensitive analyses, etc. It is very well-suited for high-level dialects and transformations.

However, SCF lacks the ability to model early conditional exits from functions, loops, and other control-flow structures. For applications like programming language frontends, the full generality of control-flow in the host language cannot be represented with SCF. Compilers rely on CFGs or partially raise CFGs to SCF, which is a complex and costly operation, and often end up with a mixture of SCF and CFGs in their program representation.

In this RFC, I will propose a region-based control flow paradigm to MLIR that allows early exits, providing a middle ground between SCF and CFGs that provides some of the benefits of both, but allowing a pure region-based representation of control-flow.

Early region exits

The proposal is to allow defining terminator operations like break or continue that can transfer control flow to any ancestor operation. Consider

func.func @foobar() {
  cf_ext.loop {
    %0 = call @rand_bool() : () -> i1
    cf_ext.if %0 {
      cf_ext.break
    }
    call @do_something() : () -> ()
    cf_ext.continue
  }
  return
}

The execution of cf_ext.break (placeholder dialect name) operation branches to after the surrounding cf_ext.loop operation. The consequence of this is that cf_ext.if is allowed to exit of the middle of a basic block. The primary concern is this is a violation of the MLIR langref, which states that only terminator operations can leave a basic block. I propose that this representation enforce a pure region-based representation, so that the typical notion of a basic block does not apply.

This representation is extraordinarily beneficial for programming language frontends, because programs can be emitted directly into a high-level control flow representation. This representation preserves many of the benefits of SCF, such as being reducible. Data-flow algorithms can guarantee best-case iterations, and passes like mem2reg can be written in a single pass over the IR.

This is the kind of representation we have used in the Mojo compiler and it has been extraordinarily beneficial for efficient compile time and reduced complexity in the stack. I believe such a representation will be very useful for similar projects like Triton, Polygeist, CIR, etc. and by upstreaming some version of it, we can drive standardization and improvements across projects.

Another benefit to region-based control flow not mentioned above is that the scoping of memory allocations is very clear, instead of requiring start and end lifetime markers in a CFG. This representation allows preserving allocation lifetimes structurally in the IR.

Control-flow interfaces

Like RegionBranchOpInterface and BranchOpInterface, we have implemented this control-flow paradigm with a pair of interfaces for the region operations and their terminators. With these interfaces, it is possible to write generic algorithms over a variety of operations that implement them. We propose upstreaming the interfaces and the associated verifiers and provide sample implementations in the test dialect. Then, we can upstream some of the analyses and transformations written over the interfaces.

However, because this is an unusual control-flow representation in MLIR, it suffers from some inefficiencies due to the lack of first-class support. This is a similar problem for SCF: MLIR tracks the predecessors and successors of basic blocks for CFGs but provides no such benefit for regions. Due to early exits, this is particularly challenging, because given a cf_ext.loop operation, one must walk IR to find all possible predecessors. It would be nice to eventually have better first-class support for regions in this manner.

The other issue is the violation of the MLIR langref, which states: “each block represents a compiler basic block where instructions inside the block are executed in order and terminator operations implement control flow branches between basic blocks”. Perhaps we can introduce another region kind besides graph regions and SSACFG regions.

Interactions with CFGs and SCF

Operations that implement this new control-flow paradigm are allowed to coexist with CFGs and SCF. The restriction is that these operations form contiguous sub-trees in the region tree across which control flow is allowed to flow. For example, the following is invalid:

cf_ext.loop {
  scf.if %condition {
    cf_ext.if {
      cf_ext.break // error: no viable parent operation for 'break'
    }
  } 
}

This is because the scf.if breaks the two cf_ext ops into separate subtrees, across which control-flow is not allowed to flow. Consequently, cf_ext operations must have single-block regions.

What about SCF?

I would not advocate making SCF go away. SCF is still a higher-level representation that is suitable for many applications and has stronger guarantees. Many compilers will still prefer to use SESE regions to represent their conditionals and loops. Operations like affine.for and linalg.generic are only possible on SESE regions.

6 Likes

I agree that the lack of early-exit has been an unfortunate limitation of SCF (the dialect, but also the concept we allow to other dialects in general).
Thanks for the proposal!

Some early thoughts (not the most structured feedback I guess):

This is where we should start IMO.

Basically I feel I could resume the entire proposal with:

We want to allow terminators to exist in the middle of a basic block, with the restriction that these terminators can only “return” from the current region and not branch to another block in the region.

This property alone would allow to build the cf_ext dialect you’re describing I believe. This would also forces use to differentiate formally the “terminators” from the “region terminators” (which is something I feel we got away with mostly because of a lot of single-block regions…).

The question is what breaks in the MLIR control-flow model if we introduce this possibility?
The notion of post-dominator becomes a bit more difficult to infer (in particular it become infeasible with unregistered operations in the middle of a block I think?), any data-flow analysis must account for this.

At the moment the notion of control flow inside a region is a “core concept” in that you can interpret it without knowing anything about the operations. As an implementation detail, relying on an interface means that the registration of an interface (something we haven’t enforced with “promises” yet) can change the semantics of the control-flow inside a region/block. This seems a bit unsafe to me, we should aim to find a better solution.

I support this proposal (some variation at least), as I agree SCF isn’t particularly friendly to less “machine learning” front-ends. This would allow things like cf-to-scf passes to only really be needed for when the source has GOTOs or something. It could also help with loop splitting, if-conversion, masking, etc.

A few questions:

  • Do we want early exits in scf.for et al, or do we create a separate loop construct for that? Rationale: not breaking assumptions on existing ops would be a lot less work and we can convert scf.exit_for into scf.for in special cases.
  • Isn’t continue just yield? And break just return? Rationale: Our region model has a nested structure with stronger semantics than, say C++. We could piggy back the existing ops and just improve their semantics to allow for the new patterns.
  • Why do we need to allow early exit in the middle of the block? Can’t we just do like LLVM and have multiple blocks and then break/continue must be at the end of a block, but not necessarily at the end of a region? This would require making sure all BBs in a region have a terminator, but that’s not different than what we do with LLVM.

I am generally supportive of this direction as well.

To Mehdi’s point about interfaces, I don’t actually think interfaces define the implementation here. This should be just an extra type of the region similar to how we have SSACFG and Graph regions in a fixed enum. This new type would enforce single-block requirement and allow region terminators in the middle of said block, unlike other regions. The contiguous subtree property (which I’d like to be expressed in a more local manner, by the way) can then be checked by the core IR verifier. Interfaces merely provide additional information for dataflow analyses, same as the existing region/branch-based control flow interfaces: scf.for has control flow regardless of it having an RegionBranchOpInterface implementation.

What’s missing is likely a trait, which is only attachable statically as opposed to an interface, for region terminators. I’m also supportive of differentiating region and block terminators. Having worked with dataflow analyses a bit recently, the current situation is not super-well defined for the SCF/CFG mix situation where we cannot exactly know whether a terminator can branch to another block or exit the region.

As a general question, how would these new regions convert to and from SCF-like ops? I see a conversion to plain CFG, but it may be desirable to go to, e.g., Affine for Polygeist should it adopt this representation for the initial C++ mapping.

2 Likes

The only thing we have for us is “block successor”, a terminator without successor can’t be anything else than an region terminator I think?
However it’s not clear if a terminator with block successors can also be a region terminator (conditionally for example).

1 Like

Strong +1 on this proposal, I think this would be very valuable.

FWIW, I think that the existing MLIR “terminator” concept should be renamed “block terminator” independent of this proposal. It is a more specific and less confusing name.

With this rename it makes it easier to talk about this: in Mojo, the equivalent of cf_ext.break is NOT a block terminator for the very important reason that code like this is valid in C, Mojo and many other languages:

  if (cond) {
     break;
     dead_code();
  }

And (at least very early in the compiler pipeline) it is very useful to represent that dead code in MLIR. In the case of Mojo, we have an MLIR pass that diagnoses unreachable code like this, and this is also useful for bug checking and various other reasons.

-Chris

3 Likes

On a slight tangent here but, isn’t this the work of front-ends? If the code is statically dead from the AST down, then I would emit user remarks at that level and not lower to IR at all.

The fact that the source is valid doesn’t mean the code does anything, right? In your example, the whole if is dead and I would not lower any of that into IR. What am I missing?

This indeed raises the conceptual question of whether we want to represent ASTs with MLIR. But I’d argue for having early-exit constructs even if we don’t.

2 Likes

I have tried this twice 2-3 years ago with Verona and our experience wasn’t great.

The semantics of modules, functions, regions is too restrictive to do clever AST things (one of the issues was the topic of this RFC), the symbol table doesn’t have proper support for nesting and cross-module access, and anything that becomes shared (singleton) has to become a global in a module, so inaccessible from another module, and bam: duplication.

If Mojo is trying to go that way, I’d love to hear about the problems you guys are having and if you share my experiences.

But indeed this is a conceptual question: do we want to extend (and complicate) MLIR to become an AST replacement? Or do we keep it as a high-level IR to replace many of the existing LLVM optimizations at the right level?

With Clang and Flang and possibly other off-tree front-ends using MLIR as their output, trying to move MLIR up could gather wider consensus. But the risks of making the IR more complicated needs to be weighed in for the transformations that we stil want to do at the Linalg/SCF level without having to do full dominance analysis that makes LLVM IR a pain to work with.

Mojo doesn’t build an AST for statements (eg function bodies), it uses MLIR exclusively. MLIR already works great for this purpose. Mojo does build a parse tree for expressions, but doesn’t use it for type checking. We may choose to get rid of that someday as well.

While I’m not going to say that there is a “right way” to build a frontend, I’ve built several of them (Clang, Swift, now Mojo) so I have some experience here and am very happy with the Mojo design.

Coming back to Jeff’s proposal, Mojo actually has two “break” operations - in the dialect produced by the parser, this operation is not a mlir block terminator. We lower it into a break op in another dialect after checking for unreachable code, associating break labels etc. This lower level break op is a terminator. I believe Jeff is proposing the addition of the second thing actually, not the first, which was my confusion.

-Chris

I think the notions of “break” and “continue” might be too high-level; you can easily lower them to something like try/catch/throw without losing the important properties here (specifically, the region-based control flow). “break” is a “throw” caught by a “try/catch” outside the loop, “continue” is a “throw” caught by a “try/catch” inside the loop.

Assuming your want to lower a language that supports exception-handling to cf_ext, you need support for try/catch anyway. (Although I guess you might want an “exception” where the control flow is visible at compile-time to compile to something slightly different from an exception that needs to use runtime unwinding.)

1 Like

We did the same in our compiler, which also models both the C-like statement semantics (non-terminator) and a terminator with semantics related to the enclosing Region.

I’ll echo Chris’s point that Mojo doesn’t have an AST in the traditional sense, but I would also like to highlight that Mojo uses this early exit representation starting quite early in our pipeline and all the way through our “middle-end”. We lower to CFGs only at the same time as we lower to the LLVM dialect. So the discussion is much broader than just front-end/AST applications of a representation like this, but on that note, frontends often have to perform dataflow analyses over ASTs, and it is much nicer in IR, and certainly with a representation like this.

In general I agree with the sentiment here. The lack of inherent distinction between block and region terminators has been a pain point when writing dataflow analysis over generic MLIR. In general it also seems like CFG-based control-flow has much more first-class support (with successor and predecessor lists, etc.) in MLIR than region-based control flow, even though the latter is arguably more important. Improved first-class support for region-based control flow will benefit SCF as well.

This is why I am leaning towards @ftynse’s idea on introducing a new region kind, because it would really just amount to one. In Mojo, we implemented it using interfaces but it feels like a hack for the reasons you described: interactions with unregistered operations and not well defined.

I don’t think this should affect the semantics of SCF. My opinion is that SCF does it jobs quite well, but just isn’t suited to other kinds of applications. Great for ML compilers – not so great for programming languages.

This is the inherent problem: because MLIR’s region semantics are too strong, we are left with no great representation for control flow in programming languages. I mean in theory we can use a combination of scf.if and scf.while to represent all control flow, but it neither matches the source code (and hence is a poor model of its semantics) nor is a trivial transformation.

The problem is that a break operation will cause not just an exit from the current region, but from the region’s parent region:

cf_ext.loop {
  cf_ext.if {
  ^bb0:
     br ^bb1
   ^bb1:
     break // causes cf_ext.if to exit its basic block
  }
  some.op()
}

Another possibility is to make cf_ext.if a terminator with regions, giving

cf_ext.loop {
^bb0:
  cf_ext.if {
  ^bb0:
     br ^bb1
   ^bb1:
     break
  } [^bb1] // could branch to ^bb1 or exit the region
^bb1:
  some.op()
}

I have not fully explored this approach but this 1) makes every region op both a block terminator and a region terminator 2) terminators are not allowed to have regions in MLIR and 3) break transfers control flow to cf_ext.if which then contextually transfers control flow to cf_ext.loop, which suggests some kind of stateful transition but in reality, we just want break to go directly to cf_ext.loop, like it would as a CFG.

+1. I think this sounds really great.

It’s quite trivial to lower SCF to a dialect of this new representation. It is strictly a subset of the representation. Going the other way requiring raising, but it is much simpler than from CFG. It could written as a pattern from a cf_ext operation with a single break and a continue at the end of its region.

2 Likes

It sounds like there isn’t much love for the idea of introducing interfaces to model this concept, since it would not compose well with existing MLIR and IMO interfaces don’t provide the best way to model this.

I originally proposed upstreaming them as a first step, but if there is consensus that we should tackle introducing a new region kind, that would be nice as well. With a new region kind, we can enforce the invariants (contiguous subtree, single-block regions) for all MLIR in the verifier, regardless of interface registration. Then, we can add the interfaces for dataflow analyses (mostly to enable speculation).

On the separate issue of block terminator vs region terminator, I’m +1 to adding trait. This isn’t a property I would expect would need to be or should be injected into an op. It should be a parent trait to ReturnLike.

I only used break and continue as example operations that would implement this new paradigm. This RFC proposes introducing a general representation that would allow try, break, continue, or what-have-you to be implemented as ops. In other words, this RFC isn’t for the cf_ext dialect – it’s just a dummy dialect for illustration purposes. I think we can make an RFC for a cf_name_to_be_bikeshedded dialect that is the standard bearer for this control flow paradigm after the fact.

These were the points I was trying to make and apparently wasn’t successful. In my case, I did not make these changes of semantics to MLIR control flow and some AST analysis were a pain to make.

I was just nodding that your changes could probably have helped me 2-3 years ago.

Would we allow these new regions on any existing ops that have regions? Say, an scf.for with “full iteration” regions can be analysed like current ones, but if you’re using “partial iteration” regions, you have to be more careful? It could be up to individual ops to allow/disallow if it doesn’t make sense.

I’m confused. Is the proposal a new type of region, new ops, or an entirely new dialect that is also control flow, but different?

I don’t think there’s a big problem in allowing a different type of region in SCF. the current passes continue to work on the ones that have “full iteration” regions. What am I missing?

The way I see it, break isn’t a return on the current block, but on the current region that allows early return. In your example, the if doesn’t allow early return, so the return bypasses it until it reaches a region that does.

In this way, you can nest arbitrary number and types of regions and the break/continue semantics doesn’t need to be calculated, you just iterate through the parents until you reach one that has that property, not unlike try/catch.

It’s not clear to me what a region kind changes to the picture? But we should “whiteboard” this a bit more :slight_smile:
I would rather look if we could “unify” this into meshing with the exiting CFG flow (even if we end up restricting to single-block region maybe).

I have a hard time with a mental model where “break” does more than exiting the current region actually. This kind of “effect at a distance” instead of local reasoning looks a bit “scary” to me from a modeling point of view.

But what do you think of the view proposed by @efriedma-quic ?
The “break” can be just though of as “throwing” a “break exception”, and here the if would be propagating it upward, while the for would “catch” it (this is all conceptual, we’re not talking about runtime exceptions).

Now remains to define how we define the if operation: I’m not sure if we can say “any op with region is ‘re-throwing’ by default” ; I suspect instead that they should opt-in instead and if an op that can “throw” is in a region whose parent isn’t defined as “catching” then this is a verifier error.

Nitpicking: I’m not sure it is explicitly disallowed, and I am sure that it is being used in SCF.

Overall, it looks like having multiple blocks and region-carrying terminators for the purposes of early-exit defies the purpose of regions with early exit: we get a non-trivial block CFG by construction, unless we end up having some clever rules on the sort of CFG allowed in these regions, which essentially amounts to having another region kind.

This can be some transitive lowering step if desired though, or a way to “view” early-exit regions as a mix of regular SESE regions and CFG.

Specifically, I’m trying to see if this can be done without memory for something like

for (...)
  if (...)
    A
    break
  B

    |
    v

for (...)
  broke = False
  if (...)
    A
    broke = True
  if not broke:
    B

where there is a memory dependency between A and B that doesn’t allow one to trivially fission the loop. I suppose we may yield the condition from scf.if, but this looks less local and would probably need the pattern to start at the first operation that can be broken from and descend into all nested operations recursively.

I don’t see a particular problem with using memory either (we have mem2reg after all) other than the usual issue of not hardcoding the dialect from which the memory ops get materialized.

At least not hitting the promised-interface problem :slight_smile:

In fact in VAST, we are actively working towards creating such representation for Clang AST. Having this kind of a region would help a lot. Currently, we use a trait soft_terminator (which is unchecked for terminating the current block) to annotate terminators like break and continue. However, I am not sure how well the high-level representation plays together with the current control-flow infrastructure, as we tend to use this representation mainly for structural representation and have not done much control-flow-based analysis yet.

However, in the high-level representation, without further restructuring or more complex dead-code analysis, it’s not straightforward to eliminate code following a break. For a concrete example, you can refer to this Compiler Explorer showcasing a dummy goto scenario, where the code after break is still live.

1 Like

I’d also like to personally chime in in support of this direction. While I don’t have a use-case for something like break and continue, I ended up with what amounts to the same designing when trying to design a structured catch operation. More specifically it was something akin to:

%res = try {
  ops
  potentially_throws() except_args (%args...)
    continues ^bb1

^bb1:
  ops
  yield %ret
} except %value (%params...) {
  ...
  yield %ret
}

where potentially_throws is a terminator to not require walking all operations to find all throwing operations (one has to walk all basic blocks instead). A pain-point in my case in particular was “easily” allowing SSA-form to be constructed with these (i.e. add block arguments to regions and predecessor terminators and yielding from the try). Any interfaces and improvements there would be welcome.
As mentioned in the OP, a lot of the difficulities (not knowing the predecessors e.g.) are also shared with SCF.