Structured Control Flow is Not Necessarily Regions

Authors: @_sean_silva @ftynse @herhut @mehdi_amini (alphabetical order)

This document was marinating in my todo list way longer than we all would have liked, and @mehdi_amini repeatedly mentioned that we should bring it up for public discussion. I think now is about the right time, with parallel discussions on relaxing dominance and building IR structurally. The goal is to make people aware of an alternative representation for structured control flow in MLIR; there is no proposal at the moment, but discussion is most welcome.

Intro

MLIR regions allow for a natural recursive nesting of the IR, which can be leveraged to represent structuring constructs like modules and functions, but also control flow constructs such as loops and conditionals. This representation makes it convenient to operate on these constructs as units. Examples include higher-level loop transformations, such as tiling, that can be easily implemented by mutating a single “loop” operation and function-level code motion, such as outlining the body of gpu.launch into a new GPU kernel function. In general regions provide a strong isolation of a section of the code.

Where structured control flow is concerned, we have been focusing on replicating the structural aspects of the control flow in the structure of the IR itself. This document explores an alternative representation of structured control flow using custom terminator operations.

Motivation

While the structural representation of the control flow with nested regions has some advantages, as introduced above, it also has some drawbacks:

  1. If the region takes arguments, there is no way (other than hard-coding for specific operations) for a transformation to reason about these arguments: does a given block argument map to an explicitly captured SSA value from the enclosing region? Is it a loop induction variable? Is it a loop-carried value?
  2. Producing SSA values in the region and escaping them into the enclosing region requires going through a custom operation in the region and returning the value from the enclosing operation. This again can't be tracked through SSA use-def chains and requires special reasoning about individual operations, leading to analyses and transformations that are hardcoded for specific operations.

Alternative Representation

The alternative explored in this document is to rely on custom terminators to represent structured control flow: the nested “regions” can be seen as SESE (single-entry single-exit) CFGs. The important property is to be able to identify this CFG as a single unit, and make sure that generic algorithms can’t change the control flow in a way that breaks the SESE property.
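The SESE property mentioned above can be checked purely structurally. Here is a minimal, hypothetical sketch in Python (the `Block` class and all names are invented for illustration; this is not MLIR API) that verifies every edge leaving a candidate region targets the single exit block:

```python
# Hypothetical sketch: check that a set of blocks forms an SESE
# (single-entry single-exit) subgraph of a CFG. All names invented.

class Block:
    def __init__(self, name):
        self.name = name
        self.successors = []  # blocks this block branches to

def is_sese(blocks, entry, exit_block):
    """True if every edge leaving `blocks` targets `exit_block`.
    A full check would also verify that no external edge targets a
    block other than `entry`; that half is elided here for brevity."""
    region = set(blocks)
    if entry not in region or exit_block in region:
        return False
    for b in blocks:
        for succ in b.successors:
            # Edges leaving the region must all target the exit block.
            if succ not in region and succ is not exit_block:
                return False
    return True

# The if/else shape from the example below: cf.if branches to ^true,
# ^false and ^end; both arms reconverge at ^end, the single exit.
end = Block("end")
header, true_b, false_b = Block("if"), Block("true"), Block("false")
header.successors = [true_b, false_b, end]
true_b.successors = [end]
false_b.successors = [end]
print(is_sese([header, true_b, false_b], header, end))  # True
```

A generic CFG transformation that, say, redirected `false_b` to some unrelated block would make this check fail, which is exactly the invariant a verifier for such terminators would have to enforce.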

Conditionals

For example the structured conditional:

 %res:2 = loop.if %cond {
    %true:2 = std.call @func1 (%input0, %input1)
    loop.yield %true#0, %true#1
  } else {
    %false:2 = std.call @func2 (%input2, %input3)
    loop.yield %false#0, %false#1
  }
  print %res#0, %res#1

could be represented in a CFG:

  // The ^end successor delimits the single-exit point of the region
  cf.if %cond ^true, ^false, ^end
^true:
  %true:2 = std.call @func1 (%input0, %input1)
  br ^end(%true#0, %true#1)
^false:
  %false:2 = std.call @func2 (%input2, %input3)
  br ^end(%false#0, %false#1)
^end(%res0, %res1):
  print %res0, %res1

“For” loop

loop.for %iter = %lb to %ub step %step iter_args(%arg) {
  %new_arg = "body"(%arg)
  loop.yield %new_arg
}

could be represented in a CFG:

  cf.for ^body(%lb, %arg), ^continuation(%arg)
^body(%iter, %iter_arg):
  %new_arg = "body"(%iter_arg)
  cf.next %iter + %step < %ub, ^body(%iter + %step, %new_arg), ^continuation(%new_arg)
  // The previous instruction merely uses custom syntax; generically it looks like
  // "cf.next"(%iter, %ub, %step) [^body(%iter, %new_arg), ^continuation(%new_arg)]
^continuation(%final_arg):

Note, however, that this relies on the cf.next terminator passing a modified value (%iter + %step) to its successor block. If we were to restrict terminators to only pass the values as is, the induction variable increment must happen as a separate operation, as in the following.

  cf.for ^body(%lb, %arg), ^continuation(%arg)
^body(%iter, %iter_arg):
  %new_arg = "body"(%iter_arg)
  %new_iter = cf.for_step %iter, %step
  cf.next %new_iter < %ub, ^body(%new_iter, %new_arg), ^continuation(%new_arg)
^continuation(%final_arg):

This representation arguably becomes more difficult to maintain: one needs to ensure that the iteration value passed to ^body by cf.next is defined by a cf.for_step. This fact can then be used to query the loop about its step.

It is unclear at the moment if there is a terminator-based representation that would combine simple value forwarding in the body terminator with the absence of explicit stepping instruction.

Having to traverse multiple instructions to extract the information about loop bounds and steps is one of the reasons that make advanced loop transformation hard on CFG.
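To make that traversal concrete, here is a hypothetical Python sketch of recovering the step from the CFG form by chasing the use-def chain from the `cf.next` terminator back to the defining `cf.for_step`. The `Op` class and its fields are invented for illustration, not MLIR API:

```python
# Hypothetical sketch: recovering loop information from the CFG form
# requires walking use-def chains across several ops. Names invented.

class Op:
    def __init__(self, name, operands=()):
        self.name = name
        self.operands = list(operands)

def loop_step(cf_next):
    """Return the step value of the loop, or None if the value fed
    back to ^body is not produced by a cf.for_step (invariant broken)."""
    new_iter = cf_next.operands[0]  # value passed to ^body's first arg
    if isinstance(new_iter, Op) and new_iter.name == "cf.for_step":
        return new_iter.operands[1]  # cf.for_step takes (%iter, %step)
    return None

step = Op("constant_step")
iter_arg = Op("block_arg")
new_iter = Op("cf.for_step", operands=[iter_arg, step])
cf_next = Op("cf.next", operands=[new_iter])
print(loop_step(cf_next) is step)  # True: step recovered via use-def chain
```

With a region-based loop.for, the equivalent query is a single accessor on the op; here it is a multi-op pattern match whose failure modes a verifier must account for.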

“While” loop

MLIR's Loop dialect currently does not support a generic “while” loop, as we haven't yet seen a compelling use case. However, such a loop can be designed in a rather straightforward way by combining the original dialect proposal with the recent improvements for loop-carried values. For example, it can be expressed with two regions, one for the condition and another for the loop body, both taking loop-carried values as arguments. The condition region yields an i1 value indicating whether the loop should continue executing. The body region yields the updated loop-carried values that are forwarded either to the next iteration or to the results of the op.

%result = loop.while(%arg = %loop_carried) condition {
  %condition = "func"(%arg)
  loop.yield %condition
} body {
  %new_arg = "body"(%arg)
  loop.yield %new_arg
}

Such a loop would become the following CFG.

  cf.while ^condition(%loop_carried)
^condition(%arg):
  %condition = "func"(%arg)
  cond_br %condition, ^body, ^continuation(%arg)
^body:
  %new_arg = "body"(%arg)
  br ^condition(%new_arg)
^continuation(%result):

Discussion

Extensibility of terminators

MLIR has not sufficiently explored op extensibility for terminators. So far, terminators are either carried over from LLVM IR and other “classical” CFGs (branches, switches) or tightly connected to the semantics of the surrounding region (affine.terminator, loop.yield). The latter need not be attached to a region and could carry the same semantics on their own.

Furthermore, the MLIR specification is not detailed enough about what behavior is allowed for successor operands in terminators. Most (all?) terminators currently forward successor operands as-is to block arguments, but this does not seem to be strictly required. A recent proposal, now implemented, merged successor operands with regular operands and lets the terminator op define the relation between its operands and the arguments of successor blocks (which may be non-trivial, e.g., the loop-stepping terminator in the “for” loop above). This diverges further from the PHI representation. The impact on canonical implementations of generic passes that expect a clear correspondence between successor operands and block arguments is mitigated by an OpInterface for querying such direct correspondences.

Fitting representation to analysis needs

The core issue we are facing with control flow is the representation expectation mismatch between different transformations: “textbook” forms of generic SSA-based transformations, such as GVN, are defined on unnested CFGs; similarly, “textbook” forms of advanced loop transformations, such as tiling or interchange, are defined on nested loops. If we want both kinds of transformations to be easily implementable in MLIR, we would need to make the IR in one form look as if it were in the other form. Currently, the representation choices are skewed towards structural transformations. The alternative representation above skews the choice towards CFG-based transformations. In order to implement structural transformations as easily as we can today, we would need to provide the same kind of structured interfaces on top of the CFG. (Note that specific *Op classes are practically wrappers around Operation*, so we may just as well wrap a range of Blocks, for example with cast<WhileOp>(terminator) APIs.) Such wrappers should be combined with verifiers that ensure structural consistency of the control flow construct, e.g. successor blocks have the right terminators, a subgraph of the CFG remains SESE, etc. Verifiers that observe successors are uncommon and have not really been experimented with in MLIR.
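To illustrate the wrapper idea, here is a hypothetical Python simulation of casting a terminator into a structured “while” view, analogous to the cast<WhileOp>(terminator) C++ API mentioned above. All classes and names are invented for illustration:

```python
# Hypothetical simulation: wrap a range of blocks behind a structured
# view keyed off the terminator, mirroring dyn_cast<WhileOp>. Not MLIR API.

class Terminator:
    def __init__(self, name, successors):
        self.name = name
        self.successors = successors  # e.g. [condition_block, ...]

class WhileView:
    """Structured 'while' facade over a terminator-based CFG."""
    def __init__(self, terminator):
        assert terminator.name == "cf.while", "not a while construct"
        self.term = terminator

    @property
    def condition_block(self):
        # The first successor of cf.while is the condition block.
        return self.term.successors[0]

def dyn_cast_while(terminator):
    """Return a structured view, or None if the terminator doesn't match."""
    return WhileView(terminator) if terminator.name == "cf.while" else None

cond_block = object()  # stand-in for a Block
t = Terminator("cf.while", [cond_block])
view = dyn_cast_while(t)
print(view is not None and view.condition_block is cond_block)  # True
```

The verifier obligations mentioned above would attach to such a view: constructing it should only succeed when the surrounding blocks still satisfy the construct's structural invariants.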

An alternative to the alternative representation is to analyze what specifically is missing for easy implementation of “textbook” CFG-based algorithms and whether it can be provided through OpInterfaces. For example, we can introduce an interface for Ops to self-describe as forwarding operands into an equivalent of a PHI node of the entry block in a region (potentially reusable with the interface proposed for branch-like Ops). One particular challenge is dominance analysis, since dominance is a first-class concept in the core IR. Therefore, it may be undesirable to let individual Ops affect dominance rules, especially for potentially unregistered Ops. It is worth noting that value visibility rules, defined in terms of dominance, can already be restricted by individual ops through the IsolatedFromAbove trait, which is relied upon by the core infrastructure for pass parallelization. Conceptually, a similar approach can be considered for dominance-based analyses and transformations, for which one can clearly identify the always-valid conservative treatment, e.g. assuming there is or isn't any dominance relation.

Controlling complexity through intentional opacity / Parallelizing the compiler

The alternative representation proposal above does not imply regions should be removed, but rather they should be used judiciously. Regions can be used to intentionally make parts of the IR opaque to CFG-based passes, as is currently the case with functions and modules. They also provide a natural structure for independent or parallel processing of IR fragments.

After River’s recent changes to terminators I think this is doable in MLIR. See e.g. BranchOpInterface in ControlFlowInterfaces.td, which allows an op to customize which operands flow to which successor operands (including having successor operands that take on some other semantics, like an induction variable).


My (perhaps naive) assumption has been that this is simply the lack of appropriate OpInterfaces, rather than a fundamental lack. It seems to me that some ops will in fact abstract an SESE CFG and there should be an interface which describes this abstraction, whereas other operations which do not abstract an SESE CFG will remain black boxes. Such an OpInterface could allow querying the mapping between operands and block arguments, and between terminator operands and return values, to implement (among other things) logical traversal of the def-use chains.
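As a hypothetical sketch of what such an interface could look like (Python simulation; the class names, the operand layout, and the `ForLike` op are all invented for illustration):

```python
# Hypothetical sketch of an OpInterface that lets a region-holding op
# self-describe the mapping from its operands to entry block arguments,
# and from terminator operands to its results, so that def-use chains
# can be traced "through" the op generically. All names invented.

class RegionBranchLike:
    """Interface mixin: implementing ops expose their operand mappings."""
    def operand_for_block_arg(self, arg_index):
        raise NotImplementedError
    def terminator_operand_for_result(self, result_index):
        raise NotImplementedError

class ForLike(RegionBranchLike):
    # Assumed operand layout: [lb, ub, step, init];
    # entry block arguments: [induction_var, iter_arg].
    def __init__(self, operands, yield_operands):
        self.operands = operands
        self.yield_operands = yield_operands

    def operand_for_block_arg(self, arg_index):
        # The induction variable (arg 0) has no forwarded operand;
        # the iter_arg (arg 1) maps to the init operand.
        return None if arg_index == 0 else self.operands[3]

    def terminator_operand_for_result(self, result_index):
        # Result i comes from operand i of the yield terminator.
        return self.yield_operands[result_index]

loop = ForLike(["%lb", "%ub", "%step", "%init"], ["%new_arg"])
print(loop.operand_for_block_arg(1))          # %init
print(loop.terminator_operand_for_result(0))  # %new_arg
```

A generic analysis could then follow a use of the op's result to the yielded value inside the region, and a use of a block argument back to the forwarded operand, without hard-coding for specific ops.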

It seems like having terminators describe the fact that they are actually contained within structured groups of blocks (which I think is what you are proposing here?) is a lot more awkward than having structured operations describe more about how they are structured.

Steve


I agree with you. This is an area that I have been trying to improve and something that I’ll be focusing on much more concretely in the coming weeks.

+1 Stephen and River. Though I was one of the initial proponents of the “custom terminators” approach, I think part of that was coming from a position in the TF dialect where, due to other constraints, there were really weird semantics involved, to the point that it was really hard to see a general solution that would cover it (and for various reasons, only some of them truly non-trivial technical problems, it didn’t seem very easy to convert the TF dialect into a nicer form).

But now I see more clearly that we can probably define something relatively clean for ops-with-regions for MLIR in general, and systems like TF dialect will need to do some non-trivial efforts to convert what they are starting with into the form that MLIR can “natively” understand, and that’s fine.

Will even the semantics of these structured control flow operations (e.g. if the condition on an if is true then the first region is executed, else the second region is executed) be exposed through OpInterfaces? If yes, what would that look like?

I agree with the other comments above - while it is important for MLIR to be able to do the “custom terminator” approach, it doesn’t reduce the need for region-based control flow to be improved. Getting branch op interfaces to model the argument/return value transfer seems like a valuable thing regardless. I’m also concerned about the fragility of the custom terminator approach: what prevents CFG transformations from breaking its invariants?

In your motivation section you list two specific problems w.r.t. doing generic analysis on region-based control flow operators. Are you facing a specific example of this general issue, or is this a general concern?

-Chris

This is part of the thought experiment here: in what ways are custom terminators enough to enforce the structured invariants today?
For example if I take the if/else example:

  // The ^end successor delimits the single-exit point of the region
  cf.if %cond ^true, ^false, ^end
^true:
  %true:2 = std.call @func1 (%input0, %input1)
  br ^end(%true#0, %true#1)
^false:
  %false:2 = std.call @func2 (%input2, %input3)
  br ^end(%false#0, %false#1)
^end(%res0, %res1):
  print %res0, %res1

The ^end block has as predecessors the br in the ^true and ^false blocks, but also the original cf.if.
I would imagine that to be able to do anything with respect to the control-flow here would require to understand the cf.if terminator itself?

I don’t think we have a very specific problem to solve right now; this is a doc we started in early December and were brainstorming about, and we just wanted to share these thoughts in case they can fuel others’ thinking.

Yes, I think it could be made to work. The challenge would be a couple of things AFAICT:

  1. resisting the urge to implement various op interfaces that normal terminators implement - that would start opening them up to analysis and transformation, which could break their invariants. It isn’t clear to me how the problems in the ‘motivation’ section get solved without at least some of this.

  2. Being able to decide what the invariants are for things within their body - you could have arbitrary control flow etc. within the virtual regions.

  3. Depending on the design, you have to be prepared for the operations to be duplicated, which makes certain things messy.

The biggest challenge I see is that it becomes hard to associate the start/end region pairs with each other, and efficiently walking the control flow hierarchy. This is one of the things that region based control flow makes really trivial and nice.

-Chris

This is the conclusion we reach in the “Fitting representation to analysis needs” section. Though this raises the question of certain OpInterfaces potentially being “core” to the IR itself, like the IsolatedFromAbove trait.

The description of the block structure needs to be placed somewhere, i.e. on some op. IMO, it’s only a matter of where: the root op or the terminator op.

If it needs to be exposed, then yes. But it’s likely to be more analysis- and transformation-oriented than op-oriented. For example, if we are interested in terminator arguments that affect the control flow, rather than being directly forwarded to the block, for the purposes of modeling GPU convergent execution, we can have an interface for querying the “control-affecting” property of a terminator argument. The branch condition would have it, but not the other branch arguments. I don’t see an immediate use for an “IfLike” interface though.

We never proposed otherwise.

I would argue that the situation is mostly the same for transformations without custom terminators. There is a set of invariants we check in verifiers. If a transformation breaks them, verifiers fail, but there is little that actually prevents transformations from violating those invariants. It feels like the main technical thing is the APIs we have for operations that make it hard, but not impossible, to break invariants: e.g., setStep on a loop takes an integer, but nothing prevents a pass from doing setAttr("step", builder.getStringAttr("garbage")). We would need verifiers that observe successors/predecessors and their terminators, and convenient APIs to manipulate such multi-block constructs in an invariant-preserving way, but it does not look substantially different from what we do with ops today.

That being said, manipulating arbitrary implicit “regions” of IR does not sound as conceptually clean as manipulating ops with materialized regions.

This is more of a design exercise. One of the original prompts for doing this was the idea of having someone, an intern or GSoC student, write “classic” passes like GVN or jump threading in MLIR by taking a textbook description of those algorithms and following it. Looks like we will need our own textbook. :slight_smile:

MLIR’s verifier only does fairly local verification (and that’s on purpose, and fine). I think the concern is nonlocal properties. For example, if you have a custom “if” terminator, how would you actually verify that there are no branches between the two “regions”?

E.g. consider this code:

c = ...
if cond():
  if c > 1: # bb0
    x = 2
  else: # bb1
    x = 3
else:
  if c > 1: # bb2
    x = 2
  else: # bb3
    x = 3

How would you define the semantics such that it is illegal for a pass to redirect all branches with bb1 as their target to instead target bb3? They are provably identical, but this is a non-local property (you can imagine adding much more nesting to this example and the same basic idea stands).
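For concreteness, one way a verifier could detect that particular violation (an illustrative Python sketch, not a proposal for MLIR's verifier; the `Block` class and names are invented): collect the blocks reachable within each arm, stopping at the merge block, and reject any block reachable from both arms.

```python
# Illustrative sketch of the non-local check: if a branch in one arm of
# a custom "if" is redirected into the other arm (bb1 -> bb3 below),
# some block becomes reachable from both arms. All names invented.

class Block:
    def __init__(self, name):
        self.name = name
        self.successors = []

def arm_blocks(entry, merge):
    """Blocks reachable from `entry` without passing through `merge`."""
    seen, stack = set(), [entry]
    while stack:
        b = stack.pop()
        if b is merge or b in seen:
            continue
        seen.add(b)
        stack.extend(b.successors)
    return seen

def arms_disjoint(true_entry, false_entry, merge):
    return not (arm_blocks(true_entry, merge) & arm_blocks(false_entry, merge))

merge = Block("merge")
bb0, bb1 = Block("bb0"), Block("bb1")
bb2, bb3 = Block("bb2"), Block("bb3")
t_entry, f_entry = Block("true_arm"), Block("false_arm")
t_entry.successors = [bb0, bb1]
f_entry.successors = [bb2, bb3]
bb0.successors = [merge]
bb2.successors = [merge]
bb3.successors = [merge]
bb1.successors = [bb3]  # illegally redirected into the other arm
print(arms_disjoint(t_entry, f_entry, merge))  # False: bb3 is shared
```

Note that this check is quadratic-ish in the worst case and, as Sean points out, has to be run per construct over arbitrarily nested subgraphs, which is exactly the non-local cost that region-based representations avoid by making the nesting explicit.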

Reading through this a couple of times, I land at the same reasoning from @stephenneuendorffer and @River707 noted above. Having interface described relationship between operands and region arguments (i.e. entry basic block arguments), and the op result and region terminator operands seems like it will solve the def-use chain analysis problem.
Coming from codegen, having a single op that is self-contained allows reasoning about the op without having to reason about the control flow within it, which makes progressive lowering really easy. You get to make local decisions instead of global decisions. I see an easier path to reason about synchronization, etc. in this system. It’s true that a lot of it is due to the SESE nature, but it is also the semantics associated with the op. For example, a spv.loop is guaranteed to reconverge in the merge block. Over time I think a subset of operation interfaces useful for general analysis will develop (like function-like and loop-like) that will allow dialects/ops to opt into these transformations. As an example, I don’t think GVN makes sense for linalg operations (where the region arguments <-> operands relationship is more involved).

I would write a verifier for the terminator that checks the dominance property across blocks.

Why couldn’t this be carried exactly the same way with a custom terminator?

I am missing the larger context of the question here, but does the use of custom terminators mean that there isn’t a single op that represents constructs such as spv.loop, loop.parallel, etc.? That is the most convenient part IMO.

You were talking about the fact that spv.loop ensures a particular semantic; I’m saying that I don’t see why this property can’t be achieved the same way here, using spv.if instead of cf.if in the condition example above. You have a clear entry point for the region and a clear reconvergence point.

I think what is not clear to me is how I know that this is an SESE region without actually looking through the terminators of the different blocks. So two things that would be good to clarify are:

  1. It seems like the approach of custom terminators puts the load on the terminator to capture the semantics of the SESE region, as opposed to what is currently done (having the semantics attached to the operation that has an enclosing region).
  2. Now I have the same question as Sean: when there are nested ifs or loops, how do I disambiguate between the different loops and branches? That machinery seems more complex (like traditional loop back-edge detection, etc.) than what I am used to right now.

I think it would help concretize the intent of custom terminators if there were a more concrete example of what they enable that using op interfaces, as mentioned earlier, would not provide. I am biased towards the current way structured control flow exists (loop.parallel, loop.for, loop.if, etc.), which makes writing transformations really easy.

Has there been any discussion on an extended “for” like loop that also has an early exit condition? We’d have a use for such a concept in FIR.

More specifically, a combination of “while” and “for”, where we’d have an iteration space projected onto the loop’s region like “for”, but only while some condition holds. I can imagine several design approaches but am wondering if anyone has considered such a thing.

Yes.

It seems to me that this isn’t really much more complex than today: every spv.if (same for cf.for in the original example) would be a new nested one if it is in the SESE region. The SESE region is structurally trivial to figure out from a terminator.
You could almost build exactly the same Region API that we have today with this representation: it is just as structural as the current IR.

The problem is that the “op interfaces” that would make the current structured control flow analyzable don’t exist today, as far as I know. Today there is no way to generically trace the control flow across an operation with a region.

I am literally working on this right now :wink: