Relation between SSA and MLIR

Hello everybody,

I try to summarize for a paper the relation between SSA and MLIR.

My intuitive understanding is that MLIR is formed of the core SSA formalism (basic blocks and branching operations) enriched with dialects (sets of types and operations). In this understanding, the branching operations br and cond_br must be part of the core formalism because dominance cannot be defined without the sequential graph structure specified by basic blocks and branching operations.

However, the br and cond_br operations are not part of the core formalism, but rather part of the cf dialect. In fact, the core notion of branching required to perform dominance analysis does not reside in the cf.br and cf.cond_br operations, but rather in the BranchOpInterface trait, which other operations have, such as llvm.br or cf.switch.

But beyond this, it appears that specification styles can be introduced that never use branching and never rely on dominance for causality checking (and that, even without getting into dominance relaxation). So I was curious whether my original intuition MLIR = CoreSSA+Dialects is still the core paradigm (can I reliably present MLIR as such, at an intuitive level).

I did not count the relaxation of dominance in my discussion above, because it severely restricts what can be done as transformation, and so I’d like to present it separately (locally breaking the SSA rules for high-level specification purposes).

Best regards,
Dumitru

This explanation seems to be mixing two different concepts: SSA (values defined once and before being used) and dialects (core concepts being extensible). If you want to describe the core IR structure, it would be something like “SSA with relaxable dominance + nested regions”.

Regions are an important piece the description is currently overlooking. Operations can have attached regions, which in turn contain blocks of operations recursively. This is a powerful construct on top of conventional SSA. As you may have noticed, there are two different kinds: SSACFG and Graph. The former requires values to be defined before being used and is subject to the conventional dominance rules. The latter does not, but still requires the value to be defined exactly once, hence my “SSA with relaxable dominance”. Both region kinds are equally first-class, one shouldn’t discard Graph regions as some local hack to relax dominance in isolated cases.

As for branches, actually none of the operations are part of the core formalism, or at least they shouldn’t be. The fact that some operations live in the built-in dialect is either legacy or convenience. The specific behavior of branching operations isn’t actually relevant. What is relevant is that these operations transfer control flow somewhere. Note that this may include transferring it to the parent operation of the current region, not just a block in the same region. We use the term successors to indicate the blocks to which a (terminator) operation may transfer the control flow. Knowing the successors is sufficient to build dominance information, which in MLIR’s case also accounts for region nesting. BranchOpInterface doesn’t seem to be even used in dominance analysis, it a facility for some transformations that need more information about control flow than just dominance.

I know how to explain the region mechanism - it’s a configurable (dialect-defined) form of hierarchy.

To make things simpler for people that are familiar with more traditional SSA, I also want to say that relaxed dominance is a degraded mode, where dominance check is not performed, which precludes certain/most code transformations.

This is what prompted my question here, because in the absence of branching, even the notion of dominance (relaxed or not) does not really/fully make sense. It’s restricted to sequence inside basic blocks.

It precludes transformations that expect CFG dominance and enables graph transformations that actually want cycles. It really should be presented as an extension or something beneficial.

The notion of branching, or rather control flow transfer to avoid unnecessary conflation with a specific branch-like operations, is included in the core IR. A list of successors is a fundamental property of an operation. This doesn’t require one to know the specific semantics of the operation.

Even without cycles, isn’t enabling a “sea-of-nodes” representation?

Of course it’s positive, it enables reuse. Then, extension vs. restriction is like the discussion about covariant vs. contravariant.

So what you’re saying is that branching is actually represented in the internal data structures, of which branching operations are mere instances. Ok.

Thanks for the input.

Without branching, the phi nodes are missing.

I would agree with this… Under a certain set of assumptions, you get SSACFG regions, which support a certain ‘algebra’ of properties and transformations. Under a different set of assumptions, you get Graph Regions, which support a different ‘algebra’ of operations. One isn’t so much a superset of the other, as much as they are disjoint, with some possible programs capable of being represented in either one.

@dpotop I think the key thing to realize here is that the SSA formalism in the MLIR CFG world isn’t defined by the dialect and traits. It is defined by special “successor operands” that some operations can have.

The interpretation of these (e.g. targets for condbr, or switch, etc) are up to the definition of the operation, but the MLIR dominance machinery uses them to define the CFG edges for the SSA graph independent of that.

-Chris