[RFC] Arith dialect versions of affine.apply/min/max

Background and context

affine.apply, affine.min and affine.max are operations that take index operands together with an attribute of builtin type AffineMap, and return a value of type index.

The particularity of the builtin type AffineMap is that it partitions its operands into so-called “dimensions” and “symbols”. This decomposition was originally motivated by polyhedral compilation and Presburger arithmetic, in which we distinguish between:

  1. runtime quantities that follow an arithmetic progression (called “dimensions” in MLIR parlance)
  2. runtime quantities that are constant but statically unknown at compile-time (called “symbols” in MLIR parlance)

For those who remember the Polylib tools :slight_smile:, these were called “domain iterators” and “parameters”.
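For a small illustration (value names invented), affine.apply keeps the two categories apart syntactically: dimension operands go in parentheses and symbol operands in brackets:

  // %i could be a loop induction variable (a "dimension"), %n a runtime
  // quantity that stays fixed during the computation (a "symbol").
  %v = affine.apply affine_map<(d0)[s0] -> (d0 + 4 * s0)>(%i)[%n]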

One interesting property is that the definition of a “dimension” and a “symbol” is not universal: depending on which “op with a region” is considered the root of a particular analysis, the classification of a given variable as a “dimension” or a “symbol” may change.

affine.apply was originally introduced as a means to implement classical analysis subject to polyhedral and affine restrictions in MLIR.

Motivation

Since then, affine.apply has emerged as a good abstraction to represent more general notions of indexing, revolving around the AffineMap::compose method, that carry no restrictions on what can be considered a dim and what must be considered a symbol.

As usage evolved in MLIR, it now appears clear that the implementation of affine.apply conflates 2 independent things:

  1. the machinery and rewrites based on mathematical properties of builtin::AffineMap that are universal and reusable.
  2. the polyhedral-era restrictions on what is allowed to be a “dimension” and a “symbol” in the context of an enclosing op with a region; also known as AffineScope in MLIR (not to be confused with a traditional polyhedral SCoP (static control part)).

As a result of this conflation, affine.apply depends on memref::DimOp (through isValidDim).
This makes it impossible to have a tensor.expand/collapse_shape that matches memref.expand/collapse_shape due to cyclic dependencies.

More generally, such an apply operation is a better representation to manipulate in the IR than chains of AddIOp/MulIOp, and it generally composes better, irrespective of polyhedral-inspired analyses.
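As a concrete (made-up) example, the same index computation can be spelled as a chain of arith ops or as one op that carries the whole expression as an attribute (%c8 is assumed to be an index constant):

  %t0 = arith.muli %i, %c8 : index
  %t1 = arith.addi %t0, %j : index
  // versus a single op exposing the full expression to analyses and folding:
  %t1b = affine.apply affine_map<(d0, d1) -> (d0 * 8 + d1)>(%i, %j)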

Additionally, in the longer term, we anticipate benefiting from canonicalizations and rewrites that mix affine and non-affine indexing. In particular, I have always been a fan of the conciseness of the Halide folding rules, which become possible once a more general notion of IndexingExpr is available; see for instance Halide/Simplify_And.cpp at master · halide/Halide · GitHub. We also hope to look at recursive indexing for FFT/sort and other recursive quantities such as shifts, exp and log.

Proposal

Decouple the machinery and rewrites based on mathematical properties of builtin::AffineMap into arith dialect operations and keep AffineScope-related considerations to the affine dialect where they belong. Since AffineMap is already builtin, this should be relatively mechanical.

I am not too attached to the specifics but here are a few questions I expect others will have strong opinions on:

  1. start a new index dialect vs plop ops into arith and split out later on a per-need basis
  2. arith/index.affine_apply/affine_min/affine_max vs other names
  3. how to actually carve out the AffineScope related constraints and make them compose properly. This may be tricky as affine has historically not been very composable with other things.

Thoughts?

Thanks for reading!

@ftynse , @pifon2a , @bondhugula , @ThomasRaoux

Can you clarify this part? I see this in the isValidDim code:

  // The dim op is okay if its operand memref/tensor is defined at the top
  // level.
  if (auto dimOp = dyn_cast<memref::DimOp>(op))
    return isTopLevelValue(dimOp.source());
  if (auto dimOp = dyn_cast<tensor::DimOp>(op))
    return isTopLevelValue(dimOp.source());
  return false;

+1 on my side. It has happened several times that I wanted to generate some arithmetic expression for tiling, distribution, etc., and having to deal with the affine expression restrictions around symbols vs dimensions doesn’t make sense and prevents us from using affine expressions in some cases.

I assume the arith.affine expr would have relaxed validity conditions (for instance, an affine expr doesn’t allow multiplying two dimensions)?

I’ll punt to @pifon2a to refresh our memories on what the exact issue was there :slight_smile:

Seems like you would want something like a PolynomialExpr which is a different beast?

Atm I am only proposing to make the AffineExpr-based op more reusable, so you’d still use dim * sym or sym * sym if you wanted to use that. The AffineMap::compose rules on that would be slightly different so it really depends on your use case.

+1

I’m in this camp. arith is “simple integer and floating point arithmetic operations”.

We have other contenders: arith.ceildivsi and arith.floordivsi. These fall more under an index umbrella and I don’t feel they fully belong in arith. And given Adding unsigned integer ceil and floor in Std Dialect, which I just saw, I think now would be a good time to make a new dialect with these ops.

Linalg depends on Affine:
reifyResultShapes for linalg::TensorExpand/CollapseShapeOp calls createFoldedComposedAffineApply from Linalg/IR/LinalgInterfaces.cpp, which creates an AffineApplyOp.

Affine depends on TensorOps. This is an artefact of the refactoring when we moved some of the ops from Std to MemRefOps and TensorOps. Most likely we can get rid of

  if (auto dimOp = dyn_cast<tensor::DimOp>(op))
    return isTopLevelValue(dimOp.source());

and the other 2 appearances of tensor::DimOp without changing anything.

The cyclic dependency arises when we add Tensor->Affine dependency because MemRef dialect depends on Tensor dialect.

So if we move linalg::ExpandShapeOp to TensorOps, we would get a TensorOps → AffineOps → MemRefOps → TensorOps cycle.

The interesting part is that most of the code in MemRefOps.cpp that uses IR/Tensor.h is related to bufferization. I remember there was a discussion about creating a dialect that has BufferCastOp and TensorLoadOp and all related patterns. @herhut

@nicolasvasilache, @gysit In order not to speculate about what would happen if tensor::DimOp is removed from AffineOps.cpp, here is the PR that does exactly that ⚙ D113117 [mlir] Remove `tensor::DimOp` from `AffineOps.cpp`.. It affects 4 tests in Linalg. I am not sure how important these simplifications are.

Getting back to the proposal to move apply/min/max out of AffineOps, I fully support it. I also think that these ops can stay in the Arith dialect for a while. Maybe later we will have a more general class of expressions that we want to evaluate, and then it would make sense to have a separate dialect.

I am generally supportive of this direction, this has come up several times in the past. A couple of remarks from my side:

  • affine.apply itself does not enforce dimension/symbol constraints on its operands (llvm-project/AffineOps.cpp at 2a7c3f8b02bf9c692bd28e6560ced1aaae53ad7e · llvm/llvm-project · GitHub); I think the same is true for min/max.
  • Some canonicalizations of these ops may depend on tensor/memref.dim. Since canonicalizations need not be registered by the dialect of the root operation (however surprising that may look), the canonicalization patterns can be moved to another dialect when switching the dependency direction; alone, they should not prevent changing it. Foldings should not depend on other dialects at all, but rather get the operand that comes from dim as an attribute resulting from folding the dim itself independently (a small sketch of this layering follows the list).
  • It is not exactly clear what the semantics of dimensions/symbols in affine maps would be if they move (it isn’t precisely clear even in the current state). We have canonicalizations that “promote” dimensions to symbols based on the affine provenance rules…
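A minimal sketch of the folding layering mentioned in the second bullet (shapes and value names are illustrative; %c0 is an index constant):

  // memref.dim of the static dimension folds to 16 on its own, with no help
  // from affine.apply.
  %d = memref.dim %A, %c0 : memref<16x?xf32>
  // Once %d has folded to a constant, the apply's fold sees a constant operand
  // and can fold the whole expression without ever knowing about memref.dim.
  %v = affine.apply affine_map<(d0) -> (d0 floordiv 4)>(%d)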

At the same time, I am not supportive of creating the “index” dialect that would also include floor/ceildiv. Instead, I would like apply/min/max operations to be extended to support other integer types. One can argue for an “affine_map” dialect instead, but it shouldn’t include floor/ceildiv either.

Yes, I made the effort of quarantining such requirements in the affine dialect a while back (1-2 years? details are now fuzzy). We should be in a good place wrt this point.

True; still, since @pifon2a seems to have a way to break the cycle (it’s not like affine does anything useful with tensor::DimOp anyway), this seems like a nice knot to untangle greedily.

I’m guilty here, I’ll surface the relevant portion of my comment in this review:


affine.apply canonicalization has assumptions related to AffineScope; these happen in canonicalizePromotedSymbols.
I had to introduce this in 071ca8da918a5aed4758c4b4e27b946663adce58 as a counterpart to promoteComposedSymbolsAsDims.

The rationale at the time was to reduce the mess created by the multi-result + chains of AffineApplyOp design which had its own separate implementation that did not agree with the mathematical form of AffineMap::compose.

As time passed, things got better and all this technical debt could be cleaned up at long last.
I think it is time to drop this AffineScope-dependent rewrite from affine.apply canonicalization and make it into a separate opt-in pattern (if still needed, I’d be fine just dropping it altogether).


More precisely, to your point:

I am not sure what is unclear:

  • generally, any Value may be used as a dim or a symbol locally at the place it is introduced
  • AffineExpr/AffineMap cannot express polynomials over dims
  • composition of AffineMaps “composes dims” and “concatenates symbols” (see the small sketch after this list)
  • any containing op that wants to give dims or symbols a more universal meaning, across the scope it owns, is free to do so (e.g. AffineScope for affine loop operations).
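To make the composition point concrete (names and the exact symbol ordering are only illustrative), two chained applies compose into one map whose dims are composed and whose symbol lists are concatenated:

  %a = affine.apply affine_map<(d0)[s0] -> (d0 + s0)>(%i)[%m]
  %b = affine.apply affine_map<(d0)[s0] -> (d0 * 2 + s0)>(%a)[%n]
  // composes into something of the form:
  %b2 = affine.apply affine_map<(d0)[s0, s1] -> ((d0 + s0) * 2 + s1)>(%i)[%m, %n]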

This is how we’ve been operating for 1-2 years already; current proposed changes to AffineScope leak into the rest of the world via the affine.apply canonicalization; we can remedy those independently of moving the op into arith and later proceed with the refactoring proposed here and future extensions.

I think the key issue here is the canonical form of the operation. We can already have any value as either symbol or dimension in affine apply (we cannot use results of such operations in other affine operations such as load or for, but this is an argument for moving apply out of affine), and we never could have polynomials with dimensions.

Composition is where the difference starts to matter. But composition itself is a tool, which may be made orthogonal to op semantics. It is not currently, because the canonical form requires operand sources that are also affine apply to be folded in, and also requires anything that can be proven to have the affine symbol value category to become a symbol in the expression. I think you just want to remove the part about symbols. (I haven’t thought enough about the issue to claim that just transforming everything into symbols outside of affine scopes is okay or not okay). This will make @bondhugula’s change possible without affecting linalg tests.

I suspect he, on the other hand, may have use cases in affine that actually want dimension to symbol promotion though. Maybe there is a way to only do that inside ops that extend affine scope and are already subject to affine value categorization.

Yes, this would be interesting to investigate given the origin story of this promotion function…

Promoting dimensional operands to symbols is obviously extremely important, and such a form, where symbol Values are in symbolic positions, would be the canonical form for the application of operands to maps – which is also the point @ftynse makes in the opening line of the paragraph. That’s key. Without it, the IR wouldn’t have a canonical form and there could be n different ways to express map + operands as far as dim and symbol choices go, since every symbol Value is also valid as a dim. Knowing something is a symbol only lets one do more in general as far as analysis and transformation go (since the value would be known to be a constant, albeit an unknown one).
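For example (names invented), if %n is known to be a valid symbol, the promoted form below would be the canonical one rather than the purely dimensional spelling:

  // %n is a valid symbol (e.g. defined at the top level of the affine scope).
  %v0 = affine.apply affine_map<(d0, d1) -> (d0 + d1 * 4)>(%i, %n)
  // canonicalizes by promoting %n into a symbolic position:
  %v1 = affine.apply affine_map<(d0)[s0] -> (d0 + s0 * 4)>(%i)[%n]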

I would like to implement this RFC and move the arithmetic ops out of the AffineOps dialect into ArithmeticOps.

Affine maps are used by code generation pipelines that do not go through the affine dialect. Unfortunately, the only way to actually apply an affine map is through operations from the affine dialect, even though AffineMap is a first-class citizen of MLIR.

The affine dialect depends on the tensor and memref dialects. At the moment it is not possible to use affine.apply within TensorOps.cpp because it would cause a circular dependency. I would like to untangle the dialect dependencies caused by that and also get rid of TensorInferTypeOpInterfaceImpl.cpp and TensorTilingInterfaceImpl.cpp, which exist only because they depend on AffineOps.

affine.apply -> arith.apply_map
affine.max -> arith.affine_max
affine.min -> arith.affine_min
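A hypothetical spelling, just to fix ideas (the exact syntax and names are precisely what I am asking feedback on):

  // same semantics as today's affine.apply, minus the AffineScope restrictions:
  %v = arith.apply_map affine_map<(d0)[s0] -> (d0 * 8 + s0)>(%i)[%j]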

Before actually implementing it, I would like to ask if anyone has any objections or better names for the new ops.

@bondhugula, @mehdi_amini , @nicolasvasilache, @ftynse

I am a bit concerned on adding these as-is to the Arithmetic dialect. It feels like expanding Arithmetic beyond the abstraction layer that it currently intends to target. It also isn’t clear to me how the concepts of dimensions/symbols map to the Arithmetic dialect.

I truly understand the tension of wanting to use these constructs without the other constraints/constructs in the affine dialect, but it doesn’t feel to me that they fit naturally within the Arithmetic dialect (somewhat hinted at by the fact that there would now be three ways to represent min/max in one dialect).

– River

I mostly have questions about the canonicalization (or lack thereof) between a sequence of arithmetic operations and the arith.apply_map operation: on a first look, I’m mostly concerned about the amount of redundancy between this operation and what can be expressed with operations just in the arith dialect, and why/when we would use one instead of the other (and vice-versa).

I have exactly the same concerns as @River707 and @mehdi_amini. affine.apply is not simply another arithmetic operation. Trying to move them out because of certain dialect dependencies would be putting the cart before the horse.

the only way to actually apply an affine map is through operations from the affine dialect

For these users, was the goal just to realize the application of the map, or also to keep it in a compact mathematical form for further affine analysis and transformation? If it’s the latter, you’d want to layer on top of the affine dialect. The affine.apply op is not simply about applying an affine map but also about allowing analysis/opt while keeping things closed or canonical in an affine context (you have dimensional and symbolic operands). You’d want to audit the goals of using an affine.apply in the first place: perhaps some users don’t even care about dims, symbols, and even “affine’ness”, and so may even be fine with multiplying or dividing dimensions (d0 * d1), (d0 % d1), (d0 floordiv d1) — these functions aren’t affine (not even semi-affine) but could be injective and they do compose as well. This also ties into the question as to whether you probably just want to de-abstract to regular arith operations or create another new op that does not use dimensional and symbolic operands but only symbols (s0, s1, …) in the affine map (sort of like an affine.apply with only symbolic operands).
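For instance, a sketch of that last alternative could simply look like today’s affine.apply restricted to symbolic operands (value names invented):

  // no dimensional operands at all; %m and %n bind to s0 and s1:
  %v = affine.apply affine_map<()[s0, s1] -> (s0 * 16 + (s1 floordiv 4))>()[%m, %n]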

affine.apply within TensorOps.cpp because it would cause a circular dependency.

The dependency comes from the use of DimOp's from both the tensor and memref dialects. This can just be addressed and broken with a trait for “dim”-like ops on shaped types (OpInterface would be more appropriate I think.) The split of std.dim into memref.dim and tensor.dim created this in the first place and looks like a shared trait/interface would anyway be a good thing to do here and break the dependency on the tensor dialect.

I’ll refer to the original post that addresses these points.

Each time we have seen such conflation in MLIR, we have cleaned up and separated concerns.
I would push it even further: there is no specific value to most (all?) of the affine ops.
They could all be easily represented as the scf/memref variants plus a trivial analysis that all operands come from an apply/min/max subject to the AffineScope constraints (sketched below).
Affine could finally start composing with the rest of MLIR and benefit from the same principles of modularity and optionality that we have been applying everywhere else.
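A rough sketch of what such a representation could look like (illustrative only, not a proposed lowering; %c0/%c1 are index constants):

  // today, in the affine dialect:
  affine.for %i = 0 to %n {
    %v = affine.load %A[%i + 4] : memref<?xf32>
  }
  // the same thing with the scf/memref variants plus an apply on the index:
  scf.for %i = %c0 to %n step %c1 {
    %idx = affine.apply affine_map<(d0) -> (d0 + 4)>(%i)
    %v2 = memref.load %A[%idx] : memref<?xf32>
  }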

Representing affine dialect quantities as scf/memref + apply operands + AffineScope would make them subject to index canonicalization rules that would fold AddI/MulI etc. and that we are missing today. It’s the usual schtick: long chains of operations that can be trivially represented as one canonical form benefit analyses and transformations.

Absolutely, I’ll refer—again—to the original post:

In some long-past discussions, we were thinking of breaking down into support for the following closed classes of operations:

  • hyperrectangular
  • affine
  • polynomial
  • log/exp/shift
  • indirection with known injective properties (i.e. “safe” gather/scatter).

Not sure I follow: that redundancy already exists today. Does it mean we should drop affine.apply and force every user to go through long SSA use-def chains to reconstruct higher-level logic locally?
I don’t imagine you’re suggesting that…

It seems the issues people are mentioning can be summarized as “where do these abstractions live” and “arith is a lower-level dialect”. Coming back to the OP:

An index dialect which contains affine and future non-affine indexing and canonicalizations would be fine with me and seems to address the key comments left by others?
This should be done in a way that layers and composes properly with AffineScope.

There is general value in knowing a quantity is fixed or varying, irrespective of it following a particular progression. But this may be more an “index” property than an “arithmetic” property indeed.

To clarify: arith.apply_map would be the canonical form for any sequence of individual arith operations when they operate on the index type? And arith.apply_map would be limited to the index type?
We would actively turn any sequence of add/mul/… into arith.apply_map operations?
From a sequence of such individual operations, is there always one canonical form when converting to use arith.apply_map operations?

I also wonder how the arith.apply_map form plays with CSE: I can imagine multiple arith.apply_map ops containing subsequences that could be CSE’d together if they were expressed in the IR.
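To make the concern concrete (op name and syntax hypothetical, mirroring the proposal above):

  %a = arith.apply_map affine_map<(d0, d1) -> (d0 * 8 + d1)>(%i, %j)
  %b = arith.apply_map affine_map<(d0, d1) -> (d0 * 8 + d1 + 1)>(%i, %j)
  // expanded into individual arith ops, the shared `%i * 8 + %j` part could be
  // CSE'd; kept inside the map attributes, it is duplicated and opaque to CSE.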

I don’t think it does not from the point of view of the arith dialect as a unit in itself.
My point is specifically about the cohabitation of affine.apply with the other ops in the arith dialect, instead of layering this in another dialect above it (index dialect as you mention would be fine with me).

Makes sense, thanks for spelling it out; a separate dialect seems indeed to be the better choice.