Cross-dialect folding and canonicalization

How should we handle canonicalization patterns that need to look at multiple dialects with a “non-traditional” flow of dependence?

My concrete example is to canonicalize an add that occurs in a pattern such as:

%c0 = vector.constant 0: ...
%c = vector.contract %a, %b, %c0: ...
%e = add %c, %d: ...

by:

%e = vector.contract %a, %b, %d: ...

This is a canonicalization pattern on add. add is polymorphic and lives in a different dialect than vector (std these days, I think arith in the future). vector depends on standard but not the other way around.
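As an illustration, such a pattern could be sketched roughly as follows (not working code: AddFOp stands in for the polymorphic add, and isZeroConstant is a made-up helper, not an actual API):

```cpp
// Sketch: rooted on the add, rewrites add(contract(%a, %b, zero), %d)
// into contract(%a, %b, %d). Accessor names are illustrative.
struct FoldAddIntoContract : public OpRewritePattern<AddFOp> {
  using OpRewritePattern<AddFOp>::OpRewritePattern;

  LogicalResult matchAndRewrite(AddFOp addOp,
                                PatternRewriter &rewriter) const override {
    auto contract = addOp.lhs().getDefiningOp<vector::ContractionOp>();
    if (!contract || !isZeroConstant(contract.acc()))  // hypothetical helper
      return failure();
    // Rebuild the contraction with the add's other operand as the
    // accumulator, replacing the add.
    rewriter.replaceOpWithNewOp<vector::ContractionOp>(
        addOp, contract.lhs(), contract.rhs(), /*acc=*/addOp.rhs(),
        contract.indexing_maps(), contract.iterator_types());
    return success();
  }
};
```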

A workaround is to have a specific pattern somewhere else and just apply it.
It is unsatisfactory because we really want this to be a canonicalization that gets applied everywhere it can.

I can’t see a good solution right now; none of the following seem like good options:

  1. create a vector.add in addition to all the other polymorphic add operations.
  2. separate pattern called manually
  3. dynamic registration of canonicalization patterns outside of the op definition file.

Thoughts?

Technically, if the Vector dialect depends on the dialect that defines add, then the Vector dialect can add new canonicalizations for that other dialect’s operations, I believe. This increases the amount of coupling in the dependency from Vector to this other dialect, so it has to be used with care (any modification of the other dialect could require updating Vector as well).

The patterns in FooOp::getCanonicalizationPatterns don’t need to be rooted on FooOp.

As Mehdi says, as long as there is a correct dialect dependency, then you’re fine. In this case, no dialect dependency (in the MLIR registration/loading sense) is needed because your pattern does not produce ops from a different dialect. Of course, you still need a C++/linkage/header dependency to be able to reference AddOp.
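Concretely, the registration could look roughly like this (sketch; FoldAddIntoContract is a hypothetical pattern class rooted on the add op, and the exact pattern-list type name varies across MLIR versions):

```cpp
// In the Vector dialect: vector.contract's canonicalization hook
// registers a pattern whose root is std's add, not vector.contract.
void vector::ContractionOp::getCanonicalizationPatterns(
    OwningRewritePatternList &patterns, MLIRContext *context) {
  patterns.insert<FoldAddIntoContract>(context);
}
```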

Here’s a recent example I wrote in IREE: Canonicalize `std.dim` into `flow.dispatch.shape` when relevant. by silvasean · Pull Request #4671 · iree-org/iree · GitHub
The root is a DimOp, even though it’s a canonicalization pattern provided by IREE’s DispatchInputLoadOp.

Cool, thanks for dispelling my bias!

For canonicalization patterns this works well, but what about folding? An example that I was recently thinking of is the dim operation. We can quite often fold the dim operation, applied to the result of some other operation, to one of that operation’s operands. For example, dim applied to the result of alloc can be folded to one of alloc’s operands in the dynamic case.
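A sketch of such a fold hook (illustrative only; getConstantIndex and getDynamicSize are assumed helpers, not the exact upstream API):

```cpp
// dim(alloc(%d0, %d1, ...), i) folds to the i-th dynamic-size operand
// of the alloc when dimension i is dynamic. Sketch only.
OpFoldResult DimOp::fold(ArrayRef<Attribute> operands) {
  auto index = getConstantIndex();  // which dimension is being queried
  if (!index)
    return {};
  if (auto alloc = memrefOrTensor().getDefiningOp<AllocOp>()) {
    auto type = alloc.getType().cast<MemRefType>();
    if (type.isDynamicDim(*index))
      return alloc.getDynamicSize(*index);  // assumed accessor
  }
  return {};
}
```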

When splitting off the memref dialect, we would now need to move those folding patterns to a different place, as the standard dialect cannot depend on the memref dialect. Our current solution is to have a dim operation in the memref dialect.

However, this approach does not scale. We could also have dim applied to an hlo operation, for example. Would we write a canonicalization pattern in such cases, or would we rather extend the folding hooks to support registering multiple folders?

@herhut This is exactly the problem with over-splitting dialects that I mentioned when the discussion came up! I think it’s a mistake, or premature, to split dim and alloc into different dialects. One would just have to keep duplicating ops in different dialects due to such movement (I’m not saying the other splitting done isn’t good). Here was my post on that:

I’d instead argue that they are in std because they all work on standard types, which include int, float, and vectors and tensors of those (even if one doesn’t want to use the term standard, these are all really core types). This also means trying to split it all completely would be fraught with the risk of creating cyclic dependences, duplication of similar logic/helper methods, separation of logically connected things into different dialects, and also an inconsistent partitioning, logically and naming-wise. One could potentially pull out ops that exclusively work on the tensor class (due to a much higher level of abstraction) like @_sean_silva lists, but that’s a very thin slice. I don’t think any of the concerns/questions in my third post, or those described in more general terms by @jpienaar and @stephenneuendorffer, have been addressed at all.

@bondhugula Let me put in my two cents.

Actually there are two dialects involved: built-in and standard. The types mentioned (int, float, vector, tensor) are part of the built-in dialect, while the operations are part of std. And this brings some confusion, at least for me ([RFC] Move ReturnOp to BuiltIn Dialect).

For example, the built-in dialect provides the func operation, which is generic enough to be used in various scenarios. But it requires some terminator, and the closest return operation is located in the std dialect. So, if someone just wants to use func from the built-in dialect plus their own dialect, they either need to invent their own return-like operation (toy.return) or bring in a dependency on the std dialect with all its math, tensor, and other logic.
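For illustration (mydialect and @f are stand-ins): even a function whose body only uses custom ops needs a terminator, and today the natural choice pulls in std:

```mlir
func @f(%arg0: i32) -> i32 {
  %0 = "mydialect.op"(%arg0) : (i32) -> i32
  // std.return, used only as the terminator, forces a dependency on std.
  return %0 : i32
}
```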

I would argue that dialect splitting is not the problem here. Indeed, putting alloc and dim in different dialects makes it impossible to fold dim(alloc(X)) and requires making it a canonicalization pattern instead. But as @herhut points out, the same issue would appear for dim(op-from-custom-dialect(X)). We should not put ops in a dialect just because another op in that dialect can use them in constant folding. Short of putting ops in the same dialect, we would need a dependency of the dialect that contains the op being folded (“root”) on the dialect that contains the op feeding the “root” op, which is also infeasible in the case of out-of-tree dialects. That is a problem; dialect splitting just makes it apparent upstream and forces us to solve it.

We could indeed consider a registration mechanism for folders. We can also be bolder and reconsider why we have folding and canonicalization as two different mechanisms with different properties and try to make them at least closer.
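To make the idea concrete, here is a minimal self-contained sketch of a folder registry, in plain C++ rather than MLIR (Op, DimFolder, registerDimFolder, and all other names are invented for the example): dialects register fold callbacks keyed by the producing op’s name, and the generic dim folder consults the registry instead of hard-coding alloc.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy stand-in for an operation: a name plus integer operands.
struct Op {
  std::string name;
  std::vector<int> operands;
};

// A folder inspects the producing op and the queried dimension and may
// return a folded value.
using DimFolder = std::function<std::optional<int>(const Op &, int)>;

// Global registry: producing-op name -> folder. In real MLIR this would
// live on the context and be populated during dialect registration.
std::map<std::string, DimFolder> &dimFolderRegistry() {
  static std::map<std::string, DimFolder> registry;
  return registry;
}

void registerDimFolder(const std::string &opName, DimFolder folder) {
  dimFolderRegistry()[opName] = std::move(folder);
}

// The generic dim fold: look up a folder for the producer, if any.
std::optional<int> foldDim(const Op &producer, int dim) {
  auto it = dimFolderRegistry().find(producer.name);
  if (it == dimFolderRegistry().end())
    return std::nullopt;
  return it->second(producer, dim);
}

// The "memref"-like dialect registers, at static-init time (mimicking
// dialect loading), how dim folds through its alloc:
// dim(alloc(%d0, %d1), i) == the i-th dynamic-size operand.
static const bool allocFolderRegistered = [] {
  registerDimFolder("alloc",
                    [](const Op &alloc, int dim) -> std::optional<int> {
                      if (dim < 0 ||
                          dim >= static_cast<int>(alloc.operands.size()))
                        return std::nullopt;
                      return alloc.operands[dim];
                    });
  return true;
}();
```

An out-of-tree dialect could register a folder for its own ops in the same way, without dim ever referencing them.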

Can’t this dim and alloc issue be solved with operation interfaces? The dim folder can rely on some interface implemented by its operand’s producer (MemRefCreateOpInterface?), while alloc can implement this interface. In that case, dim will not depend on alloc or any other similar operation directly, but will support folding with any of them.
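As a minimal self-contained sketch of that direction, in plain C++ rather than the MLIR interface machinery (DimSourceInterface and all other names are invented): the folder dispatches through the interface, so only implementers carry the op-specific logic.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy interface: any op that can report the size of one of its result's
// dimensions without creating new IR.
struct DimSourceInterface {
  virtual ~DimSourceInterface() = default;
  virtual std::optional<int> getDimSize(int dim) const = 0;
};

// Toy base class for all ops.
struct OpBase {
  virtual ~OpBase() = default;
};

// "alloc" implements the interface: dynamic sizes come from its operands.
struct AllocOp : OpBase, DimSourceInterface {
  std::vector<int> dynamicSizes;
  explicit AllocOp(std::vector<int> sizes) : dynamicSizes(std::move(sizes)) {}
  std::optional<int> getDimSize(int dim) const override {
    if (dim < 0 || dim >= static_cast<int>(dynamicSizes.size()))
      return std::nullopt;
    return dynamicSizes[dim];
  }
};

// An op that does not implement the interface: dim cannot fold through it.
struct PlainOp : OpBase {};

// The dim folder only knows the interface, not AllocOp (or any hlo op).
std::optional<int> foldDimViaInterface(const OpBase &producer, int dim) {
  if (auto *src = dynamic_cast<const DimSourceInterface *>(&producer))
    return src->getDimSize(dim);
  return std::nullopt;
}
```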

+1; that has been one of the things that makes me strongly in favor of splitting std all along: the difficulties are just symptoms of limitations of the infrastructure in terms of extensibility and flexibility. The monolithic std prevented us from spotting the missing extension points.

In this case this is the direction I’d try to look into as well. That does not mean though that we won’t need a solution for folding down the road.

This is what is being attempted here: https://reviews.llvm.org/D97532. Planning to land it this week.

Potentially splitting hairs here, but that interface won’t be useful in folding, as it requires being able to produce IR. Also, there is no guarantee that the IR it produces is more canonical than just keeping the dim operation.

So it might still be useful to have a different interface just for the purpose of folding. Or extend that interface to communicate the extra restrictions.

Is there any agreement on how to handle split ops in different dialects?
There is still a discussion about splitting memref.dim into a memref.dim and a tensor.dim, which is related to the issues mentioned above.

I think that tensor.dim and memref.dim are two conceptually separate ops, which historically were merged into one. I don’t think there is any special consideration needed, unless I’m misunderstanding something.