Rank-reducing memref.subview OffsetSizeAndStrideOpInterface interface issues

*don’t have bugzilla account so posting here

This code will cause the assert failure during canonicalization

#map0 = affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)>
func @rank_reducing_subview_dim(%arg0 : memref<?x?x?xf32>, %arg1 : index,
    %arg2 : index) -> index
{
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %c4 = constant 4 : index
  %0 = memref.subview %arg0[%c0, %arg1, %c1] [%c4, 1, %arg2] [%c1, %c1, %c1] : memref<?x?x?xf32> to memref<?x?xf32, #map0>
  %1 = memref.dim %0, %c1 : memref<?x?xf32, #map0>
  return %1 : index
}

The reason is default implementation of OffsetSizeAndStrideOpInterface isDynamicOffset(idx) and similar functions, which assumes idx corresponds to input memref dimensions, but users of the interface (DimOp::fold in this case) will use dimensions indices from output memref.

Side question: is there any other way except memref.subview to reduce rank of memref with unit-size dimensions (memref.reshape will only work with memrefs with identity affine map)?

Thanks for reporting, sending a fix.

Thanks, I will duplicate my commnets from review

https://reviews.llvm.org/D105558
I am not sure this is a good solution. Requiring from OffsetSizeAndStrideOpInterface users to to do such index manipulation is very inconvenient and error-prone. Maybe better to separate isDynamicDim APIs into 2 sets isInputDynamicDim/isOutputDynamicDim which can do index translation under the hood?

Also, locally I workarounded this by never using rank-reducing subview and instead implementing a custom op to do a rank reduction on memrefs (with this approach I can also precisely control, which dimensions I want to collapse, not just all-or-nothing).

There is no such thing as input and output: the offset/size and stride properties belong to the op, not to the input or output memref.

It is a mistake to see these properties as being attached to the memref.

If you look more closely at the patch you’ll see that getting the n^th dynamic index is a property of the memref, we could indeed put that behind a helper function in ShapedType but since this need hasn’t materialized before I do not think it has more general usage.

The only interface that is accessed from OffsetSizeAndStrideOpInterface is the sizes() (dynamic) operands.

Also, I am unclear what you mean by:

The following is valid IR; you can choose to collapse or not collapse a particular 1 dimension (you still have to provide the proper type of the verifier will complain):

#map = affine_map<(d0, d1, d2)[s0, s1, s2, s3] -> (d0 * s1 + s0 + d1 * s2 + d2 * s3)>
module  {
  func @rank_reducing_subview_dim(%arg0: memref<?x?x?xf32>, %arg1: index, %arg2: index) -> index {
    %c0 = constant 0 : index
    %c1 = constant 1 : index
    %c4 = constant 4 : index
    %c2 = constant 2 : index
    %0 = memref.subview %arg0[%c0, %arg1, %c1] [%c4, 1, %arg2] [%c1, %c1, %c1] : memref<?x?x?xf32> to memref<?x1x?xf32, #map>
    %1 = memref.dim %0, %c2 : memref<?x1x?xf32, #map>
    return %1 : index
  }
}

You can also use linalg.collapse_shape / linalg.expand_shape to manipulate the type and drop / insert 1s. @pifon2a is currently working on moving those to the memref dialect as per [RFC] Reshape Ops Restructuring.

Those index manipulations in the patch are looking arcane to someone who is unaware of context. And they are needed at least in 2 places now (and possibly more in the future).

Regarding rank reduction, it wasn’t clear from docs that subview can do partial rank reduction (and I somehow completely missed linalg.collapse_shape).

Thanks

Fair enough, added a helper function to refactor that logic and give it a name.
PTAL.

@nicolasvasilache And what exactly happened in the review?

Also, I cheked linalg.collapse_shape and it seems it have llvm lowering only for fully static shape case, it won’t lower conversions like <1x1x?xi32><?xi32>.

Unfortunately this is proving more complex than the simple fix I sent in https://reviews.llvm.org/D105558.

The problem with that fix is that it is not extensible:

  • OffsetSizeAndStrideOpInterface does not have enough information to resolve a result dimension:
    • for “extract” type op, it is simply the proper value in the sizes operand (what D105558 implements)
    • for “insert” type op, it just forwards to the “dest” operand (also what D105558 implements but by special casing)
  • special casing is not extensible and the above patch breaks this IREE Flow Op (i.e. core cannot special case and should use an interface)

The proper core interface to use for this seems to be InferTypeOpInterface.
However, folks have recently converged on moving the “canonicalization” of dim(InferTypeOpInterface) in a separate pass (https://reviews.llvm.org/D104321).

It seems adopting this more generally would mean moving all DimOp canonicalization patterns to this pass (i.e. why should there be some producer ops that canonicalize automatically while some others require a special pass).

This brings me to trying to make ops that conform to OffsetSizeAndStrideOpInterface also conform to InferTypeOpInterface for the purpose of enabling the folding of DimOp.

This is however not straightforward because the result may be any type where 1’s are folded (i.e. if tensor<1x?x1x2xf32> is a valid return type then so is tensor<1x?x2xf32> and so is tensor<?x1x2xf32> or tensor<?x2xf32>).

Dropping the rank-reducing semantics does not appear like a good option because a combination of ops will be needed to achieve the same effect which will result in special-casing in transformations which is a red flag.

Another simple way would be to extend OffsetSizeAndStrideOpInterface with extra information to describe whether the op is an “insert”-like or an “extract”-like but this seems to go against the spirit of InferTypeOpInterface and https://reviews.llvm.org/D104321.

In any case we should be consistent across our decisions, so my first questions comes from looking at Remove canonicalizer for memref.dim via ShapedTypeOpInterface.

How to we decide for a given op whether “folding” dim(op.result()):

  • should be a canonicalization (which implies that op must not be InferTypeOpInterface)
  • should not be a canonicalization ?

It seems there is an undesired coupling between “op semantics” (i.e. is it an InferTypeOpInterface) and “canonicalization vs pass”.

For example, it is unclear to me why AllocOp should not be InferTypeOpInterface.

Lastly, is there a particular recommendation how to address the original problem in this thread, in light of the discussion above?

@_sean_silva @MaheshRavishankar @herhut @stellaraccident

see my post above, the problem is generally more complex because we need extensibility and there are other moving parts in the system. Core can only special case for the ops it knows about.

Also @tpopp since the generalization aspect is related to ⚙ D103076 [mlir] Fold memref.dim of OffsetSizeAndStrideOpInterface outputs.

Thanks, as I mentioned before I workarounded original issue locally by using custom op to do rank reduction after subview. I have tried to replace custom op with linalg.collapse_shape but it don’t have llvm lowering for my specific case (collapsing dynamic dim with number of static 1s dims). Is there any active work on supporting this? Supporting this specific lowering case seems straightforward on the first glance and I can try to implement it myself.

IREE uses expand/collapse also with dynamic shapes but I think they lower out though a different path (which also works with Vulkan/SPIRV which have other constraints).

I don’t know of anyone actively looking at extending lowering of these ops to LLVM in core.
Adding support for more use cases would be a very welcome contribution, thanks much!

Could this become a memref.reshape? That already has a lowering.

So my take is that if it does not produce any new IR but just forwards operands, it is a canonicalization. The dim(alloc(...)) falls into that category.

Why can we not canonicalize for ops that implement InferTypeOpInterface?

This would even be a folding in my book.

Isn’t that what everyone agreed to in this post ?Remove canonicalizer for memref.dim via ShapedTypeOpInterface

memref.reshape requires identity layout map according to docs (which this is not true for subview result) but I can’t find if it enforced in code anywhere.

Actually, I am not sure about linalg.collapse_shape either, as it can work as view OR copy data depending on parameters, but I want to always have view in this case (and op having ViewLikeOpInterface)

I think the issue is that that interface does too much and a subset of it could be considered a canonicalization whereas most of it should not. The specific thing I think you (and I) are reaching for would be something like ResultDimMapsToOperandDimOpInterface (spelled out for emphasis). There is, in fact, one method on that interface that does this specifically, but iirc, it is defaulted in a way that makes it fall back to the general type inference methods, which can (and in some dialects) do anything, including creating arbitrary IR in other dialects, etc.

If we had that separation, I think there is a good argument that it is in the scope of canonicalization to replace a result dim with a (possibly scoped/trivial expression of a) corresponding operand dim.

1 Like

Answering a few general questions here (I looked through the post and the original patch, so have an idea of whats happening here, but dont want to delve into those details right now). Few related points on this AFAICS are

  1. My current view is that dim canonicalizations should really be a separate pass, and the whole list of DimOp::fold that does the “if (isa<op>()) {...}” should be deprecated. In general this is not a fold unless the dimension is static. For the dynamic case (which really should be the case to use for reasoning about folds/canonicalize/passes) resolving a dim of an operation in terms of its operands creates new dims. If it doesn’t in some cases, then special casing of that in fold is really confusing and does not have good separation of concerns.

  2. Resolving dims using the InferShapedTypeInterface is really a fixed point iteration computation. Piggy backing this on canonicalization is probably a mistake. Its doing too much at once. The canonicalization should be meant to make an operation get into its “canonical” form. Doing a fixed point iteration to resolve memref.dim operations along with canonicalizations is an overkill. As Stella found out, it is probably better to be more purposeful about when you want to resolve shapes. I would expect a client compilation pipeline to run canonicalization more frequently and shape resolution less frequently.

  3. These are sort of unrelated to the original problem mentioned above. I dont think there should be a dim canonicalization based on the OffsetSizeAndStrideInterface if it does not have enough information to compute the shape of the result. Should definitely drop that from the dim fold/canonicalization.