[One-Shot-Bufferizer] Error when bufferizing tensor.collapse_shape

Bufferization of a tensor.collapse_shape operation seems to generate a memref.collapse_shape op that does not pass verification. For example, this input:

func @main(%arg0: tensor<1x2049xi64>, %arg1: index) -> tensor<2049xi64> {
    %0 = tensor.collapse_shape %arg0 [[0, 1]] : tensor<1x2049xi64> into tensor<2049xi64>
    return %0 : tensor<2049xi64>
}

when processed with mlir-opt --one-shot-bufferize, fails with the following error:

example.mlir:2:10: error: 'memref.collapse_shape' op expected collapsed type to be 'memref<?xi64, affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>>', but got 'memref<2049xi64, affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>>'
    %0 = tensor.collapse_shape %arg0 [[0, 1]] : tensor<1x2049xi64> into tensor<2049xi64>
         ^
example.mlir:2:10: note: see current operation: %1 = "memref.collapse_shape"(%0) {reassociation = [[0, 1]]} : (memref<1x2049xi64, affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)>>) -> memref<2049xi64, affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>>

The IR dumped just before invoking the verifier is:

func @main(%arg0: tensor<1x2049xi64>, %arg1: index) -> tensor<2049xi64> {
  %0 = bufferization.to_memref %arg0 : memref<1x2049xi64, affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)>>
  %1 = memref.collapse_shape %0 [[0, 1]] : memref<1x2049xi64, affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)>> into memref<2049xi64, affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>>
  %2 = bufferization.to_tensor %1 : memref<2049xi64, affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>>
  return %2 : tensor<2049xi64>
}
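
Reading the diagnostic, the verifier objects only to the static size in the result type. A hand-written variant that uses the collapsed type the verifier computes, followed by a cast back to the static shape, should pass this particular check (a sketch written for illustration, not the output of any pass):

%1 = memref.collapse_shape %0 [[0, 1]] : memref<1x2049xi64, affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)>> into memref<?xi64, affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>>
%2 = memref.cast %1 : memref<?xi64, affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>> to memref<2049xi64, affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>>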

I ran into a similar problem the other day (with -linalg-comprehensive-module-bufferize and -tensor-bufferize) and found that there is already an issue open for this on GitHub: [mlir][bufferization] Failing to bufferize expand_shape/collapse_shape of extract_slice · Issue #54249 · llvm/llvm-project · GitHub. Setting config.useTopDownTraversal = false; did fix that problem for me, but it breaks some other tests.

Thanks for pointing me to the issue. Unfortunately, changing the traversal order didn’t resolve the problem, at least not with the patch from D120893 (which I had to modify slightly to pass the config to applyPatternsAndFoldGreedily). Is that the patch you were using?

I built the commit with this SHA on main: e5b1b9edb8b6f6cd926c2ba3e1ad1b6f767021d6.
I just tried mlir-opt --one-shot-bufferize using this build and it produces the same error that you see. So yours may be a different, though perhaps related, issue.

This might not be what you are looking for, but have you tried mlir-opt --func-bufferize --tensor-bufferize? That seems to work for me.

Bufferization of ExpandShapeOp and CollapseShapeOp is currently broken. It has always been. I am working on a fix.

config.useTopDownTraversal = false should not be used, because it leads to worse bufferization results. The bufferized IR can have less precise memref types (e.g., with dynamic layout maps that are not really needed) and even additional buffer copies. The fact that your test case passed was probably accidental; the faulty code just happened to work.
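
To illustrate what "less precise" means, compare the two ways the same value could be materialized (a hand-written sketch, where %t stands for some tensor<4xf32> value; this is not the output of any pass):

// Precise: identity layout; offset 0 and stride 1 are implied.
%0 = bufferization.to_memref %t : memref<4xf32>
// Imprecise: dynamic offset and stride that are not really needed here.
%1 = bufferization.to_memref %t : memref<4xf32, affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>>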

Thanks for the clear feedback @matthias-springer. Looking forward to the fix!

One of the CI tests in torch-mlir generates the following IR:

func @collapse_shape(%6: tensor<?x?xf32>, %10: index, %4: index, %16: index) -> tensor<?xf32> {
  %17 = tensor.extract_slice %6[0, %10] [%4, %16] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
  %18 = tensor.cast %17 : tensor<?x?xf32> to tensor<?x1xf32>
  %19 = tensor.collapse_shape %18 [[0, 1]] : tensor<?x1xf32> into tensor<?xf32>
  return %19 : tensor<?xf32>
}

After a recent LLVM bump, it fails with --tensor-bufferize with the error 'memref.collapse_shape' op expected collapsed type to be 'memref<?xf32, affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>>', but got 'memref<?xf32>'.
This used to pass with an older LLVM version, which generated IR like this:

  func @collapse_shape(%arg0: tensor<?x?xf32>, %arg1: index, %arg2: index, %arg3: index) -> tensor<?xf32> {
    %0 = bufferization.to_memref %arg0 : memref<?x?xf32>
    %1 = memref.subview %0[0, %arg1] [%arg2, %arg3] [1, 1] : memref<?x?xf32> to memref<?x?xf32, #map>
    %2 = bufferization.to_tensor %1 : memref<?x?xf32, #map>
    %3 = bufferization.to_memref %2 : memref<?x?xf32>
    %4 = memref.cast %3 : memref<?x?xf32> to memref<?x1xf32>
    %5 = memref.collapse_shape %4 [[0, 1]] : memref<?x1xf32> into memref<?xf32>
    %6 = bufferization.to_tensor %5 : memref<?xf32>
    return %6 : tensor<?xf32>
  }

Is there a way to temporarily revert to the old behavior for downstream projects?

This is likely caused by the recent change to config.useTopDownTraversal = true. The previous behavior was false, which led to inefficient bufferization.

The old bufferization was problematic because of this:

    %2 = bufferization.to_tensor %1 : memref<?x?xf32, #map>
    %3 = bufferization.to_memref %2 : memref<?x?xf32>

This is likely lowered to a realloc + memref.memcpy, which you probably do not want.
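
Spelled out, that materialization would become something like the following (a sketch of the kind of code it may lower to; the exact ops depend on the pipeline):

%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%d0 = memref.dim %1, %c0 : memref<?x?xf32, #map>
%d1 = memref.dim %1, %c1 : memref<?x?xf32, #map>
// A fresh allocation with the identity layout, then a full copy.
%fresh = memref.alloc(%d0, %d1) : memref<?x?xf32>
memref.copy %1, %fresh : memref<?x?xf32, #map> to memref<?x?xf32>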

With config.useTopDownTraversal = true, we generate more memrefs with layout maps. This is good, but it uncovered some bugs in various op verifiers and bufferization patterns.

The particular issue that you are seeing here should be fixed with this change (and the two dependent changes): D122649 [mlir][tensor] Fix bufferization of CollapseShapeOp / ExpandShapeOp. There’s still some discussion on these changes, but I hope we can land this soon.
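
Roughly, the new bufferization falls back to a copy when the source buffer's layout map cannot be collapsed in place (a simplified sketch of the idea, where %src and #map stand for the strided source; the actual generated IR may differ):

%c0 = arith.constant 0 : index
%d0 = memref.dim %src, %c0 : memref<?x1xf32, #map>
// Copy into a contiguous buffer whose layout is guaranteed collapsible.
%buf = memref.alloc(%d0) : memref<?x1xf32>
memref.copy %src, %buf : memref<?x1xf32, #map> to memref<?x1xf32>
%res = memref.collapse_shape %buf [[0, 1]] : memref<?x1xf32> into memref<?xf32>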

The collapse_shape / expand_shape issues should be fixed. Let me know if you run into any further problems.

Hey Matthias,
Thanks a lot for the fixes!
There is one more issue exposed by the fix: the MemRefToLLVM conversion doesn’t support non-empty layout maps for collapse_shape/expand_shape. https://github.com/llvm/llvm-project/blob/4f4752ee6fd19efa9b7e623c10c5ba5861542dc8/mlir/lib/Conversion/MemRefToLLVM/MemRefToLLVM.cpp#L1368.

With this IR:

func @collapse_shape(%6: tensor<?x?xf32>, %10: index, %4: index, %16: index) -> tensor<?xf32> {
  %17 = tensor.extract_slice %6[0, %10] [%4, %16] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
  %18 = tensor.cast %17 : tensor<?x?xf32> to tensor<?x1xf32>
  %19 = tensor.collapse_shape %18 [[0, 1]] : tensor<?x1xf32> into tensor<?xf32>
  return %19 : tensor<?xf32>
}

running the following command fails because the memref.collapse_shape op has a non-empty layout map:

mlir-opt --func-bufferize --tensor-bufferize --finalizing-bufferize --convert-linalg-to-loops --lower-affine --convert-scf-to-cf --convert-linalg-to-llvm --convert-memref-to-llvm --convert-func-to-llvm --convert-cf-to-llvm --reconcile-unrealized-casts

(This IR and command come from the torch-mlir CI tests, where we convert the IR all the way to LLVM IR so it can run on the execution engine.)

Do you have any suggestions about this? Thanks!

Hi Cathy, I’m not too surprised. We have quite a few lowering patterns in MLIR that support only the common cases, so somebody would have to implement the missing functionality. The MemRef → LLVM part is not really my area of expertise, but if you read up a little bit on MemRef descriptors etc., I think you should be able to implement it yourself. It would probably be a pretty small change…
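
For context, a ranked memref lowers to an LLVM-dialect struct holding the allocated pointer, the aligned pointer, an offset, and per-dimension sizes and strides; for example, for memref<?xf32> (shown for illustration):

// Descriptor: allocated ptr, aligned ptr, offset, sizes[1], strides[1].
// A non-identity layout only changes the offset/stride values at runtime,
// which is why the missing support is likely a small change.
!llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>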

@matthias-springer I will look into it! Thanks for the pointers!