Bug in rank reducing tensor.extract_slice and subsequent linalg bufferization?

The documentation for `tensor.extract_slice` states that:

> An extract_slice operation may additionally reduce the rank of the resulting tensor by removing dimensions that are statically known to be of size 1.

And indeed, something like:

```mlir
func @main(%t0: tensor<5x4xi64>) -> tensor<4xi64> {
  %t1 = tensor.extract_slice %t0[2, 0] [1, 4] [1, 1] : tensor<5x4xi64> to tensor<4xi64>
  return %t1 : tensor<4xi64>
}
```

round-trips without any issue. However, subsequent linalg bufferization (e.g. through `mlir-opt --linalg-bufferize`) results in an error:

```
extract_slice.mlir:2:9: error: 'linalg.copy' op expected indexing_map #1 to have 2 dim(s) to match the number of loops
  %t1 = tensor.extract_slice %t0[2, 0] [1, 4] [1, 1] : tensor<5x4xi64> to tensor<4xi64>
        ^
extract_slice.mlir:2:9: note: see current operation: "linalg.copy"(%2, %1) ({
^bb0(%arg1: i64, %arg2: i64):
  "linalg.yield"(%arg1) : (i64) -> ()
}) : (memref<1x4xi64, affine_map<(d0, d1) -> (d0 * 4 + d1 + 8)>>, memref<4xi64>) -> ()
```

In `ExtractSliceOpConverter::matchAndRewrite()`, the original sequence of operations is replaced with the following:

```mlir
func @main(%arg0: tensor<5x4xi64>) -> tensor<4xi64> {
  %0 = builtin.unrealized_conversion_cast %arg0 : tensor<5x4xi64> to memref<5x4xi64>
  %1 = memref.alloc() : memref<4xi64>
  %2 = memref.subview %0[2, 0] [1, 4] [1, 1] : memref<5x4xi64> to memref<1x4xi64, affine_map<(d0, d1) -> (d0 * 4 + d1 + 8)>>
  %3 = linalg.copy(%2, %1) : memref<1x4xi64, affine_map<(d0, d1) -> (d0 * 4 + d1 + 8)>>, memref<4xi64>
  return %3 : tensor<4xi64>
}
```

To me it seems the problem comes from the subview, which does not drop the dimension of size 1, so the subsequent `linalg.copy` gets operands of mismatched rank and fails to verify.
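For comparison, what I would have expected the converter to emit is a rank-reducing subview, so that both copy operands have rank 1. A sketch (the layout map is what I'd compute by hand for row offset 2 * 4 = 8, not verified against the converter):

```mlir
// Hypothetical rank-reducing form: the subview itself drops the unit
// dimension, yielding a rank-1 memref whose layout encodes the offset.
%2 = memref.subview %0[2, 0] [1, 4] [1, 1]
    : memref<5x4xi64> to memref<4xi64, affine_map<(d0) -> (d0 + 8)>>
// Now both operands of the copy are rank 1, so the indexing maps line up:
linalg.copy(%2, %1) : memref<4xi64, affine_map<(d0) -> (d0 + 8)>>, memref<4xi64>
```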

On a more general note, what is the difference in semantics between `tensor<...>`, `tensor<1x...>`, `tensor<1x1x...>`, and so on?

CC: @nicolasvasilache

When I run:

```shell
./build/bin/mlir-opt /tmp/aaa.mlir -linalg-comprehensive-module-bufferize="allow-return-memref"
```

I get:

```mlir
#map0 = affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)>
#map1 = affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>
module {
  func @main(%arg0: memref<5x4xi64, #map0>) -> memref<4xi64, #map1> {
    %0 = memref.subview %arg0[2, 0] [1, 4] [1, 1] : memref<5x4xi64, #map0> to memref<4xi64, #map1>
    return %0 : memref<4xi64, #map1>
  }
}
```

I am unclear why `--linalg-bufferize` does something different here.

I'm not sure how to answer this: `tensor<1x1x3xf32>` means a tensor of shape [1, 1, 3] (i.e. one for which the indices [0, 0, {0, 1, 2}] are valid).
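In other words, a unit dimension is not just notation: `tensor<3xf32>` and `tensor<1x3xf32>` hold the same three elements but are distinct types, and going from one to the other takes an explicit reshape. A sketch using `tensor.collapse_shape` as it exists in recent MLIR (older revisions spelled this op differently, e.g. under the linalg dialect):

```mlir
// Drop the leading unit dimension explicitly; the reassociation [[0, 1]]
// says that result dimension 0 collapses source dimensions 0 and 1.
%flat = tensor.collapse_shape %t [[0, 1]] : tensor<1x3xf32> into tensor<3xf32>
```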

@matthias-springer for the behavior of `-linalg-bufferize`

`-linalg-bufferize` is the old bufferization implementation. Looks like it has a bug. Anyway, I'm in the process of removing it and replacing it with the new `BufferizableOpInterface`-based bufferization, so this should be fixed soon.

If possible, use `-linalg-comprehensive-module-bufferize="allow-return-memref"` instead of the other `<dialect>-bufferize` passes; there will be fewer memcpys. The name of the pass will change soon, as we move it into the bufferization dialect.