[One-Shot Bufferizer] Bufferization fails in the presence of `tensor.empty()`

I have the following function that I’d like to bufferize:


func.func public @pooling_nchw_max_d1_s2_3x3(%X: tensor<1x1x16x16xf64>,
                       %Y: tensor<1x1x7x7xf64>) -> () {
  %kernel = tensor.empty() : tensor<3x3xf32>
  %res = linalg.pooling_nchw_max {
    dilations = dense<1> : vector<2xi64>,
    strides = dense<2> : vector<2xi64>
  } ins(%X, %kernel : tensor<1x1x16x16xf64>, tensor<3x3xf32>)
    outs(%Y : tensor<1x1x7x7xf64>) -> tensor<1x1x7x7xf64>
  return
}

I’ve tried two things; neither works, for different reasons:

$ mlir-opt-16 linalg.mlir -opaque-pointers=0 --one-shot-bufferize
linalg.mlir:4:13: error: 'tensor.empty' op cannot be bufferized, but can be converted to bufferization.alloc_tensor
  %kernel = tensor.empty() : tensor<3x3xf32>
            ^
linalg.mlir:4:13: note: see current operation: %0 = "tensor.empty"() : () -> tensor<3x3xf32>
linalg.mlir:4:13: error: failed to bufferize op
  %kernel = tensor.empty() : tensor<3x3xf32>
            ^
linalg.mlir:4:13: note: see current operation: %0 = "tensor.empty"() : () -> tensor<3x3xf32>
$ mlir-opt-16 linalg.mlir -opaque-pointers=0 --empty-tensor-to-alloc-tensor --one-shot-bufferize
module {
  func.func public @pooling_nchw_max_d1_s2_3x3(%arg0: tensor<1x1x16x16xf64>, %arg1: tensor<1x1x7x7xf64>) {
    return
  }
}

But ultimately I suspect --empty-tensor-to-alloc-tensor isn’t right either. My understanding is that the empty tensor is only there to specify the kernel shape, is that right? Should I change the input somehow? There’s frustratingly little documentation on the linalg named ops… It also seems that in the second version some DCE removes the whole linalg operation; is there a way to prevent that?

One constraint is that I link the resulting function with C code that expects it to have the same signature as void foo(const double* x, double* y). I haven’t been able to get the expected destination-passing-style bufferization.
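
For reference, the result I’m hoping for is roughly the following (a sketch of what I expect after bufferization, not actual compiler output; the exact memref layouts and allocation/deallocation details may well differ):

func.func public @pooling_nchw_max_d1_s2_3x3(%X: memref<1x1x16x16xf64>,
                       %Y: memref<1x1x7x7xf64>) {
  // The kernel is shape-only, so any 3x3 buffer should do.
  %kernel = memref.alloc() : memref<3x3xf32>
  linalg.pooling_nchw_max {
    dilations = dense<1> : vector<2xi64>,
    strides = dense<2> : vector<2xi64>
  } ins(%X, %kernel : memref<1x1x16x16xf64>, memref<3x3xf32>)
    outs(%Y : memref<1x1x7x7xf64>)
  memref.dealloc %kernel : memref<3x3xf32>
  return
}

which I’d then lower so that the memref arguments become bare pointers matching void foo(const double* x, double* y).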

Hi @superlopuh ,

Let me try to answer your questions:

  1. With linalg.mlir -opaque-pointers=0 --one-shot-bufferize you get the error that tensor.empty cannot be bufferized. This is expected, since tensor.empty should not bufferize on its own: as you said, tensor.empty just defines the shape of a tensor. If you want to force an allocation, you should use --empty-tensor-to-alloc-tensor.
  2. linalg.mlir -opaque-pointers=0 --empty-tensor-to-alloc-tensor --one-shot-bufferize is the right way. I would add one option to one-shot: --one-shot-bufferize="bufferize-function-boundaries". The reason DCE removes the pooling operation is that, from a tensor perspective, %res is dead; return it from the function (see the sketch below) and you should see what you expect. Remember that output tensors in Linalg come in two flavors: init-tensor or shape-only. The former provides an initial value, while the latter only supplies a shape and should not be used in the payload of the linalg op. I hope this allows you to make progress!
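
For example, something along these lines should survive bufferization (a sketch based on your input; I have not run this exact pipeline, so minor adjustments may be needed):

func.func public @pooling_nchw_max_d1_s2_3x3(%X: tensor<1x1x16x16xf64>,
                       %Y: tensor<1x1x7x7xf64>) -> tensor<1x1x7x7xf64> {
  // Shape-only operand: its element values are never read.
  %kernel = tensor.empty() : tensor<3x3xf32>
  %res = linalg.pooling_nchw_max {
    dilations = dense<1> : vector<2xi64>,
    strides = dense<2> : vector<2xi64>
  } ins(%X, %kernel : tensor<1x1x16x16xf64>, tensor<3x3xf32>)
    outs(%Y : tensor<1x1x7x7xf64>) -> tensor<1x1x7x7xf64>
  // Returning %res keeps the pooling op live through DCE.
  return %res : tensor<1x1x7x7xf64>
}

$ mlir-opt-16 linalg.mlir -opaque-pointers=0 --empty-tensor-to-alloc-tensor --one-shot-bufferize="bufferize-function-boundaries"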

If the function is in destination-passing style, this should be the expected result. You can also force bufferization to fail if it introduces allocations within the function: https://github.com/llvm/llvm-project/blob/ffa11183303289654d26b8f761dcf54e611058ca/mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td#L278

@chelini, thank you for your reply. It appears that I have misunderstood the way the kernel is used. None of the definitions of max pooling I’ve come across have a similar parameter, and I’m confused about what values it should be filled with to get the same semantics as the PyTorch definition, which only takes a shape as a parameter. Is there any documentation anywhere on the definition of the linalg op? I haven’t found any.

For named ops, the only documentation I am aware of is in core_named_ops.py (https://github.com/llvm/llvm-project/blob/b6deea1b53ae84806941c0a43e4f59d3aa40692a/mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py#L1285), which, in the case of your operation, is not great. Feel free to open an issue. Looking at the affine maps, however, it seems to match the PyTorch definition.
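
Paraphrasing that OpDSL definition (not a verbatim quote), the computation should be roughly:

O[n, c, oh, ow] = max over (kh, kw) of I[n, c, oh * SH + kh * DH, ow * SW + kw * DW]

where (SH, SW) are the strides, (DH, DW) the dilations, and K only contributes the iteration bounds for kh and kw; its element values never appear on the right-hand side.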

OK, so there isn’t some definition I’m not aware of: K itself is not used in that definition, only its shape. It seems my intuition was correct; the tensor operand is just a way to specify the shape, and its contents are ignored. I’ll try to get the bufferization to work with @Groverkss’s help.

How do I file a bug for the documentation?

I think a simple issue like: [mlir][spirv] Some op examples in ODS got out of sync · Issue #64272 · llvm/llvm-project · GitHub is enough. Thanks!


Thank you, I’ve finally filed an issue; please let me know if there’s someone worth pinging about this, or if it’s too broad.

Thank you. I added the “mlir::linalg” and “docs” labels. Do you have an example of an operation that does not lower without the transform dialect?

I may well have misunderstood, but I don’t know how to lower linalg.softmax. I think this is good motivation to add documentation for the ops :slight_smile:

For sure! In case you need to lower softmax today, you can do it with the transform dialect in upstream MLIR ([mlir][Linalg] Add an interface to decompose complex ops · llvm/llvm-project@9be8219 · GitHub), or if you want a pass-based approach you can take a look at https://github.com/plaidml/tpp-mlir/blob/main/lib/TPP/Transforms/DecomposeAggregatedOps.cpp. We will improve; thanks for raising the issue.
