Bug in func-bufferize

I’m sorry for posting here; I’m still not very good with the bug tracking tool.
Consider the following function:

#args2 = { indexing_maps = [#map0,#map0],
          iterator_types = ["parallel"] }
func @condBranc(%arg0: i1, %arg1: tensor<?xf32>)-> (tensor<?xf32>) {
  cond_br %arg0, ^bb1, ^bb2
^bb1:
  br ^bb3(%arg1 : tensor<?xf32>)
^bb2:
  %arg2 = linalg.generic #args2 ins(%arg1:tensor<?xf32>) {
  ^bb0(%gen1_arg0: f32):
    %tmp1 = exp %gen1_arg0 : f32
    linalg.yield %tmp1 : f32
  } -> tensor<?xf32>
  br ^bb3(%arg2 : tensor<?xf32>)
^bb3(%1: tensor<?xf32>):
  return %1:tensor<?xf32>
}

The output of mlir-opt --linalg-bufferize --buffer-deallocation --func-bufferize is:

func @condBranc(%arg0: i1, %arg1: memref<?xf32>) -> memref<?xf32> {
    cond_br %arg0, ^bb1, ^bb2
  ^bb1:  // pred: ^bb0
    br ^bb3(%arg1 : memref<?xf32>)
  ^bb2:  // pred: ^bb0
    %c0 = constant 0 : index
    %0 = dim %arg1, %c0 : memref<?xf32>
    %c0_0 = constant 0 : index
    %c1 = constant 1 : index
    %1 = alloc(%0) : memref<?xf32>
    linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel"]} ins(%arg1 : memref<?xf32>) outs(%1 : memref<?xf32>) {
    ^bb0(%arg2: f32, %arg3: f32):  // no predecessors
      %3 = exp %arg2 : f32
      linalg.yield %3 : f32
    }
    dealloc %arg1 : memref<?xf32>
    dealloc %1 : memref<?xf32>
    br ^bb3(%1 : memref<?xf32>)
  ^bb3(%2: memref<?xf32>):  // 2 preds: ^bb1, ^bb2
    return %2 : memref<?xf32>
  }

This code is clearly incorrect: it deallocates the function argument %arg1.
The problem seems to disappear if the last two options to mlir-opt are swapped.

Dumitru

Thanks for reporting this! We generally don’t run buffer-deallocation until bufferization is finished, which is why this bug hasn’t surfaced before. So in this case you can avoid the problem by running -linalg-bufferize -func-bufferize -buffer-deallocation.
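For reference, that reordered pipeline would be invoked roughly like this (the input file name is just a placeholder):

mlir-opt -linalg-bufferize -func-bufferize -buffer-deallocation repro.mlir

With buffer-deallocation running last, it should only see IR in which the bufferization materializations for this example have already been resolved.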

What you are seeing here is that the tensor_to_memref op is treated as “allocating” by buffer-deallocation, but when we elide it we don’t insert copies: https://github.com/llvm/llvm-project/blob/1cbf8e89b54de939420d53d7a528bec6fbaf0a55/mlir/lib/Transforms/Bufferize.cpp#L96
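To make that concrete, here is a hand-written, simplified sketch (not the literal pass output) of what ^bb2 looks like after -linalg-bufferize but before -func-bufferize: the tensor input %arg1 is routed through a tensor_to_memref materialization, and buffer-deallocation treats the result of that op as a buffer it is responsible for freeing.

^bb2:
  %in = tensor_to_memref %arg1 : memref<?xf32>   // materialization; treated as "allocating"
  %c0 = constant 0 : index
  %d = dim %in, %c0 : memref<?xf32>
  %out = alloc(%d) : memref<?xf32>
  linalg.generic #args2 ins(%in : memref<?xf32>) outs(%out : memref<?xf32>) {
  ^bb0(%a: f32, %b: f32):
    %e = exp %a : f32
    linalg.yield %e : f32
  }
  %res = tensor_load %out : memref<?xf32>
  br ^bb3(%res : tensor<?xf32>)

Since only the tensor %res flows into ^bb3 at this stage, buffer-deallocation presumably concludes that neither %in nor %out escapes the block and inserts a dealloc for each before the branch; once the materializations are folded away by func-bufferize, those deallocs become the dealloc %arg1 and dealloc %1 seen in the output above.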

It’s a pretty simple fix, which I’ll do, but there are a number of places where this kind of thing can happen: either due to folding tensor_to_memref(tensor_load(memref)) -> memref (easy to fix: just remove the pattern; on my TODO list) or because the dialect conversion framework effectively does the same thing internally (harder to fix). It’s on my TODO list for this week to dive deeper into this.
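As a minimal, hand-written illustration of why that fold is dangerous when no copies are inserted (the value names are made up):

// Before the fold: buffer-deallocation treats %m as a fresh allocation
// and pairs it with a dealloc.
%t = tensor_load %buf : memref<?xf32>
%m = tensor_to_memref %t : memref<?xf32>
dealloc %m : memref<?xf32>

// After tensor_to_memref(tensor_load(%buf)) is folded to %buf, the same
// dealloc now frees %buf directly, even though nothing in this scope
// allocated it; this is the same kind of failure that produces
// dealloc %arg1 in the output above.
dealloc %buf : memref<?xf32>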

The other alternative is to not mark tensor_to_memref as allocating, but then it’s not obvious how exactly to describe its semantics.