[mlir][linalg] `EraseIdentityGenericOp` canonicalization pattern

Hey, one of the `linalg::GenericOp` canonicalization patterns is `EraseIdentityGenericOp`, which removes a `linalg.generic` op if it meets two conditions:

  1. all of its iterator types are parallel.
  2. its body contains only a single `linalg.yield`.

When running the `linalg.generic` canonicalization patterns on an identity `linalg.generic` whose output operand is the result of `bufferization.alloc_tensor()`, the pattern still erases my generic op, even though the `alloc_tensor()` op indicates a newly bufferized/materialized tensor.

Here is an example of my code:

```mlir
func.func @slice_kernel(%arg0: tensor<4x4x4x4xf32>) -> (tensor<1x4x4x4xf32>) {
  %extracted_slice = tensor.extract_slice %arg0[0, 0, 0, 0] [1, 4, 4, 4] [1, 1, 1, 1] : tensor<4x4x4x4xf32> to tensor<1x4x4x4xf32>
  %0 = bufferization.alloc_tensor() : tensor<1x4x4x4xf32>
  %1 = linalg.generic {indexing_maps = [affine_map<(d0,d1,d2,d3) -> (d0,d1,d2,d3)>, affine_map<(d0,d1,d2,d3) -> (d0,d1,d2,d3)>], iterator_types = ["parallel","parallel","parallel","parallel"]} ins(%extracted_slice : tensor<1x4x4x4xf32>) outs(%0 : tensor<1x4x4x4xf32>) {
  ^bb0(%in: f32, %out: f32):
    linalg.yield %in : f32
  } -> tensor<1x4x4x4xf32>
  return %1 : tensor<1x4x4x4xf32>
}
```

I expected this `linalg.generic` not to be removed, since it's not a default/redundant copy. Shouldn't the RewritePattern check for this case and fail to match here?

Thanks,

Maybe your program is a simplified representation of your real problem, but as stated, it does make sense to me that %1 is replaced with %0. They are essentially the same value, because the second operation is just doing a copy, and removing the unnecessary copy makes sense to me.
(Also as written your repro should fail verification since the tensor type of %1 is not consistent, but that might just be a typo).

Hey,
Thanks for your reply, Mahesh. It was indeed a typo (I edited my MLIR code manually to make it simpler; I've fixed it now).
But shouldn't removing this copy change the program's behavior, since the user intended to allocate a new buffer for this tensor?

The concept of « buffer for a tensor » isn't really well defined: tensors are immutable, value-based entities.
It's not clear what observable behavior there is to preserve here.

I don’t know what you mean by “default” copy, but this is value semantics, and that’s an identity operation. This is the simplest form of DCE and perfectly valid.

The confusion is probably coming from the fact that bufferization.alloc_tensor() is an op specific to the bufferization process and has no meaning outside of it. For example, the doc says:

"The result of a bufferization.alloc_tensor is a tensor value that can be used like any other tensor value."

This means if you run cleanups on top of that code, DCE is free to remove identity operations because you are not in buffer semantics (memref) yet.
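Concretely, running `mlir-opt --canonicalize` on the example above folds the identity `linalg.generic` away and forwards its input, so the output looks something like this (a sketch; exact SSA names may differ):

```mlir
func.func @slice_kernel(%arg0: tensor<4x4x4x4xf32>) -> (tensor<1x4x4x4xf32>) {
  // The alloc_tensor and the identity generic are erased as dead code;
  // the slice is returned directly.
  %extracted_slice = tensor.extract_slice %arg0[0, 0, 0, 0] [1, 4, 4, 4] [1, 1, 1, 1] : tensor<4x4x4x4xf32> to tensor<1x4x4x4xf32>
  return %extracted_slice : tensor<1x4x4x4xf32>
}
```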

If you are using that op as a way to “allocate a tensor”, don’t. There is no such thing as “allocating a tensor”, as @mehdi_amini said. Just use tensor.empty and bufferization will (hopefully) know what to do.
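For illustration, a destination-style sketch of the same kernel using `tensor.empty` (note: `tensor.empty` only carries the shape and element type of the destination; it does not promise a distinct allocation, and the identity copy can still be folded away for the same reason as above):

```mlir
func.func @slice_kernel(%arg0: tensor<4x4x4x4xf32>) -> (tensor<1x4x4x4xf32>) {
  %extracted_slice = tensor.extract_slice %arg0[0, 0, 0, 0] [1, 4, 4, 4] [1, 1, 1, 1] : tensor<4x4x4x4xf32> to tensor<1x4x4x4xf32>
  // tensor.empty describes the destination shape; bufferization decides the allocation.
  %empty = tensor.empty() : tensor<1x4x4x4xf32>
  %1 = linalg.copy ins(%extracted_slice : tensor<1x4x4x4xf32>) outs(%empty : tensor<1x4x4x4xf32>) -> tensor<1x4x4x4xf32>
  return %1 : tensor<1x4x4x4xf32>
}
```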


Side note: I feel like we’ve had a few rounds of confusion here recently and it may be good to formalize the answer somewhere (or if it is, make it more discoverable).


Thanks for your responses.

@mehdi_amini @rengolin @MaheshRavishankar @jpienaar Let me try to explain my purpose:

I want to write a kernel/function in MLIR that takes one tensor argument and returns a part/slice of that argument; the kernel is in tensor semantics.
However, I want to guarantee that after running one-shot bufferization I get different memrefs/allocations for the argument and the return value of the kernel.

For this purpose, I've tried inserting operations like tensor.empty/bufferization.alloc_tensor() and copying the slice of the argument into the new SSA value produced by tensor.empty/bufferization.alloc_tensor(). Unfortunately, the copy is being removed as part of redundant-code elimination.

I wanted to handle this issue on tensors, since the passes that tile or fuse the linalg generic run on tensors, and handling it on memrefs would come too late for those passes.

thanks,

I don’t think you can guarantee this, even if you use things like tensor.empty() or memref.alloc(). Ultimately, the compiler is free to optimize memory usage as it sees fit and that’s a good thing.

Maybe you're thinking of this as a user, not as a compiler, and need to adjust your expectations accordingly. Or maybe your operations aren't expressive enough and you're misleading the compiler.

If the buffer has different user chains, then the compiler cannot fuse them together and your expectation holds. But if the buffer has only a single chain of users (and the operation allows in-place semantics), then the compiler is free to remove the allocation. This is similar to register reuse in allocators: `add r0, r0, r1` is perfectly valid if the value previously stored in r0 is dead after the add op.

If your code has some extra semantics that isn't being propagated through the compiler (e.g. volatile), then you need to add side effects to your ops to make sure the compiler can't assume anything about the memory representation, and thus won't try to eliminate buffers.


Thanks, appreciate your detailed answer.