Preventing CSE from clearing the body of a function with no return value

Hi all,

I have a question related to CSE and in-place update of containers:

Let’s say I have a function like this:

  // Assumed map definitions (elided from the snippet):
  // #mapS broadcasts the scalar, #mapM is the identity.
  #mapS = affine_map<(d0, d1) -> ()>
  #mapM = affine_map<(d0, d1) -> (d0, d1)>

  func.func @sxm(%s: f32, %M: tensor<?x?xf32> {bufferization.writable = true}) -> tensor<?x?xf32> {
    %sxm = linalg.generic {indexing_maps = [#mapS, #mapM],
                           iterator_types = ["parallel", "parallel"]}
        ins(%s : f32) outs(%M : tensor<?x?xf32>) {
    ^bb0(%in: f32, %out: f32):
      %sm = arith.mulf %in, %out : f32
      linalg.yield %sm : f32
    } -> tensor<?x?xf32>
    return %sxm : tensor<?x?xf32>
  }

But I don’t want it to return a value, since I know it should update its parameter %M in place. I would thus like to write it as:

  func.func @sxm(%s: f32, %M: tensor<?x?xf32> {bufferization.writable = true}) {
    %sxm = linalg.generic {indexing_maps = [#mapS, #mapM],
                           iterator_types = ["parallel", "parallel"]}
        ins(%s : f32) outs(%M : tensor<?x?xf32>) {
    ^bb0(%in: f32, %out: f32):
      %sm = arith.mulf %in, %out : f32
      linalg.yield %sm : f32
    } -> tensor<?x?xf32>
    return
  }

This is mainly because I do not want the caller (in my case written in C, using some MLIR memref wrappers) to assume that this function has a return value (which could cost a few cycles storing and retrieving register values).

The issue I have is that as soon as I write it like this, canonicalization/CSE gets rid of the entire body of the function (since %sxm is now unused), reducing it to the single “return”. Is there a way (through attributes or otherwise) to prevent this?
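
For concreteness, this is what the function is reduced to after -canonicalize/-cse:

  func.func @sxm(%s: f32, %M: tensor<?x?xf32> {bufferization.writable = true}) {
    return
  }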

Thanks

We had a similar problem in a local dialect, and we created a sink op that does nothing but keeps the optimizer from cleaning up the computation feeding it.
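
A minimal sketch of the idea, with a hypothetical mydialect.sink op that declares a memory effect so DCE/CSE treat it as live:

  // Hypothetical op (not upstream): it declares a side effect, so the
  // optimizer keeps it and, transitively, the linalg.generic producing
  // %sxm. It can be erased or lowered to nothing late in the pipeline.
  "mydialect.sink"(%sxm) : (tensor<?x?xf32>) -> ()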

Not sure how you could do that with upstream dialects. Maybe there’s an attribute somewhere?

I would encourage you to consider using different ops or types. Right now you intend different semantics than what your function is describing: the function describes the creation of a new tensor without modifying the input, while your hope is to modify the input in place. Using an attribute to change the semantics of the op is less desirable than using memrefs instead of tensors, so that your function describes the behavior you actually want.
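
For example, a sketch of the same computation on memrefs (with #mapS/#mapM as defined above): linalg.generic in buffer semantics writes into %M directly, has declared memory effects, and produces no result, so there is nothing for CSE/DCE to erase and nothing to return.

  func.func @sxm(%s: f32, %M: memref<?x?xf32>) {
    linalg.generic {indexing_maps = [#mapS, #mapM],
                    iterator_types = ["parallel", "parallel"]}
        ins(%s : f32) outs(%M : memref<?x?xf32>) {
    ^bb0(%in: f32, %out: f32):
      %sm = arith.mulf %in, %out : f32
      linalg.yield %sm : f32
    }
    return
  }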


Tensors are immutable objects, so it just can’t be sound to do what you’re describing. What would the caller look like? How would you reason about the use of the tensor there?
@tpopp gave some good advice above.

This is a question of lowering and ABI: the fact that a tensor is returned here does not mean that the lowering can’t go through an output parameter post-bufferization, for example.
The tensor abstraction level is just not the place where you can express this.
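
For instance, running something like --one-shot-bufferize="bufferize-function-boundaries" on the original tensor version should bufferize the linalg.generic in place into %M (given the writable attribute), producing roughly the following (a sketch; the exact memref layouts depend on the options used):

  func.func @sxm(%s: f32, %M: memref<?x?xf32, strided<[?, ?], offset: ?>>)
      -> memref<?x?xf32, strided<[?, ?], offset: ?>> {
    linalg.generic {indexing_maps = [#mapS, #mapM],
                    iterator_types = ["parallel", "parallel"]}
        ins(%s : f32) outs(%M : memref<?x?xf32, strided<[?, ?], offset: ?>>) {
    ^bb0(%in: f32, %out: f32):
      %sm = arith.mulf %in, %out : f32
      linalg.yield %sm : f32
    }
    // The "returned tensor" is just the input buffer again; a later
    // lowering of the calling convention is free to drop it.
    return %M : memref<?x?xf32, strided<[?, ?], offset: ?>>
  }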