This is mainly because I do not want the caller (in my case written in C and using some MLIR wrappers for memrefs) to assume that this function has a return value (which could cost a few cycles to store/retrieve register values).
The issue I have is that as soon as I write it like this, canonicalization/CSE gets rid of the entire body of the function (since %sxm is now unused), reducing it to a single “return”. Is there a way (through attributes or otherwise) to prevent this?
I would encourage you to consider using different ops or types. Right now the semantics you want differ from the semantics your function expresses: the function describes the creation of a new tensor without modifying the input, while your hope is to modify the input in place. Using an attribute to change the semantics of the op is less desirable than using memrefs instead of tensors, so that your function describes the behavior you actually want.
Tensors are immutable objects, so it just can’t be sound to try to do what you’re describing. What would the caller look like? How would one reason about the use of the tensor there? @tpopp gave some good advice above.
This is a question of lowering and ABI: the fact that there is a tensor returned here does not mean that the lowering can’t be through an output parameter post-bufferization for example.
The tensor abstraction level is just not the place where you can express this.
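To make the output-parameter idea concrete from the C caller's side, here is a minimal sketch of what such a lowering could look like after bufferization. Everything in it is an assumption for illustration: `MemRef1DF32`, `wrap_f32` and `scale_into` are hypothetical names, the function body is a hand-written stand-in for whatever the compiled code would do, and the struct only mirrors the usual ranked-memref descriptor layout (allocated pointer, aligned pointer, offset, sizes, strides), which you should check against your MLIR version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of a rank-1 memref<?xf32> descriptor.
 * Assumed field order: allocated pointer, aligned pointer, offset,
 * sizes, strides. Verify against the MLIR version you build with. */
typedef struct {
  float *allocated;
  float *aligned;
  intptr_t offset;
  intptr_t sizes[1];
  intptr_t strides[1];
} MemRef1DF32;

/* Wrap a contiguous C array in a descriptor (offset 0, unit stride). */
static MemRef1DF32 wrap_f32(float *data, intptr_t n) {
  MemRef1DF32 m = {data, data, 0, {n}, {1}};
  return m;
}

/* Stand-in for the compiled function: it returns void and writes its
 * result through the `out` descriptor. The stores are observable side
 * effects, so they are the reason the body exists at all. */
static void scale_into(const MemRef1DF32 *in, MemRef1DF32 *out, float factor) {
  assert(in != NULL && out != NULL);
  assert(in->sizes[0] == out->sizes[0]);
  for (intptr_t i = 0; i < in->sizes[0]; ++i) {
    out->aligned[out->offset + i * out->strides[0]] =
        factor * in->aligned[in->offset + i * in->strides[0]];
  }
}
```

A caller that wants in-place behavior can pass the same descriptor as both `in` and `out` (safe here because the operation is elementwise); no return value has to be stored or retrieved. Whether the input and output may alias is a decision made at the buffer level, which is why it cannot be expressed on tensors.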