I think comprehensive bufferization does the trick. However, you need to annotate the output argument as in-placeable (`linalg.inplaceable = true`):
#map = affine_map<(d0, d1) -> (d0, d1)>

func @test(%a : tensor<64x64xi64>, %b : tensor<64x64xi64>, %c : tensor<64x64xi64> {linalg.inplaceable = true}) -> tensor<64x64xi64> {
  %res = linalg.generic {doc = "C(a, b) = op(A(a, c), B(a, c))", indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]}
    ins(%a, %b : tensor<64x64xi64>, tensor<64x64xi64>)
    outs(%c : tensor<64x64xi64>) {
  ^bb0(%aa: i64, %bb: i64, %cc: i64):
    %ee = addi %aa, %bb : i64
    linalg.yield %ee : i64
  } -> tensor<64x64xi64>
  return %res : tensor<64x64xi64>
}
Then running comprehensive bufferization (`mlir-opt bufferize.mlir --linalg-comprehensive-module-bufferize`) yields the following output:
func @test(%arg0: memref<64x64xi64, #map0>, %arg1: memref<64x64xi64, #map0>, %arg2: memref<64x64xi64, #map0>) {
  linalg.generic {doc = "C(a, b) = op(A(a, c), B(a, c))", indexing_maps = [#map1, #map1, #map1], iterator_types = ["parallel", "parallel"]} ins(%arg0, %arg1 : memref<64x64xi64, #map0>, memref<64x64xi64, #map0>) outs(%arg2 : memref<64x64xi64, #map0>) {
  ^bb0(%arg3: i64, %arg4: i64, %arg5: i64):  // no predecessors
    %0 = addi %arg3, %arg4 : i64
    linalg.yield %0 : i64
  }
  return
}
Now the result is written in place to %c (%arg2 in the bufferized function), and the function no longer returns a value.
PS: There are different bufferization implementations, and it may also be possible to extend your current set of flags to get a similar result.