I’m playing around with the linalg passes (fusion in particular). Right now, I have a sequence of two linalg.generic
operations:
#map = affine_map<(d0, d1) -> (d0, d1)>
func.func @body1(%arg0: tensor<100x100xf64>, %arg1: tensor<100x100xf64>, %arg2: tensor<100x100xf64>, %arg3: tensor<100x100xf64>, %arg4: tensor<100x100xf64>) -> tensor<100x100xf64> attributes {llvm.emit_c_interface} {
%1 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%arg0, %arg1 : tensor<100x100xf64>, tensor<100x100xf64>) outs(%arg2 : tensor<100x100xf64>) {
^bb0(%in: f64, %in_0: f64, %out: f64):
%0 = arith.addf %in, %in_0 : f64
linalg.yield %0 : f64
} -> (tensor<100x100xf64>)
%2 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%1, %arg3 : tensor<100x100xf64>, tensor<100x100xf64>) outs(%arg4 : tensor<100x100xf64>) {
^bb0(%in: f64, %in_0: f64, %out: f64):
%0 = arith.mulf %in, %in_0 : f64
linalg.yield %0 : f64
} -> (tensor<100x100xf64>)
return %2 : tensor<100x100xf64>
}
The generic operations have a producer-consumer relationship, and the linalg fusion passes are able to fuse them. However, I’m planning on using this dialect in a setting where tensor value semantics are not applicable. The function I want to write looks something like:
func.func @body2(%arg0: memref<100x100xf64>, %arg1: memref<100x100xf64>, %arg2: memref<100x100xf64>, %arg3: memref<100x100xf64>) attributes {llvm.emit_c_interface} {
linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%arg0, %arg1 : memref<100x100xf64>, memref<100x100xf64>) outs(%arg2 : memref<100x100xf64>) {
^bb0(%in: f64, %in_0: f64, %out: f64):
%0 = arith.addf %in, %in_0 : f64
linalg.yield %0 : f64
}
linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%arg2, %arg3 : memref<100x100xf64>, memref<100x100xf64>) outs(%arg2 : memref<100x100xf64>) {
^bb0(%in: f64, %in_0: f64, %out: f64):
%0 = arith.mulf %in, %in_0 : f64
linalg.yield %0 : f64
}
return
}
where there is still a producer-consumer relationship between the generic operations (the second one reads what the first one wrote into %arg2), but I need the operations to accept memrefs and eventually write into the output %arg2
(imagine that these allocations are performed outside of my control).
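For concreteness, here is a hand-written sketch of the fused form I would hope to get for @body2. It is not the output of any existing pass; the name @body2_fused and the block-argument names are made up for illustration, and it reuses the #map defined above. It is only equivalent to @body2 under the assumption that %arg2 does not alias %arg0, %arg1, or %arg3 (for example, if %arg3 aliased %arg2, the unfused version would multiply by the freshly written sum, while this version would multiply by the old contents).

```mlir
// Hypothetical fused version of @body2: one elementwise op computing
// (%arg0 + %arg1) * %arg3 directly into %arg2.
// Assumes %arg2 does not alias any of the inputs.
func.func @body2_fused(%arg0: memref<100x100xf64>, %arg1: memref<100x100xf64>, %arg2: memref<100x100xf64>, %arg3: memref<100x100xf64>) attributes {llvm.emit_c_interface} {
  linalg.generic {indexing_maps = [#map, #map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%arg0, %arg1, %arg3 : memref<100x100xf64>, memref<100x100xf64>, memref<100x100xf64>) outs(%arg2 : memref<100x100xf64>) {
  ^bb0(%in: f64, %in_0: f64, %in_1: f64, %out: f64):
    // Body of the original first op (the producer).
    %0 = arith.addf %in, %in_0 : f64
    // Body of the original second op (the consumer), fed by the sum above
    // instead of a value re-read from %arg2.
    %1 = arith.mulf %0, %in_1 : f64
    linalg.yield %1 : f64
  }
  return
}
```

The point of the sketch is that the intermediate write to and re-read from %arg2 disappears, which is exactly what a fusion on memrefs would need to prove is safe.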
So are there plans to extend the linalg passes to perform any analysis on “bufferized” arguments, rather than tensors? Thanks