I’m planning to add linalg::IndexOnlyGenericOp, as described below. The difference from the existing linalg::GenericOp and linalg::IndexedGenericOp is that the body function only accepts indexes and memrefs as parameters, and the user handles the loads/stores inside the body function according to actual needs.
We have to use linalg::IndexOnlyGenericOp instead of GenericOp/IndexedGenericOp when:
1. The user has to manage the loads/stores. For example, when the body function is an atomic instruction, the parameters have to be pointers rather than loaded values, and no store is needed.
2. The user wants to explicitly optimize the memory loads/stores, for example, with vectorized loads.
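To make case 1 concrete, here is a minimal sketch of what such a body could look like. This is purely illustrative: the `atomic_rmw`-style op and its syntax are assumptions, not existing linalg constructs, and the point is only that the body receives indexes and memrefs, so it can hand %C directly to the atomic op without any load/store of the accumulator.

```mlir
// Hypothetical body for a matmul whose accumulation must be atomic.
// No load or store of %C is emitted; the (assumed) atomic op takes
// the memref and indexes directly.
func @atomic_acc(%m: index, %n: index, %k: index,
                 %A: memref<?x?xf32>, %B: memref<?x?xf32>, %C: memref<?x?xf32>)
  -> ()
{
  %a = linalg.load %A[%m, %k] : memref<?x?xf32>
  %b = linalg.load %B[%k, %n] : memref<?x?xf32>
  %d = mulf %a, %b : f32
  // Assumed atomic read-modify-write op; name and syntax are placeholders.
  atomic_rmw "addf" %d, %C[%m, %n] : (f32, memref<?x?xf32>) -> f32
  return
}
```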
Take matmul as an example again:
func @fma(%m: index, %n: index, %k: index,
          %A: memref<?x?xf32, stride_specification>,
          %B: memref<?x?xf32, stride_specification>,
          %C: memref<?x?xf32, stride_specification>)
  -> ()
{
  // The user manages the loads/stores according to actual needs.
  // The user may choose not to load/store if atomic instructions are used here.
  %a = linalg.load %A[%m, %k] : memref<?x?xf32, stride_specification>
  %b = linalg.load %B[%k, %n] : memref<?x?xf32, stride_specification>
  %c = linalg.load %C[%m, %n] : memref<?x?xf32, stride_specification>
  %d = mulf %a, %b : f32
  %e = addf %c, %d : f32
  linalg.store %e, %C[%m, %n] : memref<?x?xf32, stride_specification>
  return
}
#matmul_accesses = [
(m, n, k) -> (m, k),
(m, n, k) -> (k, n),
(m, n, k) -> (m, n)
]
#matmul_trait = {
doc = "C(m, n) += A(m, k) * B(k, n)",
fun = @fma,
indexing_maps = #matmul_accesses,
library_call = "linalg_matmul",
n_views = [2, 1],
iterator_types = ["parallel", "parallel", "reduction"]
}
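Invocation could then mirror linalg.generic, passing only the memrefs as operands while the op itself supplies the indexes to @fma. This is a sketch under the assumption that the new op keeps linalg.generic's trait-based syntax; the op name `linalg.index_only_generic` is my placeholder, not a settled spelling.

```mlir
// Hypothetical invocation; only views are operands, indexes come from the op.
linalg.index_only_generic #matmul_trait %A, %B, %C :
  memref<?x?xf32, stride_specification>,
  memref<?x?xf32, stride_specification>,
  memref<?x?xf32, stride_specification>
```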