[RFC] Linalg (Indexed)GenericOp Unification
Motivation
The linalg.indexed_generic operation is a variant of the linalg.generic operation that has an additional set of block arguments to pass the iteration indices. Apart from these additional index block arguments, the two operations do not differ. The following observations lead us to rethink this design:
- named/structured operations cannot access the iteration indices
- passes need to handle two operation types despite their similarity
- certain functionality such as vectorization is not available for both operations
We thus suggest unifying the two operations and keeping only the linalg.generic operation. All named or generic operations that depend on the iteration indices may then rely on the linalg.index operation (https://reviews.llvm.org/D100292) to access them. The operation can be placed in the body of every structured operation and already supports vectorization (https://reviews.llvm.org/D100373). Unifying the linalg.indexed_generic and the linalg.generic operations thus separates the index access logic from the enclosing linalg operation, simplifies transformations, and adds vectorization support.
Overview
The linalg.indexed_generic operation passes the iteration indices using block arguments:
linalg.indexed_generic #traits
  ins(%operand : memref<?x?xindex>)
  outs(%result : memref<?x?xindex>) {
^bb0(%i: index, %j: index, %in: index, %out: index):
  linalg.yield %i : index
}
In this example we return the first iteration index using the linalg.yield operation. Written using the linalg.generic and linalg.index operations, the code translates to:
linalg.generic #traits
  ins(%operand : memref<?x?xindex>)
  outs(%result : memref<?x?xindex>) {
^bb0(%in: index, %out: index):
  %i = linalg.index 0 : index
  linalg.yield %i : index
}
The linalg.index operation again returns the first iteration index, removing any need for additional block arguments or the linalg.indexed_generic operation.
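The snippets above leave #traits undefined. As a minimal sketch, assuming identity indexing maps and two parallel loops (neither is stated in this RFC), a self-contained version of the linalg.generic form could look as follows. The function name and the map alias are hypothetical, and the index is yielded with type index to match the element type of %result:

```mlir
// Assumed trait: both operands are accessed with the identity map and
// both loops are parallel. The RFC itself does not define #traits.
#identity = affine_map<(i, j) -> (i, j)>
#traits = {
  indexing_maps = [#identity, #identity],
  iterator_types = ["parallel", "parallel"]
}

// Hypothetical wrapper function so that the snippet is self-contained.
func @yield_first_index(%operand: memref<?x?xindex>,
                        %result: memref<?x?xindex>) {
  linalg.generic #traits
    ins(%operand : memref<?x?xindex>)
    outs(%result : memref<?x?xindex>) {
  ^bb0(%in: index, %out: index):
    // Iteration index of loop dimension 0, replacing the former %i
    // block argument of linalg.indexed_generic.
    %i = linalg.index 0 : index
    linalg.yield %i : index
  }
  return
}
```

With these assumed maps, every element %result[i, j] would end up holding its own row index i.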
Example Use Case
An important use case for the unification is structured operations that require iteration index access. An example is the linalg.fill_rng operation, which is currently under development (a two-dimensional version can be found in the Phabricator revision https://reviews.llvm.org/D101364). It initializes an output tensor/memref with random values, using the iteration indices as seed for the random value computation.
All structured operations have linalg.generic semantics and therefore no block arguments that carry the iteration indices. The linalg.index operation introduced for the unification is thus a prerequisite for implementing the linalg.fill_rng operation:
linalg.fill_rng(%buf, %min, %max, %seed) : memref<2x4xf32>, f64, f64, i32
The operation fills the output %buf with random numbers limited to the range [%min, %max] using the seed %seed. The region builder of the operation emits the linalg.index operation to access the iteration indices:
^bb0(%arg1: f32): // no predecessors
%0 = linalg.index 0 : index
%1 = index_cast %0 : index to i32
%2 = addi %1, %c42_i32 : i32
%3 = muli %2, %c1103515245_i32 : i32
%4 = addi %3, %c12345_i32 : i32
%5 = linalg.index 1 : index
%6 = index_cast %5 : index to i32
%7 = addi %6, %4 : i32
%8 = muli %7, %c1103515245_i32 : i32
%9 = addi %8, %c12345_i32 : i32
%10 = uitofp %9 : i32 to f64
%11 = mulf %10, %cst_0 : f64
%12 = addf %cst, %11 : f64
%13 = fptrunc %12 : f64 to f32
linalg.yield %13 : f32
This region can be immediately used to generalize linalg.fill_rng to its generic form by inlining the region in the body of a linalg.generic operation.
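To illustrate the generalization, the following sketch inlines the region above into a linalg.generic operation with a single output. Several details are assumptions rather than part of this RFC: the identity indexing map, the function name, the reading of %c42_i32 as the folded seed, and the values chosen for %cst (taken here as %min = -1.0) and %cst_0 (taken here as the scaling factor (%max - %min) / 2^32 with %max = 1.0):

```mlir
// Assumed identity indexing map for the single two-dimensional output.
#identity = affine_map<(i, j) -> (i, j)>

// Hypothetical function name; only the region body is taken from the RFC.
func @fill_rng_generalized(%buf: memref<2x4xf32>) {
  // Constants captured by the region. The integer constants appear in the
  // RFC's region; the two floating-point values are assumptions.
  %c42_i32 = constant 42 : i32                    // presumably the seed
  %c1103515245_i32 = constant 1103515245 : i32    // LCG multiplier
  %c12345_i32 = constant 12345 : i32              // LCG increment
  %cst = constant -1.000000e+00 : f64             // assumed %min
  %cst_0 = constant 4.6566128730773926E-10 : f64  // assumed 2.0 / 2^32
  linalg.generic {indexing_maps = [#identity],
                  iterator_types = ["parallel", "parallel"]}
      outs(%buf : memref<2x4xf32>) {
  ^bb0(%arg1: f32):
    // First LCG round, mixing the row index into the seed.
    %0 = linalg.index 0 : index
    %1 = index_cast %0 : index to i32
    %2 = addi %1, %c42_i32 : i32
    %3 = muli %2, %c1103515245_i32 : i32
    %4 = addi %3, %c12345_i32 : i32
    // Second LCG round, mixing in the column index.
    %5 = linalg.index 1 : index
    %6 = index_cast %5 : index to i32
    %7 = addi %6, %4 : i32
    %8 = muli %7, %c1103515245_i32 : i32
    %9 = addi %8, %c12345_i32 : i32
    // Scale the unsigned 32-bit value into the assumed output range.
    %10 = uitofp %9 : i32 to f64
    %11 = mulf %10, %cst_0 : f64
    %12 = addf %cst, %11 : f64
    %13 = fptrunc %12 : f64 to f32
    linalg.yield %13 : f32
  }
  return
}
```

Because the region reads the indices through linalg.index rather than through block arguments, its body can be copied unchanged between the named and the generic form.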
Migration Plan
Over the last weeks we have already pushed a number of patches adding support for the linalg.index operation:
- https://reviews.llvm.org/D100292 added the linalg.index operation
- https://reviews.llvm.org/D100364 adapted the linalg to loop lowering
- https://reviews.llvm.org/D100373 adapted the vectorization
- https://reviews.llvm.org/D100479 adapted fusion on tensors
- https://reviews.llvm.org/D100379 adapted tiling
- etc.
We plan to conclude the refactoring by splitting the unification of the linalg.indexed_generic and the linalg.generic operations into two steps:
1. Introduce a canonicalization pattern or a separate pass to transform all linalg.indexed_generic operations into linalg.generic + linalg.index operations. Additionally, we will remove linalg.indexed_generic support from all code transformations.
2. Once all users finish transitioning from linalg.indexed_generic to linalg.generic + linalg.index, we will completely remove the linalg.indexed_generic operation, including the canonicalization pattern introduced in step 1.
This approach requires only minimal external changes during step 1 (possibly calling canonicalize) and gives users some time for the transition.