[RFC] Linalg (Indexed)GenericOp Unification

Motivation

The linalg.indexed_generic operation is a variant of the linalg.generic operation that has an additional set of block arguments to pass the iteration indices. Apart from these additional index block arguments, the two operations do not differ. The following observations led us to rethink this design:

  • named/structured operations cannot access the iteration indices
  • passes need to handle two operation types despite their similarity
  • certain functionality such as vectorization is not available for both operations

We thus propose unifying the two operations and keeping only linalg.generic. All named or generic operations that depend on the iteration indices may then rely on the linalg.index operation (https://reviews.llvm.org/D100292) to access them. The operation can be placed in the body of every structured operation and already supports vectorization (https://reviews.llvm.org/D100373). Unifying the linalg.indexed_generic and the linalg.generic operation thus separates the index access logic from the enclosing linalg operation, simplifies transformations, and adds vectorization support.

Overview

The linalg.indexed_generic passes the iteration indices using block arguments:

linalg.indexed_generic #traits
   ins(%operand : memref<?x?xindex>)
  outs(%result : memref<?x?xindex>) {
  ^bb0(%i: index, %j: index, %in: index, %out: index):
    linalg.yield %i : index
}

In this example we return the first iteration index using the linalg.yield operation. Written using the linalg.generic and linalg.index operations the code translates to:

linalg.generic #traits
   ins(%operand : memref<?x?xindex>)
  outs(%result : memref<?x?xindex>) {
  ^bb0(%in: index, %out: index):
    %i = linalg.index 0 : index
    linalg.yield %i : index
}

The linalg.index operation again returns the first iteration index, removing any need for additional block arguments or for the linalg.indexed_generic operation.
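To make the equivalence concrete, here is a small Python model of the two forms. It is only a sketch of the semantics, not MLIR and not any existing API: the helper names run_indexed_generic and run_generic are hypothetical, and the index callback merely mimics what linalg.index does inside a body.

```python
from itertools import product


def run_indexed_generic(shape, operand, out, body):
    """Model of linalg.indexed_generic: indices arrive as leading block arguments."""
    result = {}
    for idx in product(*(range(n) for n in shape)):
        result[idx] = body(*idx, operand[idx], out[idx])
    return result


def run_generic(shape, operand, out, body):
    """Model of linalg.generic: the body asks for an index via index(dim),
    which stands in for `linalg.index dim`."""
    result = {}
    for idx in product(*(range(n) for n in shape)):
        def index(dim, idx=idx):
            return idx[dim]
        result[idx] = body(index, operand[idx], out[idx])
    return result


shape = (2, 3)
zeros = {idx: 0 for idx in product(*(range(n) for n in shape))}

# Both bodies yield the first iteration index, as in the two snippets above.
old_style = run_indexed_generic(shape, zeros, zeros, lambda i, j, x, y: i)
new_style = run_generic(shape, zeros, zeros, lambda index, x, y: index(0))
```

In this model the body of the second form has the same signature regardless of how many loops there are, which is the property that lets transformations treat all bodies uniformly.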

Example Use Case

An important use case for the unification is structured operations that require access to the iteration indices. One example is the linalg.fill_rng operation, which is currently under development (a two-dimensional version of it can be found in the Phabricator revision https://reviews.llvm.org/D101364). It initializes an output tensor/memref with random values, using the iteration indices as seeds for the random value computation.

All structured operations have linalg.generic semantics and therefore no access to the iteration indices. The linalg.index operation that comes with the unification is thus a prerequisite for implementing the linalg.fill_rng operation:

linalg.fill_rng(%buf, %min, %max, %seed) : memref<2x4xf32>, f64, f64, i32

The operation fills the output %buf with random numbers limited to the range [%min, %max] using the seed %seed. The region builder of the operation emits the linalg.index operation to access the iteration indices:

^bb0(%arg1: f32):  // no predecessors
  %0 = linalg.index 0 : index
  %1 = index_cast %0 : index to i32
  %2 = addi %1, %c42_i32 : i32
  %3 = muli %2, %c1103515245_i32 : i32
  %4 = addi %3, %c12345_i32 : i32
  %5 = linalg.index 1 : index
  %6 = index_cast %5 : index to i32
  %7 = addi %6, %4 : i32
  %8 = muli %7, %c1103515245_i32 : i32
  %9 = addi %8, %c12345_i32 : i32
  %10 = uitofp %9 : i32 to f64
  %11 = mulf %10, %cst_0 : f64
  %12 = addf %cst, %11 : f64
  %13 = fptrunc %12 : f64 to f32
  linalg.yield %13 : f32

This region can be immediately used to generalize linalg.fill_rng to its generic form by inlining the region in the body of a linalg.generic operation.
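The arithmetic in the region above can be traced with a short Python model. This is a sketch under stated assumptions, not the actual implementation: it assumes that %c42_i32 is the seed, that %cst holds %min, and that %cst_0 is the scaling factor (%max - %min) / 2^32. None of these constants is defined in the snippet, and the function names are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF       # emulate wrapping i32 arithmetic
MULTIPLIER = 1103515245   # %c1103515245_i32
INCREMENT = 12345         # %c12345_i32


def lcg_step(value):
    """One linear congruential step: muli followed by addi, wrapped to 32 bits."""
    return (value * MULTIPLIER + INCREMENT) & MASK32


def to_f32(value):
    """Round a Python float to f32 precision, mimicking fptrunc."""
    return struct.unpack("f", struct.pack("f", value))[0]


def fill_rng_element(i, j, seed, lo, hi):
    r1 = lcg_step((i + seed) & MASK32)   # %1 .. %4: mix in the row index
    r2 = lcg_step((j + r1) & MASK32)     # %6 .. %9: mix in the column index
    scale = (hi - lo) / 2.0**32          # assumed meaning of %cst_0
    return to_f32(lo + r2 * scale)       # uitofp, mulf, addf (assuming %cst = min), fptrunc


def fill_rng(rows, cols, seed, lo, hi):
    """Apply the body to every iteration index of a rows x cols buffer."""
    return [[fill_rng_element(i, j, seed, lo, hi) for j in range(cols)]
            for i in range(rows)]


buf = fill_rng(2, 4, seed=42, lo=0.0, hi=1.0)
```

Each element depends only on its own indices and the seed, so the result is the same regardless of the order in which the iteration space is traversed.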

Migration Plan

Over the last few weeks we have already pushed a number of patches adding support for the linalg.index operation.

We plan to conclude the refactoring by splitting the unification of the linalg.indexed_generic and the linalg.generic operation into two steps:

  1. Introduce a canonicalization pattern or a separate pass to transform all linalg.indexed_generic operations into linalg.generic + linalg.index operations. Additionally, we will remove linalg.indexed_generic support from all code transformations.
  2. Once all users finish transitioning from linalg.indexed_generic to linalg.generic + linalg.index, we will completely remove the linalg.indexed_generic operation including the canonicalization pattern introduced in step 1.

This approach requires only minimal external changes during step 1 (possibly calling canonicalize) and gives users some time for the transition.

Thanks for pushing this @gysit ! This will greatly simplify some of the templated parts of the code and pave the way for more expressive ops that transform and vectorize without surprises.