Window iterator type for Linalg GenericOp?

I’m trying to translate a specification of a stencil kernel into MLIR and linalg.generic_op seems like a good candidate. I’m starting with a function which operates on a fixed-size window and (aside from the windowing) structurally maps very nicely to linalg.generic_op

The description of GenericOp indicates that this may be supported, particularly this bullet point:

- iterator_types: an ArrayAttr specifying the type of the enclosing loops.
  Each element of the list represents and iterator of one of the following
  types:
    parallel, reduction, window

The window attribute seems to have no effect when I try to create/lower the GenericOp. Perhaps I’m missing some key piece here as I see no way to specify the size of the window.

So, is this something that is expected to work or planned for future support?

Good question. @nicolasvasilache might have more to comment, but practically, window is treated same as reduction, i.e. that particular loop has a dependence. In other words it is to say its not parallel. AFAIK, nowhere is there a distinction made between reduction and window

The size of the window is obtained from the shape of the arguments. Effectively what you need for a “window” loop (or any loop in linalg op) is something that gives you the extent of the loop. In Linalg all loop bounds are computed from the shape of the operands using the information from the indexing maps. So for example, in convolution-like ops, the shape of the window comes from the shape of the filter operand. For pooling operations, a fake operand is added that is used to represent the shape of the window (but its values are not used).

I can help you specify your operation if you can describe the computation you are looking to represent.

@MaheshRavishankar, thanks for the quick reply. That makes sense from a dependence standpoint. Perhaps my intuition of what the window attribute entails is off now that I think about it. I assumed it implies accessing a sliding window of values along some dimension.

Right now I’m trying to start with something simple like a 1d-blur function, say:

A(i,j) = (B(i-1,j) + B(i,j) + B(i+1,j)) / 3

But I would like to eventually extend that to at least 2-d windows. For my use cases, the dimensions of the window will always be known statically, which should make things easier.

That is right. It is a sliding window along that dimension, but in practice it is used to say “there is some dependence along this dimension”. It is meant to capture that this dependence is not a general reduction-like dependence, but more of a sliding-window like dependence, but there is no place this is actually used.

Ah, I see why this is confusing. The trick here is that you need the operation to take 3 input arguments.

%c3 = constant 3.0 : f32
%A = linalg.generic
    {indexing_maps = [affine_map<(d0, d1) -> (d0-1, d1)>,
                      affine_map<(d0, d1) -> (d0, d1)>,
                      affine_map<(d0, d1) -> (d0+1, d1)>],
     iterator_types = ["window", "parallel"]
     ins(%B, %B, %B : tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>)
     outs(...)
     ^bb0(%arg0 : f32, %arg1 : f32, %arg2 : f32, %arg3 :f32):
       %0 = addf %arg0, %arg1 : f32
       %1 = addf %0, %arg2: f32
       %2 = divf %1, %c3
       linalg.yield %2 : f32
    } -> tensor<?x?xf32>

Essentially you need the indexing maps to encode each access to the operand, and here there are three. So you just make the operation take three different arguments, one for each indexing map. This is purely representational. Finally they will all read from the same memory (i.e. the memory for %B).

@MaheshRavishankar, thanks for the example code! That makes a lot of sense.

It was unclear to me initially that passing in the same matrix/tensor multiple times was the intended structure. My intuition was that the region would receive a memref or tensor containing the current window, but that probably just makes optimization more difficult.