`linalg.index` as affine symbols

You may remember this discussion a while ago about whether the indices in a `linalg.indexed_generic` should be usable as affine symbols. The overall sentiment was very much in favor of that, but then the unification with `linalg.generic` happened and the indices were no longer block arguments.

Right now, the indices can be obtained using `linalg.index`, but they cannot be used in affine expressions. Consider this example of me trying to be terribly clever and fuse a ReLU into a convolution using the guaranteed order of the reduction (at this point):

```mlir
func.func @test(%in : tensor<12x12xf32>, %w : tensor<3x3xf32>) -> tensor<10x10xf32> {
    %cst0 = arith.constant 0.0 : f32
    %init = tensor.splat %cst0 : tensor<10x10xf32>
    %out = linalg.generic {
            indexing_maps = [
                affine_map<(x, y, i, j) -> (x + i, y + j)>,
                affine_map<(x, y, i, j) -> (i, j)>,
                affine_map<(x, y, i, j) -> (x, y)>
            ],
            iterator_types = [
                "parallel", "parallel", "reduction", "reduction"
            ]
        }
        ins(%in, %w : tensor<12x12xf32>, tensor<3x3xf32>)
        outs(%init : tensor<10x10xf32>) {
        ^bb0(%arg0 : f32, %arg1 : f32, %arg2 : f32):
            %0 = arith.mulf %arg0, %arg1 : f32
            %1 = arith.addf %arg2, %0 : f32
            %i = linalg.index 2 : index
            %j = linalg.index 3 : index
            %2 = affine.if affine_set<(a, b) : (a - 2 == 0, b - 2 == 0)>(%i, %j) -> f32 {
                %3 = arith.maxf %1, %cst0 : f32
                affine.yield %3 : f32
            } else {
                affine.yield %1 : f32
            }
            linalg.yield %2 : f32
    } -> tensor<10x10xf32>
    return %out : tensor<10x10xf32>
}
```
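For contrast, here is a minimal sketch (names made up) of what the affine restrictions do accept: a value is a valid symbol if, for instance, it is a block argument of the enclosing function, which is an affine scope. So the following verifies:

```mlir
// Hypothetical example: %n is a block argument of the function (an
// affine scope), so it is a valid affine symbol and may be passed to
// affine.if.
func.func @symbol_ok(%n : index) {
  affine.if affine_set<()[s0] : (s0 - 2 >= 0)>()[%n] {
    // conditionally executed region
  }
  return
}
```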

This does not work, for the reason above: `%i` and `%j` are results of `linalg.index`, so they are not valid affine symbols (or dims) and the `affine.if` fails to verify. However, the following is valid:

```mlir
func.func @test(%in : tensor<12x12xf32>, %w : tensor<3x3xf32>) -> tensor<10x10xf32> {
    %cst0 = arith.constant 0.0 : f32
    %cst2 = arith.constant 2 : index
    %init = tensor.splat %cst0 : tensor<10x10xf32>
    %out = linalg.generic {
            indexing_maps = [
                affine_map<(x, y, i, j) -> (x + i, y + j)>,
                affine_map<(x, y, i, j) -> (i, j)>,
                affine_map<(x, y, i, j) -> (x, y)>
            ],
            iterator_types = [
                "parallel", "parallel", "reduction", "reduction"
            ]
        }
        ins(%in, %w : tensor<12x12xf32>, tensor<3x3xf32>)
        outs(%init : tensor<10x10xf32>) {
        ^bb0(%arg0 : f32, %arg1 : f32, %arg2 : f32):
            %0 = arith.mulf %arg0, %arg1 : f32
            %1 = arith.addf %arg2, %0 : f32
            %i = linalg.index 2 : index
            %j = linalg.index 3 : index
            %cmpi = arith.cmpi eq, %i, %cst2 : index
            %cmpj = arith.cmpi eq, %j, %cst2 : index
            %cmp = arith.andi %cmpi, %cmpj : i1
            %2 = scf.if %cmp -> (f32) {
                %3 = arith.maxf %1, %cst0 : f32
                scf.yield %3 : f32
            } else {
                scf.yield %1 : f32
            }
            linalg.yield %2 : f32
    } -> tensor<10x10xf32>
    return %out : tensor<10x10xf32>
}
```
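To be clear about what I mean by "before going to affine": once the generic is lowered to an affine loop nest, the induction variables are valid affine dims, so the guard from the first version can be written directly. A simplified sketch of the reduction loops:

```mlir
// Sketch: inside affine.for, the induction variables %i and %j are valid
// affine dims, so the same "last reduction iteration" guard verifies.
affine.for %i = 0 to 3 {
  affine.for %j = 0 to 3 {
    affine.if affine_set<(d0, d1) : (d0 - 2 == 0, d1 - 2 == 0)>(%i, %j) {
      // apply the ReLU to the accumulated value here
    }
  }
}
```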

In the backend I’m living in, pulling conditionals into loop nests is a good idea, but I’m not arguing that this is the proper way to do it. My point is rather that, sadly, I can’t express this before lowering to affine, and I don’t immediately see a reason why.

What are your thoughts? Doesn’t the current design of affine dims and symbols disallow them coming from non-affine ops altogether, making this impossible in the first place?

@nicolasvasilache