Plans to extend linalg optimization passes to memref operands

I’m playing around with the linalg passes (fusion in particular). Right now, I have a sequence of two linalg.generic operations:

#map = affine_map<(d0, d1) -> (d0, d1)>
func.func @body1(%arg0: tensor<100x100xf64>, %arg1: tensor<100x100xf64>, %arg2: tensor<100x100xf64>, %arg3: tensor<100x100xf64>, %arg4: tensor<100x100xf64>) -> tensor<100x100xf64> attributes {llvm.emit_c_interface} {
  %1 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%arg0, %arg1 : tensor<100x100xf64>, tensor<100x100xf64>) outs(%arg2 : tensor<100x100xf64>) {
  ^bb0(%in: f64, %in_0: f64, %out: f64):
    %0 = arith.addf %in, %in_0 : f64
    linalg.yield %0 : f64
  } -> (tensor<100x100xf64>)
  %2 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%1, %arg3 : tensor<100x100xf64>, tensor<100x100xf64>) outs(%arg4 : tensor<100x100xf64>) {
  ^bb0(%in: f64, %in_0: f64, %out: f64):
    %0 = arith.mulf %in, %in_0 : f64
    linalg.yield %0 : f64
  } -> (tensor<100x100xf64>)
  return %2 : tensor<100x100xf64>
}

The generic operations have a producer-consumer relationship, and the linalg fusion passes are able to fuse them. However, I’m planning on using this dialect in a setting where tensor value semantics are not applicable. The function I want to write looks something like:

func.func @body2(%arg0: memref<100x100xf64>, %arg1: memref<100x100xf64>, %arg2: memref<100x100xf64>, %arg3: memref<100x100xf64>) attributes {llvm.emit_c_interface} {
  linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%arg0, %arg1 : memref<100x100xf64>, memref<100x100xf64>) outs(%arg2 : memref<100x100xf64>) {
  ^bb0(%in: f64, %in_0: f64, %out: f64):
    %0 = arith.addf %in, %in_0 : f64
    linalg.yield %0 : f64
  }
  linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%arg2, %arg3 : memref<100x100xf64>, memref<100x100xf64>) outs(%arg2: memref<100x100xf64>) {
  ^bb0(%in: f64, %in_0: f64, %out: f64):
    %0 = arith.mulf %in, %in_0 : f64
    linalg.yield %0 : f64
  }
  return
}

where there is still a producer-consumer relationship between the generic operations, but I do need the operations to accept memrefs and eventually write into the output %arg2 (imagine that these allocations are performed outside of my control).

So, are there plans to extend the linalg passes to do this kind of analysis on “bufferized” arguments, rather than tensors? Thanks

Fusion on operations with memref semantics is really involved, since it is hard to track dependencies without explicit SSA use-def chains. One way here might be to run fusion on tensors and then use One-Shot Bufferize to convert the linalg operations on tensors into linalg operations on memrefs.
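Concretely, on the tensor version above, the elementwise fusion pass (`-linalg-fuse-elementwise-ops` in `mlir-opt`; exact pass names can vary between MLIR versions) should produce a single fused generic roughly like the following sketch:

```mlir
%fused = linalg.generic {indexing_maps = [#map, #map, #map, #map],
                         iterator_types = ["parallel", "parallel"]}
    ins(%arg0, %arg1, %arg3 : tensor<100x100xf64>, tensor<100x100xf64>, tensor<100x100xf64>)
    outs(%arg4 : tensor<100x100xf64>) {
^bb0(%in: f64, %in_0: f64, %in_1: f64, %out: f64):
  // add and multiply now happen in one loop nest
  %0 = arith.addf %in, %in_0 : f64
  %1 = arith.mulf %0, %in_1 : f64
  linalg.yield %1 : f64
} -> tensor<100x100xf64>
```

Running `-one-shot-bufferize` afterwards then lowers this single generic to one operating on memrefs, which is close to the @body2 shape you want, without the fusion pass ever having to reason about buffers.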

This is subtle, so I’m sure I’m not getting the full details here. It seems like use-def chains are visible from the linalg generic ops even on memrefs, via the ins and outs arguments to the operation, right?

That’s not a use-def chain. Those are all just uses. Compare the tensor version and the memref version: in the tensor version, one operation returns a result (i.e. a def) that the next operation reads (i.e. a use). That relationship is explicit in the IR. For memrefs, you instead need to walk all uses of the buffer and account for every possible use and side effect when doing the fusion.


And even so: there is also the question of aliasing (for example, two different SSA values can be memref subviews of the same allocation).
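As a minimal illustration of the aliasing point (hypothetical values, and the exact strided-layout syntax may differ slightly between MLIR versions): %a and %b below are distinct SSA values, yet their last 25 rows overlap in memory, so a memref-level fusion pass cannot treat them as independent buffers:

```mlir
%buf = memref.alloc() : memref<100x100xf64>
// Two views of the same allocation, overlapping in rows 25..49:
%a = memref.subview %buf[0, 0] [50, 100] [1, 1]
       : memref<100x100xf64> to memref<50x100xf64, strided<[100, 1]>>
%b = memref.subview %buf[25, 0] [50, 100] [1, 1]
       : memref<100x100xf64> to memref<50x100xf64, strided<[100, 1], offset: 2500>>
```

Any transformation that reorders or merges loops reading %a and writing %b (or vice versa) would need alias analysis to prove the accesses don’t conflict, which is exactly the information SSA use-def chains give you for free on tensors.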


Yes, these concerns make sense. It does seem to me that if I had aliasing information (i.e. restrict) on each of the input memrefs, it wouldn’t be that much harder than the tensor case. I’ll look into developing the transformation for my own use case.