Error when lowering affine to gpu (affine.load is not recognized)

Here is my MLIR file for onnx.add. It has already been lowered to the affine dialect.

module attributes {llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", llvm.target_triple = "x86_64-unknown-linux-gnu"} {
  func @main_graph(%arg0: memref<3x2xf32>, %arg1: memref<3x2xf32>) -> memref<3x2xf32> attributes {input_names = ["X1", "X2"], output_names = ["Y"]} {
    %0 = memref.alloc() {alignment = 16 : i64} : memref<3x2xf32>
    affine.for %arg2 = 0 to 3 {
      affine.for %arg3 = 0 to 2 {
        %1 = affine.load %arg0[%arg2, %arg3] : memref<3x2xf32>
        %2 = affine.load %arg1[%arg2, %arg3] : memref<3x2xf32>
        %3 = arith.addf %1, %2 : f32
        affine.store %3, %0[%arg2, %arg3] : memref<3x2xf32>
      }
    }
    return %0 : memref<3x2xf32>
  }
  "krnl.entry_point"() {func = @main_graph, numInputs = 2 : i32, numOutputs = 1 : i32, signature = "[    { \22type\22 : \22f32\22 , \22dims\22 : [3 , 2] , \22name\22 : \22X1\22 }\0A ,    { \22type\22 : \22f32\22 , \22dims\22 : [3 , 2] , \22name\22 : \22X2\22 }\0A\0A]\00@[   { \22type\22 : \22f32\22 , \22dims\22 : [3 , 2] , \22name\22 : \22Y\22 }\0A\0A]\00"} : () -> ()
}

I want to lower it to the gpu dialect, so I wrote the code below:

void addAffineToGPUPasses(mlir::PassManager &pm) {
  pm.addNestedPass<FuncOp>(mlir::createAffineForToGPUPass()); 

  // pm.addPass(mlir::createGpuKernelOutliningPass());
  // pm.addPass(mlir::createGpuToLLVMConversionPass());
}

But here is the error I get:

error: 'affine.load' op index must be a dimension or symbol identifier

Can anyone help me with it?


Affine memory operations require their subscript operands to be valid affine dimension or symbol identifiers (see 'affine' Dialect - MLIR). Unlike affine.for induction variables, GPU thread index values do not qualify, so the affine.load/affine.store subscripts become invalid once the loop is mapped to gpu.launch, hence the error.

You need to convert affine memory operations to memref operations first, and then convert the loops separately. This may require more code than just setting up a pass pipeline. Alternatively, you can lower the affine operations to a mix of memref+scf, and then map that to GPU.
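For the second route, a pipeline along the following lines should work. This is an untested sketch: the pass-creation functions (createAffineParallelizePass, createLowerAffinePass, createGpuMapParallelLoopsPass, createParallelLoopToGpuPass) are the upstream MLIR ones, and the exact header paths and FuncOp spelling may differ depending on your MLIR version.

#include "mlir/Conversion/AffineToStandard/AffineToStandard.h"
#include "mlir/Conversion/SCFToGPU/SCFToGPUPass.h"
#include "mlir/Dialect/Affine/Passes.h"
#include "mlir/Dialect/GPU/Passes.h"
#include "mlir/Pass/PassManager.h"

void addAffineToGPUPasses(mlir::PassManager &pm) {
  // affine.for -> affine.parallel where the loops are provably parallel.
  pm.addNestedPass<mlir::FuncOp>(mlir::createAffineParallelizePass());
  // affine.parallel / affine.load / affine.store -> scf.parallel + memref ops.
  pm.addNestedPass<mlir::FuncOp>(mlir::createLowerAffinePass());
  // Attach a GPU block/thread mapping to each scf.parallel loop nest.
  pm.addNestedPass<mlir::FuncOp>(mlir::createGpuMapParallelLoopsPass());
  // Mapped scf.parallel -> gpu.launch.
  pm.addNestedPass<mlir::FuncOp>(mlir::createParallelLoopToGpuPass());
  // Outline each gpu.launch body into a gpu.func inside a gpu.module.
  pm.addPass(mlir::createGpuKernelOutliningPass());
}

From there, the gpu-to-LLVM conversion you already have commented out can take over. If the parallelization pass cannot prove a loop parallel, it will stay as scf.for and you will need to handle it separately.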


Thank you very much. I'm trying now.