2D strided memref & vector_transfer

Hi,

During ConvertVectorToLLVMPass, vector_transfer operations on strided memrefs are not lowered, because of this commit: D86951 [mlir][VectorOps] Fail fast when a strided memref is passed to vector_transfer

But for 2D strided memrefs, vector_transfer operations are lowered successfully (the body of the for (int index = 0, e = strides.size() - 2; index < e; ++index) loop is never executed). Do I understand correctly that this is a bug, and that vector_transfer with 2D strided memrefs shouldn't be lowered either?
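To spell out why the 2D case slips through (a sketch of how I read the check; the loop body here is my guess for illustration, not the exact code from the commit):

// With a 2D memref, strides.size() == 2, so e == 0 and the loop body
// never runs: the outer stride is never inspected and lowering proceeds.
// With a 3D memref, e == 1, strides[0] is checked, and lowering bails.
for (int index = 0, e = strides.size() - 2; index < e; ++index)
  if (strides[index] != 1) // hypothetical body, for illustration
    return failure();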

@matthias-springer has been making a lot of nice improvements recently; he's the most up to speed atm.

I think a most minor stride other than a static 1 is not yet supported, and we were discussing using matrix.column_load/store, but I'll let him confirm.

2D vector transfer ops should not be lowered by ConvertVectorToLLVM, regardless of strides etc. Can you share an example where this is happening?

Basically, ConvertVectorToLLVM handles only 1D cases at the moment. For (N>1)-D cases, you can run VectorToSCF first, which lowers such vector transfer ops to 1D ops. VectorToSCF also supports 1D transfer ops with non-unit stride (which ConvertVectorToLLVM bails on).
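In other words, a pipeline along these lines should work (a sketch with a placeholder input file; both pass flags exist in upstream mlir-opt):

mlir-opt --convert-vector-to-scf --convert-vector-to-llvm input.mlir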

Note: As Nicolas said, we're not generating matrix.column_load/stores yet. The 1D strided lowering pattern in VectorToSCF generates a loop with scalar loads/stores.

Thank you for the response :raised_hands:
I meant not the target vector dimensions but the source memref dimensions.

In these examples I transfer_read/transfer_write floats one by one from one 2x2 memref to another. I am lowering these examples with the mlir-opt --convert-vector-to-llvm command.

Note:
I forgot to say that I ran into this issue when trying to apply CodegenStrategy (tiling + promotion + vectorization) to linalg::MatmulOp (the 2D case) and linalg::BatchMatmulOp (the 3D case).

2D case:

#map0 = affine_map<(d0, d1)[s0] -> (d0 * 2 + s0 + d1)>
module {
  func @transfer_read_2d(%arg0: memref<2x2xf32>, %arg1: memref<2x2xf32>) {
    %c0 = constant 0 : index
    %c1 = constant 1 : index
    %c2 = constant 2 : index
    %cst = constant 0.000000e+00 : f32
    scf.for %arg2 = %c0 to %c2 step %c1 {
      scf.for %arg3 = %c0 to %c2 step %c1 {
        %0 = memref.subview %arg0[%arg2, %arg3] [1, 1] [1, 1] : memref<2x2xf32> to memref<1x1xf32, #map0>
        %1 = vector.transfer_read %0[%c0, %c0], %cst {in_bounds = [true]} : memref<1x1xf32, #map0>, vector<1xf32>
        %2 = memref.subview %arg1[%arg2, %arg3] [1, 1] [1, 1] : memref<2x2xf32> to memref<1x1xf32, #map0>
        vector.transfer_write %1, %2[%c0, %c0] {in_bounds = [true]} : vector<1xf32>, memref<1x1xf32, #map0>
      }
    }
    return
  }
}

3D case:

#map0 = affine_map<(d0, d1, d2)[s0] -> (d0 * 4 + s0 + d1 * 2 + d2)>
module {
  func @transfer_read_3d(%arg0: memref<1x2x2xf32>, %arg1: memref<1x2x2xf32>) {
    %c0 = constant 0 : index
    %c1 = constant 1 : index
    %c2 = constant 2 : index
    %cst = constant 0.000000e+00 : f32
    scf.for %arg2 = %c0 to %c2 step %c1 {
      scf.for %arg3 = %c0 to %c2 step %c1 {
        %0 = memref.subview %arg0[0, %arg2, %arg3] [1, 1, 1] [1, 1, 1] : memref<1x2x2xf32> to memref<1x1x1xf32, #map0>
        %1 = vector.transfer_read %0[%c0, %c0, %c0], %cst {in_bounds = [true]} : memref<1x1x1xf32, #map0>, vector<1xf32>
        %2 = memref.subview %arg1[0, %arg2, %arg3] [1, 1, 1] [1, 1, 1] : memref<1x2x2xf32> to memref<1x1x1xf32, #map0>
        vector.transfer_write %1, %2[%c0, %c0, %c0] {in_bounds = [true]} : vector<1xf32>, memref<1x1x1xf32, #map0>
      }
    }
    return
  }
}

You're right: the way ConvertVectorToLLVM is written, it should ignore all vector transfer ops that have non-unit strides. As it turns out, there was a bug in the function that checks for unit strides, so the 2D case was lowered but the 3D case was not. (Neither of them should have been.)

However, this check is also overly restrictive. 1D transfer ops can be lowered directly to LLVM loads/stores as long as the last memref dim (most minor) has unit stride. The strides of the other dimensions don’t matter.
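For reference, a minimal sketch of the relaxed check (the function name and details here are illustrative, not necessarily the exact code in the revision below):

#include "mlir/IR/BuiltinTypes.h" // getStridesAndOffset, MemRefType

// Accept a memref for direct 1D lowering iff its most minor stride is
// statically 1; the outer strides only affect the computed base address.
static bool isLastMemrefDimUnitStride(mlir::MemRefType type) {
  int64_t offset;
  llvm::SmallVector<int64_t, 4> strides;
  if (mlir::failed(mlir::getStridesAndOffset(type, strides, offset)))
    return false; // layout has no strided form; bail out
  return !strides.empty() && strides.back() == 1;
}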

I prepared a revision to fix this issue: D102897 [mlir] Check only last dim stride in transfer op lowering
Thanks for bringing this to my attention.

Thank you, this should fix my problem with CodegenStrategy :clap: