Tensor load or vector load for mlir-vulkan-runner

I have recently been trying to run MLIR on a GPU, starting with the mlir-vulkan-runner examples: llvm-project/mulf.mlir at 929189a4995ece3162adced7a7d9be8e17dc4079 · llvm/llvm-project · GitHub

module attributes {
  gpu.container_module,
  spv.target_env = #spv.target_env<
    #spv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>, {}>
} {
  gpu.module @kernels {
    gpu.func @kernel_mul(%arg0 : memref<4x4xf32>, %arg1 : memref<4x4xf32>, %arg2 : memref<4x4xf32>)
      kernel attributes { spv.entry_point_abi = {local_size = dense<[1, 1, 1]>: vector<3xi32> }} {
      %x = "gpu.block_id"() {dimension = "x"} : () -> index
      %y = "gpu.block_id"() {dimension = "y"} : () -> index
      %1 = memref.load %arg0[%x, %y] : memref<4x4xf32>
      %2 = memref.load %arg1[%x, %y] : memref<4x4xf32>
      %3 = mulf %1, %2 : f32
      memref.store %3, %arg2[%x, %y] : memref<4x4xf32>
      gpu.return
    }
  }

  func @main() {
    ...
    gpu.launch_func @kernels::@kernel_mul
        blocks in (%cst4, %cst4, %cst1) threads in (%cst1, %cst1, %cst1)
        args(%arg0 : memref<4x4xf32>, %arg1 : memref<4x4xf32>, %arg2 : memref<4x4xf32>)
    ...
  }
  ...
}

One question occurs to me: a memref is represented as !spv.ptr<!spv.array<nelts x elem_type>>, but it seems the spv dialect does not support computation on arrays. To perform a computation, one has to load each individual scalar and do element-wise computation.
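For illustration, here is a hand-written sketch of the element-wise form this lowers to in the spv dialect. It is not actual converter output: %x_i32, %y_i32, and %stride are hypothetical i32 values derived from the block ids, and the exact pointer type and index arithmetic the real conversion produces may differ.

// Linearize the 2-D index [%x, %y] into the flattened array.
%tmp = spv.IMul %x_i32, %stride : i32
%lin = spv.IAdd %tmp, %y_i32 : i32
%ptr = spv.AccessChain %arg0[%lin] : !spv.ptr<!spv.array<16 x f32, stride=4>, StorageBuffer>
// Each element requires its own scalar load (and later its own store).
%val = spv.Load "StorageBuffer" %ptr : f32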

However, the spv dialect is strong at vector computation, e.g. spv.IAdd. So I wonder: to perform efficient computation with the spv dialect, why not represent a memref as something like !spv.ptr<!spv.vector<nelts x elem_type>>? Or, to use powerful spv operations such as spv.IAdd on vectors and spv.CooperativeMatrixMulAddNV on matrices, is there a way to convert a memref to a vector with vector.load or some sort of tensor load, rather than with element-wise loads and stores?

The conversion to SPIR-V makes as few decisions as possible to keep the complexity low. The choice of using vectors or cooperative matrices should be made earlier. It is already possible to generate vector loads and vector spv.IAdd: if you change the memref from memref<4x4xf32> to memref<4xvector<4xf32>>, it will be converted to !spv.ptr<!spv.array<4 x vector<4xf32>>> and the associated memref.load will become a vector load.
In the same way, std.addi will be converted to a vector spv.IAdd if the type is a vector of 2 or 4 elements (since SPIR-V only supports vectors of up to 4 elements).
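For comparison with the scalar sketch above, here is a hand-written sketch of the vector form (illustrative names and pointer types, not actual converter output): with memref<4xvector<4xf32>>, each load moves a whole vector<4xf32> and the multiply stays a single vector instruction.

%ptr0 = spv.AccessChain %arg0[%x_i32] : !spv.ptr<!spv.array<4 x vector<4xf32>, stride=16>, StorageBuffer>
%v0 = spv.Load "StorageBuffer" %ptr0 : vector<4xf32>
%v1 = spv.Load "StorageBuffer" %ptr1 : vector<4xf32>
// One vector instruction replaces four scalar multiplies.
%prod = spv.FMul %v0, %v1 : vector<4xf32>
spv.Store "StorageBuffer" %ptr2, %prod : vector<4xf32>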

We do want to generate vector loads/stores and vector operations in general, but the vectorization itself should be done at a higher level. Note that there are several paths to generate vector operations in MLIR, such as linalg vectorization or the affine super-vectorizer; a sketch of what they produce follows below.
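As a sketch of what such a higher-level pass might produce on the original memref<4x4xf32> kernel (hand-written for illustration, not actual pass output), the vector dialect expresses the same row-wise computation with transfer ops:

%c0 = constant 0 : index
%pad = constant 0.0 : f32
// Read a full row of 4 elements at once; %pad fills out-of-bounds lanes.
%v0 = vector.transfer_read %arg0[%x, %c0], %pad : memref<4x4xf32>, vector<4xf32>
%v1 = vector.transfer_read %arg1[%x, %c0], %pad : memref<4x4xf32>, vector<4xf32>
%v2 = mulf %v0, %v1 : vector<4xf32>
vector.transfer_write %v2, %arg2[%x, %c0] : vector<4xf32>, memref<4x4xf32>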

If you want a concrete example, IREE uses linalg vectorization to generate vector SPIR-V code; there is a test in IREE going through that path.

To give a more concrete answer, you can easily change your input IR to generate vector code:

module attributes {
  gpu.container_module,
  spv.target_env = #spv.target_env<
    #spv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>, {}>
} {
  gpu.module @kernels {
    gpu.func @kernel_mul(%arg0 : memref<4xvector<4xf32>>, %arg1 : memref<4xvector<4xf32>>, %arg2 : memref<4xvector<4xf32>>)
      kernel attributes { spv.entry_point_abi = {local_size = dense<[1, 1, 1]>: vector<3xi32> }} {
      %x = "gpu.block_id"() {dimension = "x"} : () -> index
      %y = "gpu.block_id"() {dimension = "y"} : () -> index
      %1 = memref.load %arg0[%x] : memref<4xvector<4xf32>>
      %2 = memref.load %arg1[%x] : memref<4xvector<4xf32>>
      %3 = mulf %1, %2 : vector<4xf32>
      memref.store %3, %arg2[%x] : memref<4xvector<4xf32>>
      gpu.return
    }
  }

  func @main() {
    ...
    gpu.launch_func @kernels::@kernel_mul
        blocks in (%cst4, %cst4, %cst1) threads in (%cst1, %cst1, %cst1)
        args(%arg0 : memref<4xvector<4xf32>>, %arg1 : memref<4xvector<4xf32>>, %arg2 : memref<4xvector<4xf32>>)
    ...
  }
  ...
}