Tensor load or vector load for mlir-vulkan-runner

I have recently been trying to run MLIR on a GPU, starting with the mlir-vulkan-runner examples: llvm-project/mulf.mlir at 929189a4995ece3162adced7a7d9be8e17dc4079 · llvm/llvm-project · GitHub

module attributes {
  gpu.container_module,
  spv.target_env = #spv.target_env<
    #spv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>, {}>
} {
  gpu.module @kernels {
    gpu.func @kernel_mul(%arg0 : memref<4x4xf32>, %arg1 : memref<4x4xf32>, %arg2 : memref<4x4xf32>)
      kernel attributes { spv.entry_point_abi = {local_size = dense<[1, 1, 1]>: vector<3xi32> }} {
      %x = "gpu.block_id"() {dimension = "x"} : () -> index
      %y = "gpu.block_id"() {dimension = "y"} : () -> index
      %1 = memref.load %arg0[%x, %y] : memref<4x4xf32>
      %2 = memref.load %arg1[%x, %y] : memref<4x4xf32>
      %3 = mulf %1, %2 : f32
      memref.store %3, %arg2[%x, %y] : memref<4x4xf32>
      gpu.return
    }
  }

  func @main() {
    ...
    gpu.launch_func @kernels::@kernel_mul
        blocks in (%cst4, %cst4, %cst1) threads in (%cst1, %cst1, %cst1)
        args(%arg0 : memref<4x4xf32>, %arg1 : memref<4x4xf32>, %arg2 : memref<4x4xf32>)
    ...
  }
  ...
}

One question occurs to me: a memref is represented as !spv.ptr<!spv.array<nelts x elem_type>>, but it seems the spv dialect does not support computation on arrays. To perform a computation, one has to load each individual scalar and do element-wise computation.
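For illustration, here is a hand-written sketch of the element-wise form this lowers to in the spv dialect. It is not actual converter output: %x_i32, %y_i32, and %stride are hypothetical i32 values derived from the block ids, and the exact pointer type and index arithmetic the real conversion produces may differ.

// Linearize the 2-D index [%x, %y] into the flattened array.
%tmp = spv.IMul %x_i32, %stride : i32
%lin = spv.IAdd %tmp, %y_i32 : i32
%ptr = spv.AccessChain %arg0[%lin] : !spv.ptr<!spv.array<16 x f32, stride=4>, StorageBuffer>
// Each element requires its own scalar load (and later its own store).
%val = spv.Load "StorageBuffer" %ptr : f32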

However, the spv dialect is strong at vector computation, e.g. spv.IAdd. So I wonder: to perform efficient computation with the spv dialect, why not represent a memref as something like !spv.ptr<!spv.vector<nelts x elem_type>>? Or, to use powerful spv operations such as spv.IAdd on vectors and spv.CooperativeMatrixMulAddNV on matrices, is there a way to convert a memref to a vector with vector.load or some sort of tensor load, rather than with element-wise loads and stores?

The conversion to SPIR-V makes as few decisions as possible to keep the complexity low. The choice of using vectors or cooperative matrices should be made earlier. It is already possible to generate vector loads and vector spv.IAdd: if you change the memref from memref<4x4xf32> to memref<4xvector<4xf32>>, it will be converted to !spv.ptr<!spv.array<4 x vector<4xf32>>> and the associated memref.load will become a vector load.
In the same way, std.addi will be converted to a vector spv.IAdd if the type is a vector of 2 or 4 elements (since SPIR-V only supports vectors of up to 4 elements).
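For comparison with the scalar sketch above, here is a hand-written sketch of the vector form (illustrative names and pointer types, not actual converter output): with memref<4xvector<4xf32>>, each load moves a whole vector<4xf32> and the multiply stays a single vector instruction.

%ptr0 = spv.AccessChain %arg0[%x_i32] : !spv.ptr<!spv.array<4 x vector<4xf32>, stride=16>, StorageBuffer>
%v0 = spv.Load "StorageBuffer" %ptr0 : vector<4xf32>
%v1 = spv.Load "StorageBuffer" %ptr1 : vector<4xf32>
// One vector instruction replaces four scalar multiplies.
%prod = spv.FMul %v0, %v1 : vector<4xf32>
spv.Store "StorageBuffer" %ptr2, %prod : vector<4xf32>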

We do want to generate vector loads/stores and vector operations in general, but the vectorization itself should be done at a higher level. Note that there are several paths to generate vector operations in MLIR, such as linalg vectorization or the affine super-vectorizer; a sketch of what they produce follows below.
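As a sketch of what such a higher-level pass might produce on the original memref<4x4xf32> kernel (hand-written for illustration, not actual pass output), the vector dialect expresses the same row-wise computation with transfer ops:

%c0 = constant 0 : index
%pad = constant 0.0 : f32
// Read a full row of 4 elements at once; %pad fills out-of-bounds lanes.
%v0 = vector.transfer_read %arg0[%x, %c0], %pad : memref<4x4xf32>, vector<4xf32>
%v1 = vector.transfer_read %arg1[%x, %c0], %pad : memref<4x4xf32>, vector<4xf32>
%v2 = mulf %v0, %v1 : vector<4xf32>
vector.transfer_write %v2, %arg2[%x, %c0] : vector<4xf32>, memref<4x4xf32>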

If you want a concrete example, IREE uses linalg vectorization to generate vector SPIR-V code; there is a test in IREE going through that path.

To give a more concrete answer, you can easily change your input IR to generate vector code:

module attributes {
  gpu.container_module,
  spv.target_env = #spv.target_env<
    #spv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>, {}>
} {
  gpu.module @kernels {
    gpu.func @kernel_mul(%arg0 : memref<4xvector<4xf32>>, %arg1 : memref<4xvector<4xf32>>, %arg2 : memref<4xvector<4xf32>>)
      kernel attributes { spv.entry_point_abi = {local_size = dense<[1, 1, 1]>: vector<3xi32> }} {
      %x = "gpu.block_id"() {dimension = "x"} : () -> index
      %y = "gpu.block_id"() {dimension = "y"} : () -> index
      %1 = memref.load %arg0[%x] : memref<4xvector<4xf32>>
      %2 = memref.load %arg1[%x] : memref<4xvector<4xf32>>
      %3 = mulf %1, %2 : vector<4xf32>
      memref.store %3, %arg2[%x] : memref<4xvector<4xf32>>
      gpu.return
    }
  }

  func @main() {
    ...
    gpu.launch_func @kernels::@kernel_mul
        blocks in (%cst4, %cst4, %cst1) threads in (%cst1, %cst1, %cst1)
        args(%arg0 : memref<4xvector<4xf32>>, %arg1 : memref<4xvector<4xf32>>, %arg2 : memref<4xvector<4xf32>>)
    ...
  }
  ...
}