Generate SPIRV binary from MLIR dialect kernels to run it on OCL runtime

Hello everyone,

I am currently working on a project to generate a SPIR-V binary from MLIR dialect kernels and run that binary on an OpenCL (OCL) runtime. The OCL runtime provides clCreateProgramWithIL() for building a program from a SPIR-V binary. I am using clCreateProgramWithIL() to load the SPIR-V binary and run it, but it fails. It seems that clCreateProgramWithIL() cannot recognize the format of the SPIR-V binary produced by the mlir-opt lowering and mlir-vulkan-runner.

I also tried a different way to generate the SPIR-V binary, not from MLIR dialects but from .cl kernels. That binary runs successfully on the OCL runtime through the same clCreateProgramWithIL() call.
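For reference, here is roughly how I load and build the program from the blob. This is a minimal sketch: the platform/context/queue setup and error reporting are the usual boilerplate and are omitted, and the function name is just for illustration.

#define CL_TARGET_OPENCL_VERSION 300
#include <CL/cl.h>
#include <fstream>
#include <iterator>
#include <vector>

// Load a SPIR-V binary from disk and build an OpenCL program from it.
cl_program buildProgramFromSPIRV(cl_context context, cl_device_id device,
                                 const char *path) {
  std::ifstream file(path, std::ios::binary);
  std::vector<char> il((std::istreambuf_iterator<char>(file)),
                       std::istreambuf_iterator<char>());

  cl_int err = CL_SUCCESS;
  // clCreateProgramWithIL is OpenCL 2.1+; it expects a SPIR-V module that
  // satisfies the OpenCL SPIR-V environment spec.
  cl_program program =
      clCreateProgramWithIL(context, il.data(), il.size(), &err);
  if (err != CL_SUCCESS)
    return nullptr;
  if (clBuildProgram(program, 1, &device, "", nullptr, nullptr) != CL_SUCCESS)
    return nullptr;
  return program;
}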

When I compare the two SPIR-V binary files, they look different. The MLIR one looks like this:
\03\02#\07\00\00\01\00\16\00\00\003\00\00\00\00\00\00\00\11\00\02\00\01\00\00\00\0A\00\0B\00SPV_KHR_storage_buffer_storage_class\00\00\00\00\0E\00\03\00\00\00\00\00\01\00\00\00\0F\00\07\00\05\00\00\00\16\00\00\00kernel_addi\00\04\00\00\00\10\00\06\00\16\00\00\00\11\00\00\00\01\00\00\00\01\00\00\00\01\00\00\00\05\00\09\00\04\00\00\00__builtin_var_WorkgroupId__\00\05\00\07\00\09\00\00\00kernel_addi_arg_0\00\00\00\05\00\07\00\0E\00\00\00kernel_addi_arg_1\00\00\00\05\00\07\00\13\00\00\00kernel_addi_arg_2\00\00\00\05\00\05\00\16\00\00\00kernel_addi\00G\00\04\00\04\00\00\00\0B\00\00\00\1A\00\00\00G\00\04\00\07\00\00\00\06\00\00\00\04\00\00\00H\00\05\00\06\00\00\00\00\00\00\00#\00\00\00\00\00\00\00G\00\03\00\06\00\00\00\02\00\00\00G\00\04\00\09\00\00\00!\00\00\00\00\00\00\00G\00\04\00\09\00\00\00\22\00\00\00\00\00\00\00G\00\04\00\0C\00\00\00\06\00\00\00\04\00\00\00H\00\05\00\0B\00\00\00\00\00\00\00#\00\00\00\00\00\00\00G\00\03\00\0B\00\00\00\02\00\00\00G\00\04\00\0E\00\00\00!\00\00\00\01\00\00\00G\00\04\00\0E\00\00\00\22\00\00\00\00\00\00\00G\00\04\00\11\00\00\00\06\00\00\00\04\00\00\00H\00\05\00\10\00\00\00\00\00\00\00#\00\00\00\00\00\00\00G\00\03\00\10\00\00\00\02\00\00\00G\00\04\00\13\00\00\00!\00\00\00\02\00\00\00G\00\04\00\13\00\00\00\22\00\00\00\00\00\00\00\15\00\04\00\03\00\00\00 \00\00\00\00\00\00\00\17\00\04\00\02\00\00\00\03\00\00\00\03\00\00\00 \00\04\00\01\00\00\00\01\00\00\00\02\00\00\00;\00\04\00\01\00\00\00\04\00\00\00\01\00\00\00+\00\04\00\03\00\00\00\08\00\00\00\08\00\00\00\1C\00\04\00\07\00\00\00\03\00\00\00\08\00\00\00\1E\00\03\00\06\00\00\00\07\00\00\00 \00\04\00\05\00\00\00\0C\00\00\00\06\00\00\00;\00\04\00\05\00\00\00\09\00\00\00\0C\00\00\00+\00\04\00\03\00\00\00\0D\00\00\00@\00\00\00\1C\00\04\00\0C\00\00\00\03\00\00\00\0D\00\00\00\1E\00\03\00\0B\00\00\00\0C\00\00\00 \00\04\00\0A\00\00\00\0C\00\00\00\0B\00\00\00;\00\04\00\0A\00\00\00\0E\00\00\00\0C\00\00\00+\00\04\00\03\00\00\00\12\00\00\00\00\02\00\00\1C\00\04\00\11\00\00\00\03\00\00\00\12\00\00\00\1E\00\03\00\10\00\00\00\11\00\00\00 \00\04\00\0F\00\00\00\0C\00\00\00\10\00\00\00;\00\04\00\0F\00\00\00\13\00\00\00\0C\00\00\00\13\00\02\00\15\00\00\00!\00\03\00\14\00\00\00\15\00\00\00+\00\04\00\03\00\00\00\1E\00\00\00\00\00\00\00+\00\04\00\03\00\00\00\1F\00\00\00\01\00\00\00 \00\04\00\22\00\00\00\0C\00\00\00\03\00\00\006\00\05\00\15\00\00\00\16\00\00\00\00\00\00\00\14\00\00\00\F8\00\02\00\17\00\00\00=\00\04\00\02\00\00\00\18\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\19\00\00\00\18\00\00\00\00\00\00\00=\00\04\00\02\00\00\00\1A\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\1B\00\00\00\1A\00\00\00\01\00\00\00=\00\04\00\02\00\00\00\1C\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\1D\00\00\00\1C\00\00\00\02\00\00\00\84\00\05\00\03\00\00\00 \00\00\00\1F\00\00\00\19\00\00\00\80\00\05\00\03\00\00\00!\00\00\00\1E\00\00\00 
\00\00\00A\00\06\00\22\00\00\00#\00\00\00\09\00\00\00\1E\00\00\00!\00\00\00=\00\04\00\03\00\00\00$\00\00\00#\00\00\00\84\00\05\00\03\00\00\00%\00\00\00\08\00\00\00\1B\00\00\00\80\00\05\00\03\00\00\00&\00\00\00\1E\00\00\00%\00\00\00\84\00\05\00\03\00\00\00'\00\00\00\1F\00\00\00\19\00\00\00\80\00\05\00\03\00\00\00(\00\00\00&\00\00\00'\00\00\00A\00\06\00\22\00\00\00)\00\00\00\0E\00\00\00\1E\00\00\00(\00\00\00=\00\04\00\03\00\00\00*\00\00\00)\00\00\00\80\00\05\00\03\00\00\00+\00\00\00$\00\00\00*\00\00\00\84\00\05\00\03\00\00\00,\00\00\00\0D\00\00\00\1D\00\00\00\80\00\05\00\03\00\00\00-\00\00\00\1E\00\00\00,\00\00\00\84\00\05\00\03\00\00\00.\00\00\00\08\00\00\00\1B\00\00\00\80\00\05\00\03\00\00\00/\00\00\00-\00\00\00.\00\00\00\84\00\05\00\03\00\00\000\00\00\00\1F\00\00\00\19\00\00\00\80\00\05\00\03\00\00\001\00\00\00/\00\00\000\00\00\00A\00\06\00\22\00\00\002\00\00\00\13\00\00\00\1E\00\00\001\00\00\00>\00\03\002\00\00\00+\00\00\00\FD\00\01\008\00\01\00

While the .cl one looks like this:
^C^B#^G^@^@^A^@^N^@^F^@4^@^@^@^@^@^@^@^Q^@^B^@^D^@^@^@^Q^@^B^@^E^@^@^@^Q^@^B^@^F^@^@^@^Q^@^B^@^K^@^@^@^K^@^E^@^A^@^@^@OpenCL.std^@^@^N^@^C^@^B^@^@^@^B^@^@^@^O^@^F^@^F^@^@^@^K^@^@^@gemmN^@^@^@^E^@^@^@^G^@^N^@3^@^@^@kernel_arg_type.gemmN.float*,float*,int,float*,^@^C^@^C^@^C^@^@^@p~N^A^@^E^@^K^@^E^@^@^@__spirv_BuiltInGlobalInvocationId^@^@^@^E^@^C^@^L^@^@^@A^@^@^@^E^@^C^@^M^@^@^@B^@^@^@^E^@^C^@^N^@^@^@N^@^@^@^E^@^C^@^O^@^@^@C^@^@^@^E^@^D^@^P^@^@^@entry^@^@^@^E^@^G^@^Q^@^@^@for.cond.cleanup^@^@^@^@^E^@^E^@^R^@^@^@for.body^@^@^@^@^E^@^D^@^T^@^@^@call^@^@^@^@^E^@^D^@^U^@^@^@conv^@^@^@^@^E^@^D^@^W^@^@^@call1^@^@^@^E^@^D^@^X^@^@^@conv2^@^@^@^E^@^D^@[1]^@^@^@^@^@^@^@6^@^E^@^F^@^@^@^K^@^@^@^@^@^@^@
^@^@^@7^@^C^@^H^@^@^@^L^@^@^@7^@^C^@^H^@^@^@^M^@^@^@7^@^C^@ ^@^@^@^N^@^@^@7^@^C^@^H^@^@^@^O^@^@^@?^@^B^@^P^@^@^@=^@^F^@^C^@^@^@^S^@^@^@^E^@^@^@^B^@^@^@ ^@^@^@Q^@^E^@^B^@^@^@^T^@^@^@^S^@^@^@^@^@^@^@q^@^D^@ ^@^@^@^U^@^@^@^T^@^@^@=^@^F^@^C^@^@^@^V^@^@^@^E^@^@^@^B^@^@^@ ^@^@^@Q^@^E^@^B^@^@^@^W^@^@^@^V^@^@^@^A^@^@^@q^@^D^@ ^@^@^@^X^@^@^@^W^@^@^@?^@^E^@^Z^@^@^@[2]^@^@^@^P^@^@^@~@^@^E^@ ^@^@^@&^@^@^@$^@^@^@^^@^@^@r^@^D^@^B^@^@^@‘^@^@^@&^@^@^@F^@^E^@^H^@^@^@(^@^@^@^L^@^@^@’^@^@^@=^@^F^@^G^@^@^@)^@^@^@(^@^@^@^B^@^@^@^D^@^@^@~D^@^E^@ ^@^@^@^@^@^@$^@^@^@^N^@^@^@~@^@^E^@ ^@^@^@+^@^@^@^@^@^@^X^@^@^@r^@^D^@^B^@^@^@,^@^@^@+^@^@^@F^@^E^@^H^@^@^@-^@^@^@^M^@^@^@,^@^@^@=^@^F^@^G^@^@^@.^@^@^@-^@^@^@^B^@^@^@^D^@^@^@^L^@^H^@^G^@^@^@^^^@^@^@^A^@^@^@*^@^@^@)^@^@^@.^@^@^@%^@^@^@~@^@^E^@ ^@^@^@#^@^@^@$^@^@^@0^@^@^@?^@^E^@^Z^@^@^@2^@^@^@#^@^@^@^N^@^@^@?^@^D^@2^@^@^@^R^@^@^@^Q^@^@^@8^@^A^@

I am looking for your help or any suggestions on this. The project requires me to generate the SPIR-V binary from MLIR dialect kernels and run the generated binary on the OCL runtime, and I am not sure whether the way I generate the SPIR-V binary from the MLIR dialect is correct.

Thank you very much for your help.

Here is what I did to generate the SPIR-V binaries in the two cases.

(1) Generating the SPIR-V binary from the MLIR dialect

I used addi.mlir from the MLIR Vulkan runner examples:

$ cat addi.mlir
// RUN: mlir-vulkan-runner %s --shared-libs=%vulkan_wrapper_library_dir/libvulkan-runtime-wrappers%shlibext,%linalg_test_lib_dir/libmlir_runner_utils%shlibext --entry-point-result=void | FileCheck %s

// CHECK-COUNT-64: [3, 3, 3, 3, 3, 3, 3, 3]
module attributes {
  gpu.container_module,
  spv.target_env = #spv.target_env<
	#spv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>, {}>
} {
  gpu.module @kernels {
	gpu.func @kernel_addi(%arg0 : memref<8xi32>, %arg1 : memref<8x8xi32>, %arg2 : memref<8x8x8xi32>)
  	kernel attributes { spv.entry_point_abi = {local_size = dense<[1, 1, 1]>: vector<3xi32>}} {
  	%x = "gpu.block_id"() {dimension = "x"} : () -> index
  	%y = "gpu.block_id"() {dimension = "y"} : () -> index
  	%z = "gpu.block_id"() {dimension = "z"} : () -> index
  	%0 = load %arg0[%x] : memref<8xi32>
  	%1 = load %arg1[%y, %x] : memref<8x8xi32>
  	%2 = addi %0, %1 : i32
  	store %2, %arg2[%z, %y, %x] : memref<8x8x8xi32>
  	gpu.return
	}
  }
 
  func @main() {
	%arg0 = alloc() : memref<8xi32>
	%arg1 = alloc() : memref<8x8xi32>
	%arg2 = alloc() : memref<8x8x8xi32>
	%value0 = constant 0 : i32
	%value1 = constant 1 : i32
	%value2 = constant 2 : i32
	%arg3 = memref_cast %arg0 : memref<8xi32> to memref<?xi32>
	%arg4 = memref_cast %arg1 : memref<8x8xi32> to memref<?x?xi32>
	%arg5 = memref_cast %arg2 : memref<8x8x8xi32> to memref<?x?x?xi32>
	call @fillResource1DInt(%arg3, %value1) : (memref<?xi32>, i32) -> ()
	call @fillResource2DInt(%arg4, %value2) : (memref<?x?xi32>, i32) -> ()
	call @fillResource3DInt(%arg5, %value0) : (memref<?x?x?xi32>, i32) -> ()
 
	%cst1 = constant 1 : index
	%cst8 = constant 8 : index
	gpu.launch_func @kernels::@kernel_addi
    	blocks in (%cst8, %cst8, %cst8) threads in (%cst1, %cst1, %cst1)
    	args(%arg0 : memref<8xi32>, %arg1 : memref<8x8xi32>, %arg2 : memref<8x8x8xi32>)
	%arg6 = memref_cast %arg5 : memref<?x?x?xi32> to memref<*xi32>
	call @print_memref_i32(%arg6) : (memref<*xi32>) -> ()
	return
  }
  func @fillResource1DInt(%0 : memref<?xi32>, %1 : i32)
  func @fillResource2DInt(%0 : memref<?x?xi32>, %1 : i32)
  func @fillResource3DInt(%0 : memref<?x?x?xi32>, %1 : i32)
  func @print_memref_i32(%ptr : memref<*xi32>)
}

$ mlir-opt -convert-linalg-to-parallel-loops -test-gpu-greedy-parallel-loop-mapping -convert-parallel-loops-to-gpu -gpu-kernel-outlining -canonicalize -legalize-std-for-spirv -test-spirv-entry-point-abi addi.mlir &> addi_before_spirv.mlir

$ mlir-vulkan-runner addi_before_spirv.mlir --shared-libs=…/…/lib/libvulkan-runtime-wrappers.dylib,…/…/lib/libmlir_runner_utils.dylib --entry-point-result=void -print-ir-after-all

...
 
// *** IR Dump After ConvertVulkanLaunchFuncToVulkanCalls ***
 
 
module attributes {gpu.container_module, llvm.data_layout = "", spv.target_env = #spv.target_env<#spv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>, {}>} {
  llvm.mlir.global internal constant @kernel_addi_spv_entry_point_name("kernel_addi\00")
  llvm.mlir.global internal constant @SPIRV_BIN("\03\02#\07\00\00\01\00\16\00\00\003\00\00\00\00\00\00\00\11\00\02\00\01\00\00\00\0A\00\0B\00SPV_KHR_storage_buffer_storage_class\00\00\00\00\0E\00\03\00\00\00\00\00\01\00\00\00\0F\00\07\00\05\00\00\00\16\00\00\00kernel_addi\00\04\00\00\00\10\00\06\00\16\00\00\00\11\00\00\00\01\00\00\00\01\00\00\00\01\00\00\00\05\00\09\00\04\00\00\00__builtin_var_WorkgroupId__\00\05\00\07\00\09\00\00\00kernel_addi_arg_0\00\00\00\05\00\07\00\0E\00\00\00kernel_addi_arg_1\00\00\00\05\00\07\00\13\00\00\00kernel_addi_arg_2\00\00\00\05\00\05\00\16\00\00\00kernel_addi\00G\00\04\00\04\00\00\00\0B\00\00\00\1A\00\00\00G\00\04\00\07\00\00\00\06\00\00\00\04\00\00\00H\00\05\00\06\00\00\00\00\00\00\00#\00\00\00\00\00\00\00G\00\03\00\06\00\00\00\02\00\00\00G\00\04\00\09\00\00\00!\00\00\00\00\00\00\00G\00\04\00\09\00\00\00\22\00\00\00\00\00\00\00G\00\04\00\0C\00\00\00\06\00\00\00\04\00\00\00H\00\05\00\0B\00\00\00\00\00\00\00#\00\00\00\00\00\00\00G\00\03\00\0B\00\00\00\02\00\00\00G\00\04\00\0E\00\00\00!\00\00\00\01\00\00\00G\00\04\00\0E\00\00\00\22\00\00\00\00\00\00\00G\00\04\00\11\00\00\00\06\00\00\00\04\00\00\00H\00\05\00\10\00\00\00\00\00\00\00#\00\00\00\00\00\00\00G\00\03\00\10\00\00\00\02\00\00\00G\00\04\00\13\00\00\00!\00\00\00\02\00\00\00G\00\04\00\13\00\00\00\22\00\00\00\00\00\00\00\15\00\04\00\03\00\00\00 \00\00\00\00\00\00\00\17\00\04\00\02\00\00\00\03\00\00\00\03\00\00\00 \00\04\00\01\00\00\00\01\00\00\00\02\00\00\00;\00\04\00\01\00\00\00\04\00\00\00\01\00\00\00+\00\04\00\03\00\00\00\08\00\00\00\08\00\00\00\1C\00\04\00\07\00\00\00\03\00\00\00\08\00\00\00\1E\00\03\00\06\00\00\00\07\00\00\00 \00\04\00\05\00\00\00\0C\00\00\00\06\00\00\00;\00\04\00\05\00\00\00\09\00\00\00\0C\00\00\00+\00\04\00\03\00\00\00\0D\00\00\00@\00\00\00\1C\00\04\00\0C\00\00\00\03\00\00\00\0D\00\00\00\1E\00\03\00\0B\00\00\00\0C\00\00\00 \00\04\00\0A\00\00\00\0C\00\00\00\0B\00\00\00;\00\04\00\0A\00\00\00\0E\00\00\00\0C\00\00\00+\00\04\00\03\00\00\00\12\00\00\00\00\02\00\00\1C\00\04\00\11\00\00\00\03\00\00\00\12\00\00\00\1E\00\03\00\10\00\00\00\11\00\00\00 \00\04\00\0F\00\00\00\0C\00\00\00\10\00\00\00;\00\04\00\0F\00\00\00\13\00\00\00\0C\00\00\00\13\00\02\00\15\00\00\00!\00\03\00\14\00\00\00\15\00\00\00+\00\04\00\03\00\00\00\1E\00\00\00\00\00\00\00+\00\04\00\03\00\00\00\1F\00\00\00\01\00\00\00 \00\04\00\22\00\00\00\0C\00\00\00\03\00\00\006\00\05\00\15\00\00\00\16\00\00\00\00\00\00\00\14\00\00\00\F8\00\02\00\17\00\00\00=\00\04\00\02\00\00\00\18\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\19\00\00\00\18\00\00\00\00\00\00\00=\00\04\00\02\00\00\00\1A\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\1B\00\00\00\1A\00\00\00\01\00\00\00=\00\04\00\02\00\00\00\1C\00\00\00\04\00\00\00Q\00\05\00\03\00\00\00\1D\00\00\00\1C\00\00\00\02\00\00\00\84\00\05\00\03\00\00\00 \00\00\00\1F\00\00\00\19\00\00\00\80\00\05\00\03\00\00\00!\00\00\00\1E\00\00\00 
\00\00\00A\00\06\00\22\00\00\00#\00\00\00\09\00\00\00\1E\00\00\00!\00\00\00=\00\04\00\03\00\00\00$\00\00\00#\00\00\00\84\00\05\00\03\00\00\00%\00\00\00\08\00\00\00\1B\00\00\00\80\00\05\00\03\00\00\00&\00\00\00\1E\00\00\00%\00\00\00\84\00\05\00\03\00\00\00'\00\00\00\1F\00\00\00\19\00\00\00\80\00\05\00\03\00\00\00(\00\00\00&\00\00\00'\00\00\00A\00\06\00\22\00\00\00)\00\00\00\0E\00\00\00\1E\00\00\00(\00\00\00=\00\04\00\03\00\00\00*\00\00\00)\00\00\00\80\00\05\00\03\00\00\00+\00\00\00$\00\00\00*\00\00\00\84\00\05\00\03\00\00\00,\00\00\00\0D\00\00\00\1D\00\00\00\80\00\05\00\03\00\00\00-\00\00\00\1E\00\00\00,\00\00\00\84\00\05\00\03\00\00\00.\00\00\00\08\00\00\00\1B\00\00\00\80\00\05\00\03\00\00\00/\00\00\00-\00\00\00.\00\00\00\84\00\05\00\03\00\00\000\00\00\00\1F\00\00\00\19\00\00\00\80\00\05\00\03\00\00\001\00\00\00/\00\00\000\00\00\00A\00\06\00\22\00\00\002\00\00\00\13\00\00\00\1E\00\00\001\00\00\00>\00\03\002\00\00\00+\00\00\00\FD\00\01\008\00\01\00")
  llvm.func @malloc(!llvm.i64) -> !llvm.ptr<i8>
  llvm.func @main() {
	%0 = llvm.mlir.constant(0 : i32) : !llvm.i32
	%1 = llvm.mlir.constant(1 : i32) : !llvm.i32
	%2 = llvm.mlir.constant(2 : i32) : !llvm.i32
	%3 = llvm.mlir.constant(1 : index) : !llvm.i64
	%4 = llvm.mlir.constant(8 : index) : !llvm.i64
	%5 = llvm.mlir.constant(8 : index) : !llvm.i64
	%6 = llvm.mlir.null : !llvm.ptr<i32>
	%7 = llvm.mlir.constant(1 : index) : !llvm.i64
	%8 = llvm.getelementptr %6[%7] : (!llvm.ptr<i32>, !llvm.i64) -> !llvm.ptr<i32>
	%9 = llvm.ptrtoint %8 : !llvm.ptr<i32> to !llvm.i64
	%10 = llvm.mul %5, %9 : !llvm.i64
	%11 = llvm.call @malloc(%10) : (!llvm.i64) -> !llvm.ptr<i8>
...

I extracted the bytes between the double quotes in @SPIRV_BIN("...") and used them as the SPIR-V binary.
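To turn that escaped string into an actual .spv file, I used a small helper along these lines. This is just a sketch assuming the dump uses MLIR's \XX hex escaping for non-printable bytes; the helper and file names are only for illustration.

#include <cctype>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Decode an MLIR-style escaped string ("\03\02#\07...") into raw bytes.
// Non-printable bytes are printed as '\' followed by two hex digits;
// printable bytes appear verbatim.
std::vector<char> decodeEscapedBlob(const std::string &text) {
  std::vector<char> bytes;
  for (size_t i = 0; i < text.size();) {
    if (text[i] == '\\' && i + 2 < text.size() &&
        std::isxdigit(static_cast<unsigned char>(text[i + 1])) &&
        std::isxdigit(static_cast<unsigned char>(text[i + 2]))) {
      bytes.push_back(
          static_cast<char>(std::stoi(text.substr(i + 1, 2), nullptr, 16)));
      i += 3;
    } else {
      bytes.push_back(text[i]);
      ++i;
    }
  }
  return bytes;
}

int main(int argc, char **argv) {
  if (argc < 3)
    return 1; // usage: decode_blob escaped.txt out.spv
  std::ifstream in(argv[1]);
  std::string text((std::istreambuf_iterator<char>(in)),
                   std::istreambuf_iterator<char>());
  std::vector<char> bytes = decodeEscapedBlob(text);
  // A valid little-endian SPIR-V blob starts with bytes 03 02 23 07.
  std::ofstream(argv[2], std::ios::binary).write(bytes.data(), bytes.size());
  return 0;
}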

(2) Generating the SPIR-V binary from a .cl kernel

I used the gemmN.cl kernel from the SHOC OpenCL benchmark suite.

I followed the example use case provided by Ben Ashbaugh at https://github.com/bashbaug/SimpleOpenCLSamples/tree/master/samples/05_spirvkernelfromfile to generate the SPIR-V binary and run it on the OCL runtime.

$ cat gemmN.cl

#ifdef DOUBLE_PRECISION
#define FPTYPE double
#else
#define FPTYPE float
#endif
 
__kernel void gemmN(const __global FPTYPE* A,
                	const __global FPTYPE* B, int N,
                	__global FPTYPE* C)
{
   
	// Thread identifiers
	const int globalRow = get_global_id(0); // Row ID of C (0..N)
	const int globalCol = get_global_id(1); // Col ID of C (0..N)
 
	// Compute a single element (loop over K)
	FPTYPE acc = 0.0f;
	for (int k=0; k<N; k++) {
       	acc += A[globalRow*N + k] * B[k*N + globalCol];
	}
 
	// Store the result
	C[globalRow*N + globalCol] = acc;
}

$ clang -c -cl-std=CL1.2 -target spir64 -emit-llvm -Xclang -finclude-default-header -O3 gemmN.cl -o gemmN64.ll

$ llvm-spirv gemmN64.ll -o gemmN64.spv

$ vim gemmN64.spv

^C^B#^G^@^@^A^@^N^@^F^@4^@^@^@^@^@^@^@^Q^@^B^@^D^@^@^@^Q^@^B^@^E^@^@^@^Q^@^B^@^F^@^@^@^Q^@^B^@^K^@^@^@^K^@^E^@^A^@^@^@OpenCL.std^@^@^N^@^C^@^B^@^@^@^B^@^@^@^O^@^F^@^F^@^@^@^K^@^@^@gemmN^@^@^@^E^@^@^@^G^@^N^@3^@^@^@kernel_arg_type.gemmN.float*,float*,int,float*,^@^C^@^C^@^C^@^@^@p~N^A^@^E^@^K^@^E^@^@^@__spirv_BuiltInGlobalInvocationId^@^@^@^E^@^C^@^L^@^@^@A^@^@^@^E^@^C^@^M^@^@^@B^@^@^@^E^@^C^@^N^@^@^@N^@^@^@^E^@^C^@^O^@^@^@C^@^@^@^E^@^D^@^P^@^@^@entry^@^@^@^E^@^G^@^Q^@^@^@for.cond.cleanup^@^@^@^@^E^@^E^@^R^@^@^@for.body^@^@^@^@^E^@^D^@^T^@^@^@call^@^@^@^@^E^@^D^@^U^@^@^@conv^@^@^@^@^E^@^D^@^W^@^@^@call1^@^@^@^E^@^D^@^X^@^@^@conv2^@^@^@^E^@^D^@^[^@^@^@cmp26^@^@^@^E^@^C^@^\^@^@^@mul^@^E^@^E^@^_^@^@^@acc.0.lcssa^@^E^@^D^@ ^@^@^@add10^@^@^@^E^@^E^@!^@^@^@idxprom11^@^@^@^E^@^E^@"^@^@^@arrayidx12^@^@^E^@^C^@#^@^@^@inc^@^E^@^D^@$^@^@^@k.028^@^@^@^E^@^D^@%^@^@^@acc.027^@^E^@^C^@&^@^@^@add^@^E^@^D^@'^@^@^@idxprom^@^E^@^E^@(^@^@^@arrayidx^@^@^@^@^E^@^D^@*^@^@^@mul4^@^@^@^@^E^@^D^@+^@^@^@add5^@^@^@^@^E^@^E^@,^@^@^@idxprom6^@^@^@^@^E^@^E^@-^@^@^@arrayidx7^@^@^@^E^@^C^@2^@^@^@cmp^@G^@^D^@^E^@^@^@^K^@^@^@^\^@^@^@G^@^C^@^E^@^@^@^V^@^@^@G^@^D^@^L^@^@^@&^@^@^@^E^@^@^@G^@^D^@^M^@^@^@&^@^@^@^E^@^@^@G^@^D^@^O^@^@^@&^@^@^@^E^@^@^@G^@^D^@^L^@^@^@&^@^@^@^F^@^@^@G^@^D^@^M^@^@^@&^@^@^@^F^@^@^@G^@^M^@^E^@^@^@)^@^@^@__spirv_BuiltInGlobalInvocationId^@^@^@^A^@^@^@^U^@^D^@^B^@^@^@@^@^@^@^@^@^@^@^U^@^D^@	^@^@^@ ^@^@^@^@^@^@^@+^@^D^@	^@^@^@^Y^@^@^@^@^@^@^@+^@^D^@   ^@^@^@0^@^@^@^A^@^@^@^W^@^D^@^C^@^@^@^B^@^@^@^C^@^@^@ ^@^D^@^D^@^@^@^A^@^@^@^C^@^@^@^S^@^B^@^F^@^@^@^V^@^C^@^G^@^@^@ ^@^@^@ ^@^D^@^H^@^@^@^E^@^@^@^G^@^@^@!^@^G^@
^@^@^@^F^@^@^@^H^@^@^@^H^@^@^@  ^@^@^@^H^@^@^@^T^@^B^@^Z^@^@^@;^@^D^@^D^@^@^@^E^@^@^@^A^@^@^@+^@^D^@^G^@^@^@^]^@^@^@^@^@^@^@6^@^E^@^F^@^@^@^K^@^@^@^@^@^@^@
^@^@^@7^@^C^@^H^@^@^@^L^@^@^@7^@^C^@^H^@^@^@^M^@^@^@7^@^C^@ 	^@^@^@^N^@^@^@7^@^C^@^H^@^@^@^O^@^@^@?^@^B^@^P^@^@^@=^@^F^@^C^@^@^@^S^@^@^@^E^@^@^@^B^@^@^@ ^@^@^@Q^@^E^@^B^@^@^@^T^@^@^@^S^@^@^@^@^@^@^@q^@^D^@    	^@^@^@^U^@^@^@^T^@^@^@=^@^F^@^C^@^@^@^V^@^@^@^E^@^@^@^B^@^@^@ ^@^@^@Q^@^E^@^B^@^@^@^W^@^@^@^V^@^@^@^A^@^@^@q^@^D^@  	^@^@^@^X^@^@^@^W^@^@^@?^@^E^@^Z^@^@^@^[^@^@^@^N^@^@^@^Y^@^@^@~D^@^E^@   ^@^@^@^\^@^@^@^U^@^@^@^N^@^@^@?^@^D^@^[^@^@^@^R^@^@^@^Q^@^@^@?^@^B^@^Q^@^@^@?^@^G^@^G^@^@^@^_^@^@^@^]^@^@^@^P^@^@^@^^^@^@^@^R^@^@^@~@^@^E^@ 	^@^@^@ ^@^@^@^\^@^@^@^X^@^@^@r^@^D^@^B^@^@^@!^@^@^@ ^@^@^@F^@^E^@^H^@^@^@"^@^@^@^O^@^@^@!^@^@^@>^@^E^@"^@^@^@^_^@^@^@^B^@^@^@^D^@^@^@?^@^A^@?^@^B^@^R^@^@^@?^@^G^@  	^@^@^@$^@^@^@#^@^@^@^R^@^@^@^Y^@^@^@^P^@^@^@?^@^G^@^G^@^@^@%^@^@^@^^^@^@^@^R^@^@^@^]^@^@^@^P^@^@^@~@^@^E^@  	^@^@^@&^@^@^@$^@^@^@^\^@^@^@r^@^D^@^B^@^@^@'^@^@^@&^@^@^@F^@^E^@^H^@^@^@(^@^@^@^L^@^@^@'^@^@^@=^@^F^@^G^@^@^@)^@^@^@(^@^@^@^B^@^@^@^D^@^@^@~D^@^E^@ 	^@^@^@*^@^@^@$^@^@^@^N^@^@^@~@^@^E^@	^@^@^@+^@^@^@*^@^@^@^X^@^@^@r^@^D^@^B^@^@^@,^@^@^@+^@^@^@F^@^E^@^H^@^@^@-^@^@^@^M^@^@^@,^@^@^@=^@^F^@^G^@^@^@.^@^@^@-^@^@^@^B^@^@^@^D^@^@^@^L^@^H^@^G^@^@^@^^^@^@^@^A^@^@^@*^@^@^@)^@^@^@.^@^@^@%^@^@^@~@^@^E^@ ^@^@^@#^@^@^@$^@^@^@0^@^@^@?^@^E^@^Z^@^@^@2^@^@^@#^@^@^@^N^@^@^@?^@^D^@2^@^@^@^R^@^@^@^Q^@^@^@8^@^A^@

$ spirv-dis gemmN64.spv

; SPIR-V
; Version: 1.0
; Generator: Khronos LLVM/SPIR-V Translator; 14
; Bound: 52
; Schema: 0
           	OpCapability Addresses
           	OpCapability Linkage
           	OpCapability Kernel
           	OpCapability Int64
      	%1 = OpExtInstImport "OpenCL.std"
           	OpMemoryModel Physical64 OpenCL
           	OpEntryPoint Kernel %11 "gemmN" %__spirv_BuiltInGlobalInvocationId
     	%51 = OpString "kernel_arg_type.gemmN.float*,float*,int,float*,"
           	OpSource OpenCL_C 102000
           	OpName %__spirv_BuiltInGlobalInvocationId "__spirv_BuiltInGlobalInvocationId"
           	OpName %A "A"
           	OpName %B "B"
           	OpName %N "N"
           	OpName %C "C"
           	OpName %entry "entry"
           	OpName %for_cond_cleanup "for.cond.cleanup"
           	OpName %for_body "for.body"
           	OpName %call "call"
           	OpName %conv "conv"
 	          OpName %call1 "call1"
           	OpName %conv2 "conv2"
           	OpName %cmp26 "cmp26"
           	OpName %mul "mul"
           	OpName %acc_0_lcssa "acc.0.lcssa"
           	OpName %add10 "add10"
           	OpName %idxprom11 "idxprom11"
           	OpName %arrayidx12 "arrayidx12"
           	OpName %inc "inc"
           	OpName %k_028 "k.028"
           	OpName %acc_027 "acc.027"
           	OpName %add "add"
           	OpName %idxprom "idxprom"
           	OpName %arrayidx "arrayidx"
           	OpName %mul4 "mul4"
           	OpName %add5 "add5"
           	OpName %idxprom6 "idxprom6"
           	OpName %arrayidx7 "arrayidx7"
           	OpName %cmp "cmp"
           	OpDecorate %__spirv_BuiltInGlobalInvocationId BuiltIn GlobalInvocationId
           	OpDecorate %__spirv_BuiltInGlobalInvocationId Constant
           	OpDecorate %A FuncParamAttr NoCapture
           	OpDecorate %B FuncParamAttr NoCapture
           	OpDecorate %C FuncParamAttr NoCapture
           	OpDecorate %A FuncParamAttr NoWrite
           	OpDecorate %B FuncParamAttr NoWrite
           	OpDecorate %__spirv_BuiltInGlobalInvocationId LinkageAttributes "__spirv_BuiltInGlobalInvocationId" Import
  	%ulong = OpTypeInt 64 0
   	%uint = OpTypeInt 32 0
 	%uint_0 = OpConstant %uint 0
 	%uint_1 = OpConstant %uint 1
	%v3ulong = OpTypeVector %ulong 3
%_ptr_Input_v3ulong = OpTypePointer Input %v3ulong
   	%void = OpTypeVoid
  	%float = OpTypeFloat 32
%_ptr_CrossWorkgroup_float = OpTypePointer CrossWorkgroup %float
     	%10 = OpTypeFunction %void %_ptr_CrossWorkgroup_float %_ptr_CrossWorkgroup_float %uint %_ptr_CrossWorkgroup_float
   	%bool = OpTypeBool
%__spirv_BuiltInGlobalInvocationId = OpVariable %_ptr_Input_v3ulong Input
	%float_0 = OpConstant %float 0
     	%11 = OpFunction %void None %10
      	%A = OpFunctionParameter %_ptr_CrossWorkgroup_float
      	%B = OpFunctionParameter %_ptr_CrossWorkgroup_float
      	%N = OpFunctionParameter %uint
      	%C = OpFunctionParameter %_ptr_CrossWorkgroup_float
  	%entry = OpLabel
     	%19 = OpLoad %v3ulong %__spirv_BuiltInGlobalInvocationId Aligned 32
   	%call = OpCompositeExtract %ulong %19 0
   	%conv = OpUConvert %uint %call
     	%22 = OpLoad %v3ulong %__spirv_BuiltInGlobalInvocationId Aligned 32
  	%call1 = OpCompositeExtract %ulong %22 1
  	%conv2 = OpUConvert %uint %call1
  	%cmp26 = OpSGreaterThan %bool %N %uint_0
    	%mul = OpIMul %uint %conv %N
           	OpBranchConditional %cmp26 %for_body %for_cond_cleanup
%for_cond_cleanup = OpLabel
%acc_0_lcssa = OpPhi %float %float_0 %entry %30 %for_body
  	%add10 = OpIAdd %uint %mul %conv2
  %idxprom11 = OpSConvert %ulong %add10
 %arrayidx12 = OpInBoundsPtrAccessChain %_ptr_CrossWorkgroup_float %C %idxprom11
           	OpStore %arrayidx12 %acc_0_lcssa Aligned 4
           	OpReturn
   %for_body = OpLabel
  	%k_028 = OpPhi %uint %inc %for_body %uint_0 %entry
	%acc_027 = OpPhi %float %30 %for_body %float_0 %entry
    	%add = OpIAdd %uint %k_028 %mul
	%idxprom = OpSConvert %ulong %add
   %arrayidx = OpInBoundsPtrAccessChain %_ptr_CrossWorkgroup_float %A %idxprom
     	%41 = OpLoad %float %arrayidx Aligned 4
   	%mul4 = OpIMul %uint %k_028 %N
   	%add5 = OpIAdd %uint %mul4 %conv2
   %idxprom6 = OpSConvert %ulong %add5
  %arrayidx7 = OpInBoundsPtrAccessChain %_ptr_CrossWorkgroup_float %B %idxprom6
     	%46 = OpLoad %float %arrayidx7 Aligned 4
     	%30 = OpExtInst %float %1 mad %41 %46 %acc_027
    	%inc = OpIAdd %uint %k_028 %uint_1
    	%cmp = OpSLessThan %bool %inc %N
           	OpBranchConditional %cmp %for_body %for_cond_cleanup
           	OpFunctionEnd

-Lenny



Hi @lenny_guo, sorry for the late reply. Somehow I didn’t receive an email for this thread…

Keep in mind that SPIR-V is just a common binary language used by multiple Khronos APIs. Each API has further constraints on the subset of SPIR-V it accepts. This is typically expressed in the API’s SPIR-V environment specification, e.g., Vulkan’s or OpenCL’s. So you’ll need to make sure the generated SPIR-V blob follows the constraints listed there.

Here you are using a SPIR-V blob that is meant for Vulkan (with the Shader capability), so it won’t be accepted by OpenCL (which expects the Kernel capability). That’s why you are seeing the failure, I believe.
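If you want a quick sanity check of which flavor a blob is without going through spirv-dis, something rough like the following works. It is only a sketch, relying on the SPIR-V word layout: OpCapability is opcode 17, and the Shader/Kernel capabilities are the enum values 1 and 6.

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <fstream>
#include <iterator>
#include <vector>

// Scan a SPIR-V blob's capability declarations.
void dumpCapabilities(const std::vector<uint32_t> &words) {
  if (words.size() < 5 || words[0] != 0x07230203u) {
    std::puts("not a SPIR-V module (bad magic)");
    return;
  }
  for (size_t i = 5; i < words.size();) {
    uint16_t opcode = words[i] & 0xFFFFu;   // low 16 bits: opcode
    uint16_t wordCount = words[i] >> 16;    // high 16 bits: word count
    if (wordCount == 0)
      break; // malformed instruction
    if (opcode == 17 && i + 1 < words.size()) { // OpCapability
      if (words[i + 1] == 1) std::puts("OpCapability Shader (Vulkan flavor)");
      if (words[i + 1] == 6) std::puts("OpCapability Kernel (OpenCL flavor)");
    }
    i += wordCount;
  }
}

int main(int argc, char **argv) {
  std::ifstream file(argv[1], std::ios::binary);
  std::vector<char> bytes((std::istreambuf_iterator<char>(file)),
                          std::istreambuf_iterator<char>());
  std::vector<uint32_t> words(bytes.size() / 4);
  std::memcpy(words.data(), bytes.data(), words.size() * 4);
  dumpCapabilities(words);
}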

At the moment the SPIR-V dialect and the various conversions to it support Vulkan best. Enabling OpenCL shouldn’t require a huge amount of additional work because lots of the pieces can just be shared. (You can read my previous comment here.) Quite a few contributions have landed previously to push on that, so we have tests like this, this, and this for OpenCL. But right now we don’t have an in-tree mlir-opencl-runner like mlir-vulkan-runner to demonstrate the whole flow to you. It would certainly be awesome if that could be built up. For the specific task you are facing, and just to turn the light on, you might want to play with this IR snippet by plugging in the stuff you want, to see how it goes.


Hi Lei, thank you very much for your kind help :smiley:. I will think about this problem and get back to you if I have further questions. I am also considering whether to take on the task of implementing the mlir-opencl-runner. I will let you know if I decide to do that as part of my project.

No problem! If you’d like to contribute to bringing up more OpenCL-related features, that would be truly awesome; but certainly no rush! Let me know if you have more questions. Happy to help.

Good morning Lei, I decided to proceed with the mlir-opencl-runner after talking to my advisor :smile:. I will work on this every Wednesday and Thursday. To let you know where I am: I only started getting to know MLIR two months ago, so I do not have much experience yet. However, I have been reading the Vulkan runner code for a while, and I understand there are roughly two steps. The first step is to pin down the compilation/lowering process from MLIR to the OpenCL flavor of the SPIR-V dialect and binary. The second step is to make the SPIR-V binary run on the OCL runtime, which includes a runtime and (maybe) a wrapper. I will work on the lowering process first. Is that lowering available now? I am not sure, and will need to try the examples you shared with me. Also, could you please give me more information about the lowering process? Thank you very much, and please let me know if you have any comments. :smiley:

That is fantastic! Thanks for your help in advance! No worries about just getting started on MLIR and stuff; actually adding support for OpenCL might be a nice hands-on experience to get yourself more familiar with the stack. :slight_smile:

The steps you described above are correct. There are two major components: 1) lowering for device kernels and 2) lowering/runtime for host scheduling.

The fundamental bits for lowering device kernels are mostly there, but certain things are likely not properly connected. The test files I pointed out above are good places to start; in each test file you can find the command used to convert the IR, so please feel free to give it a shot. I assume from the previous discussion that you are not very familiar with SPIR-V and its OpenCL flavor (that is entirely fine!), so you may want to check out OpenCL’s SPIR-V execution environment to learn more. It’s quite different from Vulkan’s, which you can use as a contrast. I think a good old first step is to get a kernel with spv.FMul working; I don’t expect too much work for it, actually.

For host-side scheduling, the way to do it in MLIR core is to provide a lowering path from the GPU dialect’s host launch logic to some op that represents OpenCL API calls. These are eventually converted into LLVM external function calls that are backed by C++ implementations. The Vulkan runner might be a bit verbose for understanding the logic, given Vulkan’s low-level nature. OpenCL is more akin to CUDA/ROCm, so you can probably get a more direct understanding by reading mlir-cuda-runner. I’d suggest you follow the logic in the main file of mlir-cuda-runner to understand what each step is doing; the meat is in its runMLIRPasses function, and those passes are worth a look. You may also want to run the compiled mlir-cuda-runner (if you have an NVIDIA GPU) on some test IR with -print-ir-after-all to really see the IR after each step. Those passes gradually convert the input GPU launch logic into LLVM external function calls that are linked at JIT time against C++ implementations wrapping the API calls.
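To give a flavor of what those C++ implementations could look like on the OpenCL side, here is a rough sketch. All the names (oclInit, oclSetBinary, oclLaunchKernel) are made up for illustration, and error handling and kernel-argument passing are omitted.

#define CL_TARGET_OPENCL_VERSION 300
#include <CL/cl.h>
#include <cstdint>

extern "C" {

struct OclRuntime {
  cl_context context;
  cl_command_queue queue;
  cl_device_id device;
  cl_program program;
  cl_kernel kernel;
};

// Backs a lowered `llvm.call @oclInit()`.
OclRuntime *oclInit() {
  auto *rt = new OclRuntime();
  cl_platform_id platform;
  clGetPlatformIDs(1, &platform, nullptr);
  clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &rt->device, nullptr);
  rt->context =
      clCreateContext(nullptr, 1, &rt->device, nullptr, nullptr, nullptr);
  rt->queue = clCreateCommandQueue(rt->context, rt->device, 0, nullptr);
  return rt;
}

// Backs a lowered `llvm.call @oclSetBinary(%rt, %blob, %size)`.
void oclSetBinary(OclRuntime *rt, void *blob, int32_t size) {
  rt->program = clCreateProgramWithIL(rt->context, blob, size, nullptr);
  clBuildProgram(rt->program, 1, &rt->device, "", nullptr, nullptr);
}

// Backs a lowered `llvm.call @oclLaunchKernel(...)`; grid sizes and kernel
// arguments would be threaded through in the same way.
void oclLaunchKernel(OclRuntime *rt, const char *entryPoint, size_t gridX) {
  rt->kernel = clCreateKernel(rt->program, entryPoint, nullptr);
  clEnqueueNDRangeKernel(rt->queue, rt->kernel, 1, nullptr, &gridX, nullptr, 0,
                         nullptr, nullptr);
  clFinish(rt->queue);
}

} // extern "C"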

One thing worth noting is that various MLIR runners are based on LLVM JIT support. This may or may not be a problem, depending on your final goal or usage environment. But nonetheless, it’s certainly a valuable contribution that we’d really appreciate! :slight_smile:

Hi Lei, many thanks for your detailed instructions. I have successfully run the test examples you suggested above. I have a few questions here.

  1. Is this command (mlir-opt -allow-unregistered-dialect -convert-gpu-to-spirv -verify-diagnostics module-structure-opencl.mlir -o -) mentioned here the way to generate the OpenCL-style SPIR-V dialect from the GPU dialect?
    I used to generate the Vulkan-style SPIR-V with "mlir-opt test-tensor-matmul.mlir -linalg-bufferize --convert-vector-to-scf --convert-linalg-to-parallel-loops -test-gpu-greedy-parallel-loop-mapping -convert-parallel-loops-to-gpu -gpu-kernel-outlining -canonicalize -legalize-std-for-spirv -test-spirv-entry-point-abi --convert-gpu-to-spirv"

  2. I believe these are the fundamental bits that lower to OpenCL device kernels. Am I correct?

I am currently reading the source code of "mlir-translate" and "mlir-opt" to understand the functionality of the flags "-test-spirv-roundtrip", "-spirv-lower-abi-attrs", "-verify-diagnostics", and "-allow-unregistered-dialect -convert-gpu-to-spirv -verify-diagnostics".

Please let me know if I am not heading in the right direction. :slightly_smiling_face: Thank you very much.


Hi Lei, many thanks for your detailed instructions. I have successfully run the test examples you suggested above.

Great!

I have a few questions here.

  1. Is this command (mlir-opt -allow-unregistered-dialect -convert-gpu-to-spirv -verify-diagnostics module-structure-opencl.mlir -o -) mentioned here the way to generate the OpenCL-style SPIR-V dialect from the GPU dialect?

Yep. What’s important is the spv.target_env there. It controls the target environment. Kernel there means OpenCL. (Shader means Vulkan.)
Generally common math ops are available to both OpenCL and Vulkan. Ops specific to one execution environment will depend on either Kernel or Shader, if you read the SPIR-V spec.
(But the above example does nothing: it just directly returns. So it does not reveal much. :slight_smile:)

  1. I used to generate the Vulkan-style SPIR-V with "mlir-opt test-tensor-matmul.mlir -linalg-bufferize --convert-vector-to-scf --convert-linalg-to-parallel-loops -test-gpu-greedy-parallel-loop-mapping -convert-parallel-loops-to-gpu -gpu-kernel-outlining -canonicalize -legalize-std-for-spirv -test-spirv-entry-point-abi --convert-gpu-to-spirv"

  2. I believe these are the fundamental bits that lower to OpenCL device kernels. Am I correct?

That’s the right flow if you want to start from something at the Linalg level. MLIR conversion is progressive and composable; that’s why you see such a long list of command-line options: each one performs one step or one chunk of IR conversion. The various runners just call into these passes programmatically.
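For example, instead of a long mlir-opt command line, a runner would do something along these lines programmatically. This is only a sketch; the exact header paths and pass-creation function names may differ a bit across revisions, and the pass list is truncated.

#include "mlir/Conversion/GPUToSPIRV/ConvertGPUToSPIRVPass.h"
#include "mlir/Dialect/GPU/Passes.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Transforms/Passes.h"

// Sketch: what a runner does in place of a long mlir-opt invocation.
static mlir::LogicalResult lowerKernelsToSPIRV(mlir::ModuleOp module) {
  mlir::PassManager pm(module.getContext());
  pm.addPass(mlir::createGpuKernelOutliningPass()); // -gpu-kernel-outlining
  pm.addPass(mlir::createCanonicalizerPass());      // -canonicalize
  pm.addPass(mlir::createConvertGPUToSPIRVPass());  // -convert-gpu-to-spirv
  // ...plus whatever other steps the flow needs, in the same order as the
  // corresponding command-line options.
  return pm.run(module);
}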

I am currently reading the source code of "mlir-translate" and "mlir-opt" to understand the functionality of the flags "-test-spirv-roundtrip", "-spirv-lower-abi-attrs", "-verify-diagnostics", and "-allow-unregistered-dialect -convert-gpu-to-spirv -verify-diagnostics".

Those can be useful. Another thing I’d suggest is to run mlir-translate -serialize-spirv on a piece of SPIR-V IR and then run spirv-dis from the Vulkan SDK on the result to dump the raw bits of the SPIR-V blob and get familiar with it. That’s what gets sent to the drivers, and you can clearly see things like OpCapability Kernel there.

Hi Lei,

Thank you very much. I have successfully generated a SPIR-V binary from an OpenCL-flavored MLIR module and was able to disassemble it using spirv-dis :smile:. I have also tried composing new OpenCL MLIR kernels myself, thanks to the OpenCL/Vulkan documents you referred me to. Furthermore, I have verified that the generated SPIR-V binary is accepted by the clCreateProgramWithIL() function, which is good news for our project. I am still reading the OpenCL/Vulkan documents, and I think I am ready to move on to the mlir-opencl-runner :slightly_smiling_face:. Many thanks!

Hi Lei,

How are you? I am getting an error while trying to return the SPIR-V binary as an argument through the lowering pass. I hope you can give me some suggestions. :slightly_smiling_face: Thank you very much.

This is my error:

root@14ce19c25f1f:~/project/ocl-mcl-runner/duomo-dmc/build/tools/mlir/tools# make


[100%] Built target obj.MLIRTestIR
[100%] Linking CXX shared library …/…/…/…/…/lib/libMLIRTestIR.so
[100%] Built target MLIRTestIR
[100%] Built target obj.MLIRTestPass
[100%] Built target MLIRTestPass
[100%] Built target obj.MLIRTestReducer
[100%] Built target MLIRTestReducer
[100%] Built target obj.MLIRTestRewrite
[100%] Linking CXX shared library …/…/…/…/…/lib/libMLIRTestRewrite.so
[100%] Built target MLIRTestRewrite
Scanning dependencies of target obj.MLIRMlirOptMain
[100%] Building CXX object tools/mlir/tools/mlir-opt/CMakeFiles/obj.MLIRMlirOptMain.dir/mlir-opt.cpp.o
In file included from /root/project/ocl-mcl-runner/duomo-dmc/mlir/include/mlir/Conversion/Passes.h:43:0,
from /root/project/ocl-mcl-runner/duomo-dmc/mlir/include/mlir/InitAllPasses.h:17,
from /root/project/ocl-mcl-runner/duomo-dmc/mlir/tools/mlir-opt/mlir-opt.cpp:17:
/root/project/ocl-mcl-runner/duomo-dmc/build/tools/mlir/include/mlir/Conversion/Passes.h.inc: In lambda function:
/root/project/ocl-mcl-runner/duomo-dmc/build/tools/mlir/include/mlir/Conversion/Passes.h.inc:1306:49: error: too few arguments to function ‘std::unique_ptr<mlir::OperationPass<mlir::ModuleOp> > mlir::createConvertSPIRVToModulePass(llvm::SmallVector<unsigned int, 0>)’
return mlir::createConvertSPIRVToModulePass();
^
In file included from /root/project/ocl-mcl-runner/duomo-dmc/mlir/include/mlir/Conversion/Passes.h:29:0,
from /root/project/ocl-mcl-runner/duomo-dmc/mlir/include/mlir/InitAllPasses.h:17,
from /root/project/ocl-mcl-runner/duomo-dmc/mlir/tools/mlir-opt/mlir-opt.cpp:17:
/root/project/ocl-mcl-runner/duomo-dmc/mlir/include/mlir/Conversion/SPIRVToModule/SPIRVToModulePass.h:42:42: note: declared here
std::unique_ptr<OperationPass<ModuleOp>> createConvertSPIRVToModulePass(SmallVector<uint32_t, 0> binn);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tools/mlir/tools/mlir-opt/CMakeFiles/obj.MLIRMlirOptMain.dir/build.make:81: recipe for target ‘tools/mlir/tools/mlir-opt/CMakeFiles/obj.MLIRMlirOptMain.dir/mlir-opt.cpp.o’ failed
make[2]: *** [tools/mlir/tools/mlir-opt/CMakeFiles/obj.MLIRMlirOptMain.dir/mlir-opt.cpp.o] Error 1
CMakeFiles/Makefile2:51367: recipe for target ‘tools/mlir/tools/mlir-opt/CMakeFiles/obj.MLIRMlirOptMain.dir/all’ failed
make[1]: *** [tools/mlir/tools/mlir-opt/CMakeFiles/obj.MLIRMlirOptMain.dir/all] Error 2
Makefile:170: recipe for target ‘all’ failed
make: *** [all] Error 2

I was trying to follow the Vulkan runner, which does not return the SPIR-V binary as an argument through the lowering pass, but I did not see the trick for how the generated SPIR-V binary is handed to the Vulkan runtime. The CUDA runner is different: it returns the binary through the lowering pass, and the runtime pass then takes the binary as an argument. I am following what the CUDA runner does. First I have a lowering pass that returns the SPIR-V binary as an argument, and then I have a runtime pass that is supposed to take the returned SPIR-V binary as input.

This is part of my lowering pass (SPIRVToModulePass.cpp):

namespace {
/// A pass converting SPIR-V binary into module operations.
class ConvertSPIRVToModulePass
    : public ConvertSPIRVToModuleBase<ConvertSPIRVToModulePass> {
public:
  ConvertSPIRVToModulePass(SmallVector<uint32_t, 0> binn) : binary(binn) {}

  void runOnOperation() override;

private:
  SmallVector<uint32_t, 0> binary;
};
} // namespace

void ConvertSPIRVToModulePass::runOnOperation() {
  ModuleOp module = getOperation();
  // mlir::serializeModule(module);
  mlir::serializeModule(module, binary); // this is from the -serialize-spirv flag
}

std::unique_ptr<OperationPass<ModuleOp>>
mlir::createConvertSPIRVToModulePass(SmallVector<uint32_t, 0> binn) {
  return std::make_unique<ConvertSPIRVToModulePass>(binn);
}

And the header SPIRVToModulePass.h:

namespace mlir {

class ModuleOp;

template <typename OpT>
class OperationPass;

/// Creates a pass to convert a SPIR-V binary to module operations.
// std::unique_ptr<OperationPass<ModuleOp>> createConvertSPIRVToModulePass();
std::unique_ptr<OperationPass<ModuleOp>>
createConvertSPIRVToModulePass(SmallVector<uint32_t, 0> binn);

} // namespace mlir

I realized that I need to modify Passes.td so that my pass gets registered in Passes.h.inc, which is automatically generated from Passes.td. This is what I added to Passes.td for the lowering pass:

//===----------------------------------------------------------------------===//
// SPIRVToModule
//===----------------------------------------------------------------------===//

def ConvertSPIRVToModule : Pass<"convert-spirv-to-module", "ModuleOp"> {
  let summary = "Convert SPIR-V dialect to module operations";
  let description = [{
    See https://mlir.llvm.org/docs/SPIRVToModuleConversion/
    for more details.
  }];
  let constructor = "mlir::createConvertSPIRVToModulePass()";
  let dependentDialects = ["LLVM::LLVMDialect"];
  let options = [Option<"binn", "spirv-binary", "SmallVector<uint32_t, 0>",
                        /*default=*/"", "the generated spirv binary">];
}

And the related code in the generated Passes.h.inc:
//===----------------------------------------------------------------------===//
// ConvertSPIRVToModule Registration
//===----------------------------------------------------------------------===//

inline void registerConvertSPIRVToModulePass() {
  ::mlir::registerPass("convert-spirv-to-module",
                       "Convert SPIR-V dialect to module operations",
                       []() -> std::unique_ptr<::mlir::Pass> {
    return mlir::createConvertSPIRVToModulePass();
  });
}

//===----------------------------------------------------------------------===//
// ConvertSPIRVToModule
//===----------------------------------------------------------------------===//

template <typename DerivedT>
class ConvertSPIRVToModuleBase : public ::mlir::OperationPass<ModuleOp> {
public:
  ConvertSPIRVToModuleBase()
      : ::mlir::OperationPass<ModuleOp>(::mlir::TypeID::get<DerivedT>()) {}
  ConvertSPIRVToModuleBase(const ConvertSPIRVToModuleBase &)
      : ::mlir::OperationPass<ModuleOp>(::mlir::TypeID::get<DerivedT>()) {}

  /// Returns the command-line argument attached to this pass.
  ::llvm::StringRef getArgument() const override { return "convert-spirv-to-module"; }

  /// Returns the derived pass name.
  ::llvm::StringRef getName() const override { return "ConvertSPIRVToModule"; }

  /// Support isa/dyn_cast functionality for the derived pass class.
  static bool classof(const ::mlir::Pass *pass) {
    return pass->getTypeID() == ::mlir::TypeID::get<DerivedT>();
  }

  /// A clone method to create a copy of this pass.
  std::unique_ptr<::mlir::Pass> clonePass() const override {
    return std::make_unique<DerivedT>(*static_cast<const DerivedT *>(this));
  }

  /// Return the dialect that must be loaded in the context before this pass.
  void getDependentDialects(::mlir::DialectRegistry &registry) const override {
    registry.insert<LLVM::LLVMDialect>();
  }

protected:
  ::mlir::Pass::Option<SmallVector<uint32_t, 0>> binn{
      *this, "spirv-binary", ::llvm::cl::desc("the generated spirv binary")};
};

//===----------------------------------------------------------------------===//
// Conversion Registration
//===----------------------------------------------------------------------===//

inline void registerConversionPasses() {

registerConvertSPIRVToModulePass();

}
#undef GEN_PASS_REGISTRATION
#endif // GEN_PASS_REGISTRATION

Hey @lenny_guo,

For the Vulkan runner the SPIR-V blob is passed through as an attribute. Specifically, it is serialized and attached here and then later retrieved here.

That makes sense to me. OpenCL is more akin to CUDA than Vulkan, so following what the CUDA runner does is okay (and I think you can even reuse many of the patterns/passes there where possible). From what I can tell, the CUDA runner also uses an attribute attached to the module for the compiled executable blob.
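Concretely, the blob round-trip is just a module attribute, roughly like this. This is only a sketch: the attribute name is arbitrary, and the header paths and the StringAttr::get signature differ slightly between revisions.

#include "mlir/Dialect/SPIRV/SPIRVOps.h"
#include "mlir/Dialect/SPIRV/Serialization.h"
#include "mlir/IR/BuiltinOps.h"

using namespace mlir;

// In the kernel lowering pass: serialize the spirv.module and attach the blob
// to the enclosing builtin module ("ocl.spirv_blob" is an arbitrary name).
static LogicalResult attachSPIRVBlob(ModuleOp module,
                                     spirv::ModuleOp spirvModule) {
  SmallVector<uint32_t, 0> binary;
  if (failed(spirv::serialize(spirvModule, binary)))
    return failure();
  StringRef blob(reinterpret_cast<const char *>(binary.data()),
                 binary.size() * sizeof(uint32_t));
  module->setAttr("ocl.spirv_blob", StringAttr::get(blob, module.getContext()));
  return success();
}

// Later, in the pass that emits the runtime calls: read it back.
static StringAttr getSPIRVBlob(ModuleOp module) {
  return module->getAttrOfType<StringAttr>("ocl.spirv_blob");
}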

The immediate error you hit here is just a mismatch between the function call and its declaration, which requires a parameter. You’ll need to make sure the various places (*Pass.td, *Pass.h, *Pass.cpp) are consistent about that.
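In other words, the registration generated from Passes.td calls the zero-argument constructor, so one way to keep things consistent is to provide both entry points. A rough sketch based on your own pass names:

// SPIRVToModulePass.h
std::unique_ptr<OperationPass<ModuleOp>> createConvertSPIRVToModulePass();
std::unique_ptr<OperationPass<ModuleOp>>
createConvertSPIRVToModulePass(ArrayRef<uint32_t> binary);

// SPIRVToModulePass.cpp
class ConvertSPIRVToModulePass
    : public ConvertSPIRVToModuleBase<ConvertSPIRVToModulePass> {
public:
  ConvertSPIRVToModulePass() = default; // used by the generated registration
  explicit ConvertSPIRVToModulePass(ArrayRef<uint32_t> binary)
      : binary(binary.begin(), binary.end()) {}

  void runOnOperation() override;

private:
  SmallVector<uint32_t, 0> binary;
};

std::unique_ptr<OperationPass<ModuleOp>>
mlir::createConvertSPIRVToModulePass() {
  return std::make_unique<ConvertSPIRVToModulePass>();
}
std::unique_ptr<OperationPass<ModuleOp>>
mlir::createConvertSPIRVToModulePass(ArrayRef<uint32_t> binary) {
  return std::make_unique<ConvertSPIRVToModulePass>(binary);
}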


Hi Lei,

I am getting to the core part of the SPIR-V runtime code, but I am running into issues walking over the SPIR-V operations. Could you have a look? Thank you very much.

Here is my code, with my questions inlined as comments.

void LaunchFuncToOpenCLCallsPass::runOnOperation() {
  // Collect the SPIR-V attributes spirv_blob and spirv_entry_point_name.
  getOperation().walk([this](LLVM::CallOp op) {
    // or (spirv::ModuleOp op)? <- what operation type should I use here?
    // Both LLVM::CallOp and spirv::ModuleOp do not work.

    // if (isVulkanLaunchCallOp(op)) {
    // I disabled this condition in order to get past it. Do I need an
    // isOpenCLLaunchCallOp here?
    if (true) {
      OpBuilder builder(op);
      Location loc = op.getLoc();

      // Create call to the OpenCL init runtime function (kInitOpenCL).
      auto initOpenCLCall = builder.create<LLVM::CallOp>(
          // or <spirv::FuncOp>? <- what operation type should I use here?
          // Both LLVM::CallOp and spirv::FuncOp do not work, and there is no
          // spirv::CallOp.
          loc, TypeRange{getPointerType()},
          builder.getSymbolRefAttr(kInitOpenCL), ValueRange{});

      auto openclRuntime = initOpenCLCall.getResult(0);
      // <spirv::FuncOp> does not support getResult(0).

      auto spirvBlobAttr = op->getAttrOfType<StringAttr>(kSPIRVBlobAttrName);
      if (!spirvBlobAttr) {
        op.emitError() << "missing " << kSPIRVBlobAttrName << " attribute";
        return signalPassFailure();
      }

      // auto spirvEntryPointNameAttr =
      //     op->getAttrOfType<StringAttr>(kSPIRVEntryPointAttrName);
      // if (!spirvEntryPointNameAttr) {
      //   op.emitError()
      //       << "missing " << kSPIRVEntryPointAttrName << " attribute";
      //   return signalPassFailure();
      // }

      // Create an LLVM global with the SPIR-V binary data, so we can pass a
      // pointer to that data to the runtime call.
      Value ptrToSPIRVBinary = LLVM::createGlobalString(
          loc, builder, kSPIRVBinary, spirvBlobAttr.getValue(),
          LLVM::Linkage::Internal);

      // Create a constant for the size of the SPIR-V binary.
      Value binarySize = builder.create<LLVM::ConstantOp>(
          // or <spirv::ConstantOp> here?
          loc, getInt32Type(),
          builder.getI32IntegerAttr(spirvBlobAttr.getValue().size()));

      // Create a call to the runtime function (kRunOnOpenCL) with the given
      // pointer to the SPIR-V binary and the binary size.
      builder.create<LLVM::CallOp>(
          // <- there is no spirv::CallOp
          loc, TypeRange{getVoidType()},
          builder.getSymbolRefAttr(kRunOnOpenCL),
          ValueRange{openclRuntime, ptrToSPIRVBinary, binarySize});

      op.erase();
    }
  });
}

Hi, Lenny

May I know the status of your work? Is the runner available?

Thank you!

It hasn’t landed upstream yet. But maybe Lenny has something privately. :slight_smile:

Lenny, I just noticed that I didn’t reply to your questions above. Let me know if that’s still an issue.
Also, I’d suggest sending a draft patch via reviews.llvm.org for interactive discussion on the code; posting and debugging code here is hard to follow, given the missing context, etc.

BTW, we are working on enabling a GPU pipeline for the Intel L0 API (which expects the same SPIR-V ‘flavor’ as OpenCL), but it is still at a very early stage, and it is tightly integrated with our infrastructure (I don’t have plans for a standalone L0/OpenCL runner at the moment).

https://github.com/Hardcode84/mlir-extensions/blob/ibutygin/gpu-pipeline/numba_dpcomp/mlir_compiler/src/pipelines/lower_to_gpu.cpp#L1378

That’s awesome, thanks for sharing @Hardcode84! Really glad to see that.

Intel L0 API (which expects the same SPIR-V ‘flavor’ as OpenCL)

So it’s some variation of OpenCL-flavored SPIR-V, with some Intel extensions, etc.? Any pointers where I can read a bit more?

https://spec.oneapi.io/level-zero/latest/core/SPIRV.html

It should be almost identical to OpenCL SPIR-V; right now I am trying to imitate the SPIR-V generated by Clang for OpenCL (through the llvm-spirv translator).