How to lower gpu.launch correctly

Hi everyone. I have an MLIR function that calls gpu.launch.

func.func @gpu_func(%arg4: !gpu.async.token ){
  %c1 = arith.constant 1 : index  
  %c2 = arith.constant 32 : index 
  %2 = gpu.launch async [%arg4] blocks(%arg7, %arg8, %arg9) in (%arg13 = %c1, %arg14 = %c1, %arg15 = %c1) threads(%arg10, %arg11, %arg12) in (%arg16 = %c2, %arg17 = %c1, %arg18 = %c1) {
    gpu.terminator
  } 
  return 
}

Here is my lowering pass pipeline.

${MLIR_OPT} ./test.mlir \
		--arith-bufferize \
		--finalizing-bufferize \
		--arith-expand \
		--convert-arith-to-llvm \
		--convert-gpu-to-nvvm \
		--llvm-request-c-wrappers \
		--convert-func-to-llvm

It produces the following MLIR function.

module {
  func.func @gpu_func(%arg0: !gpu.async.token) attributes {llvm.emit_c_interface} {
    %0 = llvm.mlir.constant(1 : index) : i64
    %1 = builtin.unrealized_conversion_cast %0 : i64 to index
    %2 = llvm.mlir.constant(32 : index) : i64
    %3 = builtin.unrealized_conversion_cast %2 : i64 to index
    %4 = gpu.launch async [%arg0] blocks(%arg1, %arg2, %arg3) in (%arg7 = %1, %arg8 = %1, %arg9 = %1) threads(%arg4, %arg5, %arg6) in (%arg10 = %3, %arg11 = %1, %arg12 = %1) {
      gpu.terminator
    }
    llvm.return
  }
}

The builtin.unrealized_conversion_cast from i64 to index looks like an error, and the gpu.launch operation does not seem to be lowered to NVVM IR. Is something wrong with my lowering pass pipeline or with my MLIR function? Thank you for your help!

You are at least missing the gpu-kernel-outlining pass here, which turns gpu.launch into gpu.launch_func.

Maybe try --test-lower-to-nvvm on your example?
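
As a minimal sketch of that first step, assuming the same test.mlir and an mlir-opt binary reachable through MLIR_OPT (the variable names INPUT and PASSES are only illustrative), you could run the outlining pass on its own and check that gpu.launch has become gpu.launch_func with the kernel body moved into a separate gpu.module. The remaining passes needed to reach NVVM depend on your LLVM version, so they are deliberately left out here.

```shell
# Sketch: run only kernel outlining and inspect the result.
MLIR_OPT="${MLIR_OPT:-mlir-opt}"   # assumption: mlir-opt is on PATH when MLIR_OPT is unset
INPUT="./test.mlir"                # the file from the question
PASSES="--gpu-kernel-outlining"

if command -v "$MLIR_OPT" >/dev/null 2>&1 && [ -f "$INPUT" ]; then
  # After this pass the host function should contain gpu.launch_func,
  # and the kernel should live in its own gpu.module.
  "$MLIR_OPT" "$INPUT" $PASSES
else
  echo "skipped: need $MLIR_OPT and $INPUT"
fi
```

If the output still contains gpu.launch, the outlining did not run, and passes such as convert-gpu-to-nvvm will have no gpu.module to work on.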

Thanks, I got it! I would also like to know whether there is a more general way to lower to NVVM, such as a regular command-line pass, rather than test-lower-to-nvvm. Is such a command-line pass in development?

test-lower-to-nvvm is not a pass; it is an “example pipeline” that you can take inspiration from by looking at all the passes it invokes.
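
One way to see which GPU- and NVVM-related passes and pipelines your own build registers is to filter the mlir-opt help output. This is only a sketch: it assumes mlir-opt is reachable through MLIR_OPT, and the FILTER pattern is an illustrative choice, not an official list.

```shell
# Sketch: list the GPU/NVVM-related options registered in this mlir-opt build.
MLIR_OPT="${MLIR_OPT:-mlir-opt}"   # assumption: mlir-opt is on PATH when MLIR_OPT is unset
FILTER='nvvm|gpu'                  # illustrative pattern; widen or narrow as needed

if command -v "$MLIR_OPT" >/dev/null 2>&1; then
  # grep exits non-zero when nothing matches, so do not treat that as a failure.
  "$MLIR_OPT" --help | grep -i -E "$FILTER" || true
else
  echo "skipped: $MLIR_OPT not found"
fi
```

The names printed depend on the LLVM version you built, so compare them against the passes the example pipeline invokes before copying a pipeline from elsewhere.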


OK, thank you very much.