Recently I’ve been working with the gpu dialect to take full advantage of the multi-threading capability of NVIDIA GPUs, but I ran into some problems when I tried to use the async attribute on some ops.
When I write the following code in test.mlir:
module attributes {gpu.container_module} {
  func.func @main() {
    %c2 = arith.constant 2 : index
    %0 = gpu.wait async
    %1, %2 = gpu.alloc async [%0] (%c2) : memref<?xf32>
    %5, %6 = gpu.alloc async [%0] (%c2) : memref<?xf32>
    %3 = gpu.dealloc async [%2] %1 : memref<?xf32>
    %4 = gpu.dealloc async [%6] %5 : memref<?xf32>
    gpu.wait [%3]
    return
  }
}
and lower it with the following pipeline:
mlir-opt test.mlir -llvm-request-c-wrappers | \
mlir-opt -gpu-to-llvm | \
mlir-opt -reconcile-unrealized-casts
it outputs the LLVM IR, which works well with my C code (hence -llvm-request-c-wrappers).
However, when I add a gpu.memcpy op to my code:
module attributes {gpu.container_module} {
  func.func @main() {
    %c2 = arith.constant 2 : index
    %0 = gpu.wait async
    %1, %2 = gpu.alloc async [%0] (%c2) : memref<?xf32>
    %5, %6 = gpu.alloc async [%0] (%c2) : memref<?xf32>
    %7 = gpu.memcpy async [%2, %6] %1, %5 : memref<?xf32>, memref<?xf32>
    %3 = gpu.dealloc async [%2] %1 : memref<?xf32>
    %4 = gpu.dealloc async [%6] %5 : memref<?xf32>
    gpu.wait [%3]
    return
  }
}
and run the same pipeline, it fails with an error:
<stdin>:7:10: error: failed to legalize operation 'gpu.memcpy' that was explicitly marked illegal
%1 = gpu.memcpy async [%asyncToken, %asyncToken_1] %memref, %memref_0 : memref<?xf32>, memref<?xf32>
^
<stdin>:7:10: note: see current operation: %34 = "gpu.memcpy"(%19#1, %33#1, %19#0, %33#0) : (!gpu.async.token, !gpu.async.token, memref<?xf32>, memref<?xf32>) -> !gpu.async.token
module {
}
I have tried other pipelines, but they all failed. So what’s the right pipeline to lower this? I also want to add a gpu.launch_func op to my code, so what is the right pipeline for lowering MLIR code that contains gpu.wait, gpu.alloc, gpu.memcpy, and gpu.launch_func?
(I am using LLVM llvmorg-16.0.6 and CUDA 11.8 on an NVIDIA GeForce RTX 2060 SUPER.)