To execute MLIR on GPUs, I compiled the test.mlir file. The following is part of the IR:
func @test(){
  %cst_0 = arith.constant 0.0 : f32
  %2 = memref.alloc() {alignment = 16 : i64} : memref<1x6x28x28xf32>
  affine.for %arg1 = 0 to 1 {
    affine.for %arg2 = 0 to 6 {
      affine.for %arg3 = 0 to 28 {
        affine.for %arg4 = 0 to 28 {
          ··················
          memref.store %15, %2[%arg1, %arg2, %arg3, %arg4] : memref<1x6x28x28xf32>
          ··················
  }}}}
  affine.for %arg1 = 0 to 1 {
    affine.for %arg2 = 0 to 6 {
      affine.for %arg3 = 0 to 28 {
        affine.for %arg4 = 0 to 28 {
          ··················
          %11 = memref.load %2[%arg1, %arg2, %arg3, %arg4] : memref<1x6x28x28xf32>
          ··················
  }}}}
  return
}
I compiled test.mlir down to a binary using the following passes and executed it:
--convert-affine-for-to-gpu --lower-affine --convert-scf-to-cf --gpu-kernel-outlining --pass-pipeline="gpu.module(strip-debuginfo, convert-gpu-to-nvvm, gpu-to-cubin)" -gpu-to-llvm
The following errors occur:
'cuStreamSynchronize(stream)' failed with 'CUDA_ERROR_ILLEGAL_ADDRESS'
'cuStreamDestroy(stream)' failed with 'CUDA_ERROR_ILLEGAL_ADDRESS'
'cuModuleUnload(module)' failed with 'CUDA_ERROR_ILLEGAL_ADDRESS'
'cuModuleLoadData(&module, data)' failed with 'CUDA_ERROR_ILLEGAL_ADDRESS'
'cuModuleGetFunction(&function, module, name)' failed with 'CUDA_ERROR_INVALID_HANDLE'
'cuStreamCreate(&stream, CU_STREAM_NON_BLOCKING)' failed with 'CUDA_ERROR_ILLEGAL_ADDRESS'
'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, smem, stream, params, extra)' failed with 'CUDA_ERROR_INVALID_HANDLE'
'cuStreamSynchronize(stream)' failed with 'CUDA_ERROR_INVALID_HANDLE'
'cuStreamDestroy(stream)' failed with 'CUDA_ERROR_INVALID_HANDLE'
'cuModuleUnload(module)' failed with 'CUDA_ERROR_INVALID_HANDLE'
I wonder whether the %2 = memref.alloc() operation, which is host code, causes the CUDA_ERROR_ILLEGAL_ADDRESS. My guess is that when the code is compiled for the GPU backend, the address of this host allocation is never registered with the GPU, so the kernel accesses memory it cannot reach. By the way, how can I allocate GPU memory in an MLIR dialect?
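To make the question concrete, here is a minimal sketch of two approaches that might apply, assuming the gpu dialect ops gpu.host_register, gpu.alloc and gpu.dealloc are available in the MLIR build being used. The function name @sketch and the value names %host, %host_unranked and %dev are hypothetical, and this has not been checked against the pass pipeline above.

```mlir
func @sketch() {
  // Option A (assumption): keep the host allocation, but register it with the
  // GPU runtime so that kernels can access it. gpu.host_register takes an
  // unranked memref, hence the cast.
  %host = memref.alloc() {alignment = 16 : i64} : memref<1x6x28x28xf32>
  %host_unranked = memref.cast %host : memref<1x6x28x28xf32> to memref<*xf32>
  gpu.host_register %host_unranked : memref<*xf32>

  // Option B (assumption): allocate device memory through the gpu dialect.
  // Host-side loads and stores on %dev would not be valid; data would have
  // to be moved with an explicit copy such as gpu.memcpy.
  %dev = gpu.alloc () : memref<1x6x28x28xf32>

  // ... kernels that use %host or %dev would be launched here ...

  gpu.dealloc %dev : memref<1x6x28x28xf32>
  memref.dealloc %host : memref<1x6x28x28xf32>
  return
}
```

Whether either option lowers cleanly may depend on the MLIR version; for example, the conversion to LLVM may expect the async form of gpu.alloc, so treat this as a starting point for discussion and not as a verified fix.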