Failure to lower gpu.alloc with global memory

weilinquan · May 26, 2024, 4:12am

Hi, I have a host function that executes gpu.alloc and gpu.memcpy, but something goes wrong when I lower it. Here is my MLIR function.

func.func @test(%arg0: memref<1800x1800xf16>, %arg1: memref<16x8xf16>, %arg2: memref<1800x1800xf16>, %arg3: memref<8x16xf16>) {
    %0 = gpu.wait async
    %memref, %asyncToken = gpu.alloc async [%0] () : memref<1800x1800xf16, 1>
    %1 = gpu.memcpy async [%asyncToken] %memref, %arg0 : memref<1800x1800xf16, 1>, memref<1800x1800xf16>
    %memref_0, %asyncToken_1 = gpu.alloc async [%1] () : memref<16x8xf16, 1>
    %2 = gpu.memcpy async [%asyncToken_1] %memref_0, %arg1 : memref<16x8xf16, 1>, memref<16x8xf16>
    %memref_2, %asyncToken_3 = gpu.alloc async [%2] () : memref<1800x1800xf16, 1>
    %3 = gpu.memcpy async [%asyncToken_3] %memref_2, %arg2 : memref<1800x1800xf16, 1>, memref<1800x1800xf16>
    %memref_4, %asyncToken_5 = gpu.alloc async [%3] () : memref<8x16xf16, 1>
    %5 = gpu.memcpy async [%asyncToken_5] %memref_4, %arg3 : memref<8x16xf16, 1>, memref<8x16xf16>
    gpu.wait [%5]
    return
}

This is my pass pipeline:

mlir-opt ./test.mlir -gpu-lower-to-nvvm="cubin-chip=sm_70 cubin-features=+ptx61 cubin-format=fatbin"

This is the error I get:

./test.mlir:3:28: error: 'llvm.insertvalue' op Type mismatch: cannot insert '!llvm.ptr' into '!llvm.struct<(ptr<1>, ptr<1>, i64, array<2 x i64>, array<2 x i64>)>'
    %memref, %asyncToken = gpu.alloc async [%0] () : memref<1800x1800xf16, 1>
                           ^
./test.mlir:3:28: note: see current operation: %12 = "llvm.insertvalue"(%11, %10) <{position = array<i64: 0>}> : (!llvm.struct<(ptr<1>, ptr<1>, i64, array<2 x i64>, array<2 x i64>)>, !llvm.ptr) -> !llvm.struct<(ptr<1>, ptr<1>, i64, array<2 x i64>, array<2 x i64>)>

It seems that gpu.alloc fails to lower. Is there something wrong? Thank you for your help.

asiemien · May 27, 2024, 10:13am

As a workaround, you could omit the address space, since gpu.alloc allocates in global memory by default. That is, replace memref<1800x1800xf16, 1> with memref<1800x1800xf16>, and likewise for the other buffers, which seems to solve the issue for me.
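
A minimal sketch of that change, cut down to the first two buffers of the function above. Only the memory space annotation is dropped; the op names, value names, and shapes are copied from the original post, and the shortened signature is just for illustration.

```mlir
func.func @test(%arg0: memref<1800x1800xf16>, %arg1: memref<16x8xf16>) {
  %0 = gpu.wait async
  // Result type has no explicit memory space (was memref<1800x1800xf16, 1>).
  %memref, %asyncToken = gpu.alloc async [%0] () : memref<1800x1800xf16>
  %1 = gpu.memcpy async [%asyncToken] %memref, %arg0 : memref<1800x1800xf16>, memref<1800x1800xf16>
  // Same change for the second buffer (was memref<16x8xf16, 1>).
  %memref_0, %asyncToken_1 = gpu.alloc async [%1] () : memref<16x8xf16>
  %2 = gpu.memcpy async [%asyncToken_1] %memref_0, %arg1 : memref<16x8xf16>, memref<16x8xf16>
  gpu.wait [%2]
  return
}
```

The remaining two buffers would be changed the same way and run through the same mlir-opt invocation as in the original post.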

That being said, I have no clue why the original example fails to lower, as the IR looks valid.

weilinquan · May 27, 2024, 11:59am

Thank you for your help! I have solved this.
