Failure to lower gpu.alloc with global memory

Hi, I have a host function that executes gpu.alloc and gpu.memcpy, but something goes wrong when I lower it. Here is my MLIR function:

func.func @test(%arg0: memref<1800x1800xf16>, %arg1: memref<16x8xf16>, %arg2: memref<1800x1800xf16>, %arg3: memref<8x16xf16>) {
    %0 = gpu.wait async
    %memref, %asyncToken = gpu.alloc async [%0] () : memref<1800x1800xf16, 1>
    %1 = gpu.memcpy async [%asyncToken] %memref, %arg0 : memref<1800x1800xf16, 1>, memref<1800x1800xf16>
    %memref_0, %asyncToken_1 = gpu.alloc async [%1] () : memref<16x8xf16, 1>
    %2 = gpu.memcpy async [%asyncToken_1] %memref_0, %arg1 : memref<16x8xf16, 1>, memref<16x8xf16>
    %memref_2, %asyncToken_3 = gpu.alloc async [%2] () : memref<1800x1800xf16, 1>
    %3 = gpu.memcpy async [%asyncToken_3] %memref_2, %arg2 : memref<1800x1800xf16, 1>, memref<1800x1800xf16>
    %memref_4, %asyncToken_5 = gpu.alloc async [%3] () : memref<8x16xf16, 1>
    %5 = gpu.memcpy async [%asyncToken_5] %memref_4, %arg3 : memref<8x16xf16, 1>, memref<8x16xf16>
    gpu.wait [%5]
    return
}

This is my pass pipeline:

mlir-opt ./test.mlir -gpu-lower-to-nvvm="cubin-chip=sm_70 cubin-features=+ptx61 cubin-format=fatbin"

This is the error I get:

./test.mlir:3:28: error: 'llvm.insertvalue' op Type mismatch: cannot insert '!llvm.ptr' into '!llvm.struct<(ptr<1>, ptr<1>, i64, array<2 x i64>, array<2 x i64>)>'
    %memref, %asyncToken = gpu.alloc async [%0] () : memref<1800x1800xf16, 1>
                           ^
./test.mlir:3:28: note: see current operation: %12 = "llvm.insertvalue"(%11, %10) <{position = array<i64: 0>}> : (!llvm.struct<(ptr<1>, ptr<1>, i64, array<2 x i64>, array<2 x i64>)>, !llvm.ptr) -> !llvm.struct<(ptr<1>, ptr<1>, i64, array<2 x i64>, array<2 x i64>)>

It seems that gpu.alloc fails to lower. Is there something wrong? Thank you for your help.

As a workaround, you could omit the address space, since gpu.alloc allocates in global memory by default. That is, replace memref<1800x1800xf16, 1> with memref<1800x1800xf16>, and likewise for the other buffers. That seems to solve the issue for me.
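
For illustration, here is a minimal sketch of that change, trimmed to the first buffer of the original function. The only difference is that the `, 1` address space is dropped from the gpu.alloc result type and from the gpu.memcpy destination type; the remaining three buffers would presumably be changed the same way. I have not checked this beyond the pipeline shown above.

```mlir
func.func @test(%arg0: memref<1800x1800xf16>) {
    %0 = gpu.wait async
    // Address space omitted: memref<1800x1800xf16> instead of memref<1800x1800xf16, 1>.
    %memref, %asyncToken = gpu.alloc async [%0] () : memref<1800x1800xf16>
    // The destination type of the copy must match the new alloc type.
    %1 = gpu.memcpy async [%asyncToken] %memref, %arg0 : memref<1800x1800xf16>, memref<1800x1800xf16>
    gpu.wait [%1]
    return
}
```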

That being said, I have no clue why the original example fails to lower, as the IR looks valid.

Thank you for your help! I have solved this.