[MLIR][GPU] calling `gpu.func` that is not a kernel (a.k.a. device function)

The GPU dialect says:

GPU functions are either kernels (as indicated by the kernel attribute) or regular functions. The former can be launched from the host side, while the latter are device side only.

However, the documentation does not show an example of calling a non-kernel gpu.func, and I fail to run the following:

module attributes {gpu.container_module} {

gpu.module @kernels {
    gpu.func @simple1(
        %r_size : index,
        %in_a : memref<?xi64>,
        %in_b : memref<?xi64>,
        %in_c : memref<?xi64>
    ) {
        %ci0    = arith.constant 0 : index
        %ci1    = arith.constant 1 : index
        scf.for %idx0 = %ci0 to %r_size step %ci1 {
            %idx0_i64 = arith.index_cast %idx0 : index to i64
            memref.store %idx0_i64, %in_a[%idx0] : memref<?xi64>
        }
        gpu.return
    }
    gpu.func @simple(
        %r_size : index,
        %in_a : memref<?xi64>,
        %in_b : memref<?xi64>,
        %in_c : memref<?xi64>
    ) kernel  {
        call @simple1(%r_size, %in_a, %in_b, %in_c) : (index, memref<?xi64>, memref<?xi64>, memref<?xi64>) -> ()
        gpu.return
    }
}

func.func @main() -> i64 {
    // Constants
    %ci1    = arith.constant 1 : index
    %c0     = arith.constant 0 : i64
    %size   = arith.constant 100 : index

    %a = memref.alloc(%size) : memref<?xi64>
    %b = memref.alloc(%size) : memref<?xi64>
    %c = memref.alloc(%size) : memref<?xi64>

    %a_unranked = memref.cast %a : memref<?xi64> to memref<*xi64>
    %b_unranked = memref.cast %b : memref<?xi64> to memref<*xi64>
    %c_unranked = memref.cast %c : memref<?xi64> to memref<*xi64>
    gpu.host_register %a_unranked : memref<*xi64>
    gpu.host_register %b_unranked : memref<*xi64>
    gpu.host_register %c_unranked : memref<*xi64>

    %tmp_a = gpu.alloc(%size) : memref<?xi64>
    %tmp_b = gpu.alloc(%size) : memref<?xi64>
    %tmp_c = gpu.alloc(%size) : memref<?xi64>

    %token_a = gpu.memcpy async %tmp_a, %a : memref<?xi64>, memref<?xi64>
    %token_b = gpu.memcpy async [%token_a] %tmp_b, %b : memref<?xi64>, memref<?xi64>
    %token_c = gpu.memcpy async [%token_b] %tmp_c, %c : memref<?xi64>, memref<?xi64>

    %token_d = gpu.launch_func async [%token_c] @kernels::@simple blocks in (%ci1, %ci1, %ci1) threads in (%ci1, %ci1, %ci1)
        args(%size : index, %tmp_a : memref<?xi64>,%tmp_b : memref<?xi64>, %tmp_c : memref<?xi64>)

    %token_e = gpu.memcpy async [%token_d] %a, %tmp_a : memref<?xi64>, memref<?xi64>
    gpu.wait [%token_e]

    call @printMemrefI64(%a_unranked) : (memref<*xi64>) -> ()
    return %c0 : i64
}
func.func private @printI64(i64)
func.func private @printMemrefI64(memref<*xi64>)

} // END gpu.container_module

Hence the question: what is the correct way to call a “device function” in the GPU dialect from within a kernel?

AFAIK, you have to use a normal func.func inside your gpu.module for device functions and func.call it from within your kernel function, as shown in this GPU test.
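
For example, a minimal sketch of that pattern applied to the module from the question (trimmed to a single memref argument, so this is an illustration rather than the exact code from the test):

gpu.module @kernels {
    // Device-side helper: a plain func.func living inside the gpu.module.
    func.func @simple1(%r_size : index, %in_a : memref<?xi64>) {
        %ci0 = arith.constant 0 : index
        %ci1 = arith.constant 1 : index
        scf.for %idx0 = %ci0 to %r_size step %ci1 {
            %idx0_i64 = arith.index_cast %idx0 : index to i64
            memref.store %idx0_i64, %in_a[%idx0] : memref<?xi64>
        }
        return
    }
    // The kernel calls the helper through an ordinary func.call.
    gpu.func @simple(%r_size : index, %in_a : memref<?xi64>) kernel {
        func.call @simple1(%r_size, %in_a) : (index, memref<?xi64>) -> ()
        gpu.return
    }
}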

Thanks, using func.func for the device function and calling it via func.call indeed works.
After the GpuKernelOutlining pass, device functions remain as func.func inside the gpu.module, and ConvertGpuOpsToNVVMOps then converts the functions inside the gpu.module to llvm.func.
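As a rough sketch of what that produces (heavily simplified to a single i64 argument so the expanded memref descriptor arguments are not shown; the exact attributes and types may differ between MLIR versions), the gpu.module ends up looking like:

gpu.module @kernels {
    // The former func.func device function is now an llvm.func.
    llvm.func @simple1(%arg0: i64) {
        llvm.return
    }
    // The former gpu.func kernel is now an llvm.func tagged as an NVVM kernel.
    llvm.func @simple(%arg0: i64) attributes {nvvm.kernel} {
        llvm.call @simple1(%arg0) : (i64) -> ()
        llvm.return
    }
}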
I am still wondering, though: at which stage is a gpu.func without the kernel attribute used?