Does clang support CUDA's dynamic parallelism feature?

I tried to compile CUDA code that uses dynamic parallelism:

__global__ void kernel_parent(int *a, int n, int N){
    cudaStream_t s1, s2;
    // Device-side streams must be created with the non-blocking flag.
    cudaStreamCreateWithFlags(&s1, cudaStreamNonBlocking);
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    dim3 block(BLOCKSIZE, 1, 1);
    dim3 grid((n + BLOCKSIZE - 1) / BLOCKSIZE, 1, 1);
    kernel_simple<<< grid, block, 0, s1 >>>(a, n, N, n);
}

When I compile the code, I get the following error: error: reference to __global__ function 'kernel_simple' in __global__ function
        kernel_simple<<< grid, block, 0, s1 >>> (a, n, N, n);

I used the following command:

clang++ -std=c++11 \
    -fgpu-rdc \
    --cuda-path=$CUDA_PATH   \
    --cuda-gpu-arch=sm_61 \
    -L$CUDA_PATH/lib64   \
    -lcudart_static -ldl -lrt

Does this mean that the current Clang/LLVM does not support this CUDA feature?
Thanks in advance.

Dynamic parallelism is not implemented in clang.

I assume that the code we’d need to generate for kernel launches on the GPU side is not all that different from what we already do on the host, so making it work should not be particularly hard. That said, I’ve seen virtually no demand for this feature, so it’s not a high priority.

Got it.
Thanks for the answer!

Setup: clang 14, using the command above from @drcut to compile the program.
I am new to clang and trying to get the hang of it. I wanted to compile some CUDA programs that use dynamic parallelism so that I can make changes at the LLVM IR level.

With dynamic parallelism, the “<<<>>>” launch syntax gets lowered to a few API calls, cudaGetParameterBuffer and cudaLaunchDevice.
cudaLaunchDevice takes a function pointer to a __global__ function (kernel_simple in the example above).
I manually converted the <<<>>> syntax to those API calls, but got the same error — clang does not even allow taking a pointer to a __global__ function from device code.
Say, if this error was disabled, is the NVPTX backend capable of generating valid PTX code?
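For reference, the lowering described above boils down to something like the following sketch. The declarations match the CUDA device runtime (linked via cudadevrt with relocatable device code); the argument-packing struct and the wrapper name are illustrative, not what nvcc actually names them:

    // Sketch of what kernel_simple<<<grid, block, 0, s1>>>(a, n, N, n)
    // lowers to in a device-side launch. Requires -rdc=true and cudadevrt.
    extern "C" __device__ void *cudaGetParameterBuffer(size_t alignment, size_t size);
    extern "C" __device__ cudaError_t cudaLaunchDevice(void *func, void *parameterBuffer,
                                                       dim3 gridDim, dim3 blockDim,
                                                       unsigned int sharedMemSize,
                                                       cudaStream_t stream);

    __global__ void kernel_simple(int *a, int n, int N, int m);

    __device__ void launch_kernel_simple(dim3 grid, dim3 block, cudaStream_t s1,
                                         int *a, int n, int N) {
        // Pack the kernel arguments into a runtime-managed parameter buffer...
        struct Args { int *a; int n; int N; int m; };
        Args *buf = (Args *)cudaGetParameterBuffer(alignof(Args), sizeof(Args));
        *buf = {a, n, N, n};
        // ...and launch through the kernel's function pointer — exactly the
        // reference that clang currently rejects in device code.
        cudaLaunchDevice((void *)kernel_simple, buf, grid, block, 0, s1);
    }

The exact layout of the parameter buffer is an implementation detail of the device runtime; the struct above just illustrates the idea.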

Can you provide pointers to where I can start looking in the source code, to see what is already present to support host-side launches?

Any help is appreciated.

The code responsible for handling host-side kernel launches lives in clang/lib/CodeGen/CGCUDANV.cpp in the llvm-project repository on GitHub.

It would need to be extended to work on the GPU side.
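For orientation, the host-side stub that clang emits for a <<<>>> launch is conceptually similar to the sketch below. The function name and simplifications are mine; in the real codegen the launch configuration is recovered via __cudaPopCallConfiguration, which is elided here:

    #include <cuda_runtime.h>

    // Illustrative host-side stub for kernel_simple<<<grid, block, shmem, stream>>>.
    // clang uses the stub's own address as the kernel handle (it is registered
    // with the runtime at startup), then forwards to cudaLaunchKernel.
    static void stub_kernel_simple(dim3 grid, dim3 block, size_t shmem,
                                   cudaStream_t stream,
                                   int *a, int n, int N, int m) {
        // Kernel arguments are passed as an array of pointers to each argument.
        void *args[] = { &a, &n, &N, &m };
        cudaLaunchKernel((const void *)stub_kernel_simple, grid, block,
                         args, shmem, stream);
    }

Supporting dynamic parallelism would mean teaching this codegen path to emit the device-runtime calls (cudaGetParameterBuffer / cudaLaunchDevice) when the launch appears inside a __global__ or __device__ function.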
