Does clang support CUDA's dynamic parallelism feature?

I'm trying to compile CUDA code that uses dynamic parallelism:

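__global__ void kernel_parent(int *a, int n, int N){
    cudaStream_t s1;
    // Streams created in device code must use the cudaStreamNonBlocking flag.
    cudaStreamCreateWithFlags(&s1, cudaStreamNonBlocking);
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    dim3 block(BLOCKSIZE, 1, 1);
    dim3 grid( (n + BLOCKSIZE - 1)/BLOCKSIZE, 1, 1);
    kernel_simple<<< grid, block, 0, s1 >>> (a, n, N, n);
    cudaStreamDestroy(s1);
}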

When I compile the code, I get the following error:

dynamic_parallelism.cu:21:9: error: reference to __global__ function 'kernel_simple' in __global__ function
        kernel_simple<<< grid, block, 0, s1 >>> (a, n, N, n);

I used the following command:

clang++ -std=c++11 dynamic_parallelism.cu \
    -fgpu-rdc \
    --cuda-path=$CUDA_PATH   \
    --cuda-gpu-arch=sm_61 \
    -L$CUDA_PATH/lib64   \
    -lcudart_static -ldl -lrt

Does this mean that current Clang/LLVM does not support this CUDA feature?
Thanks in advance

Dynamic parallelism is not implemented in clang.

I assume that the code we’d need to generate for kernel launches on the GPU side is not all that different from what we already do on the host, so making it work should not be particularly hard. That said, I’ve seen virtually no demand for this feature, so it’s not a high priority.
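Until clang supports device-side launches, one possible workaround (a sketch, not something proposed in this thread) is to move the child launch to the host, which clang does support. Kernels enqueued on the same stream run in order, so any work the parent did before launching can go into a preceding kernel on that stream. This only helps when the launch configuration can be decided on the host. The `BLOCKSIZE` value, the body of `kernel_simple`, and the helper `launch_simple_from_host` below are hypothetical; only the child kernel's parameter list comes from the question.

```cuda
#include <cuda_runtime.h>

#define BLOCKSIZE 256  // assumption: the original BLOCKSIZE value is not shown

// Hypothetical child kernel body; only the parameter list matches the question.
__global__ void kernel_simple(int *a, int n, int N, int m) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) a[tid] += 1;  // placeholder work: increment each element
}

// Hypothetical host-side replacement for kernel_parent: computes the same
// launch configuration, but launches kernel_simple from the CPU.
cudaError_t launch_simple_from_host(int *h_a, int n, int N) {
    int *d_a = nullptr;
    cudaError_t err = cudaMalloc(&d_a, n * sizeof(int));
    if (err != cudaSuccess) return err;

    cudaStream_t s1;
    err = cudaStreamCreate(&s1);
    if (err != cudaSuccess) { cudaFree(d_a); return err; }

    cudaMemcpyAsync(d_a, h_a, n * sizeof(int), cudaMemcpyHostToDevice, s1);

    dim3 block(BLOCKSIZE, 1, 1);
    dim3 grid((n + BLOCKSIZE - 1) / BLOCKSIZE, 1, 1);
    // Work the parent kernel did before its launch could be a separate kernel
    // enqueued here on s1; kernels in one stream execute in issue order.
    kernel_simple<<<grid, block, 0, s1>>>(d_a, n, N, n);
    err = cudaGetLastError();  // catches launch-configuration errors

    cudaMemcpyAsync(h_a, d_a, n * sizeof(int), cudaMemcpyDeviceToHost, s1);
    cudaError_t sync_err = cudaStreamSynchronize(s1);  // wait for copy-back
    if (err == cudaSuccess) err = sync_err;

    cudaStreamDestroy(s1);
    cudaFree(d_a);
    return err;
}
```

This should build with the same clang++ command as in the question; since nothing is launched from device code, `-fgpu-rdc` is presumably no longer required for a single-file program.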

Got it.
Thanks for the answer!