I am trying to compile the following CUDA code, which uses dynamic parallelism:
__global__ void kernel_parent(int *a, int n, int N){
    cudaStream_t s1, s2;
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    dim3 block(BLOCKSIZE, 1, 1);
    dim3 grid( (n + BLOCKSIZE - 1)/BLOCKSIZE, 1, 1);
    kernel_simple<<< grid, block, 0, s1 >>> (a, n, N, n);
}
When I compile the code, I get the following error:
dynamic_parallelism.cu:21:9: error: reference to __global__ function 'kernel_simple' in __global__ function
kernel_simple<<< grid, block, 0, s1 >>> (a, n, N, n);
I used the following command:
clang++ -std=c++11 dynamic_parallelism.cu \
    -fgpu-rdc \
    --cuda-path=$CUDA_PATH \
    --cuda-gpu-arch=sm_61 \
    -L$CUDA_PATH/lib64 \
    -lcudart_static -ldl -lrt
Does this mean that the current Clang/LLVM does not support this CUDA feature?
Thanks in advance.