I am trying to compile CUDA code that uses dynamic parallelism:
__global__ void kernel_parent(int *a, int n, int N) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Device-side streams must be created with the non-blocking flag.
    cudaStream_t s1;
    cudaStreamCreateWithFlags(&s1, cudaStreamNonBlocking);

    dim3 block(BLOCKSIZE, 1, 1);
    dim3 grid((n + BLOCKSIZE - 1) / BLOCKSIZE, 1, 1);

    // Device-side (child) kernel launch.
    kernel_simple<<< grid, block, 0, s1 >>> (a, n, N, n);
}
When I compile the code, I get the following error:
dynamic_parallelism.cu:21:9: error: reference to __global__ function 'kernel_simple' in __global__ function
kernel_simple<<< grid, block, 0, s1 >>> (a, n, N, n);
I assume that the code we’d need to generate for kernel launches on the GPU side is not all that different from what we already do on the host, so making it work should not be particularly hard. That said, I’ve seen virtually no demand for this feature, so it’s not a high priority.
Setup: clang-14, compiling the program with the commands posted above by @drcut.
I am new to clang and trying to get the hang of it. I want to compile some CUDA programs that use dynamic parallelism so that I can make changes at the LLVM IR level.
With dynamic parallelism, the <<<...>>> launch syntax gets lowered to a couple of device runtime API calls: cudaGetParameterBuffer and cudaLaunchDevice.
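For reference, these are the device-side declarations as given in the dynamic parallelism appendix of the CUDA C++ Programming Guide (the exact form in cuda_device_runtime_api.h may differ slightly):

extern "C" __device__ void *cudaGetParameterBuffer(size_t alignment, size_t size);
extern "C" __device__ cudaError_t cudaLaunchDevice(void *func, void *parameterBuffer,
                                                   dim3 gridDimension, dim3 blockDimension,
                                                   unsigned int sharedMemSize,
                                                   cudaStream_t stream);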
cudaLaunchDevice takes a pointer to the __global__ function to launch (kernel_simple in the above example).
I manually converted the <<<...>>> syntax into these API calls, but got the same error: clang does not even allow taking the address of a __global__ function from device code.
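Roughly, the manual lowering I tried looked like the sketch below. This is only an illustration: it assumes kernel_simple has the signature (int *, int, int, int), the parameter-buffer layout (8-byte alignment, 20 bytes, pointer at offset 0 and the three ints at offsets 8, 12, 16) is hand-computed, kernel_parent_manual is a made-up name, and BLOCKSIZE is the same macro as in the original snippet.

__global__ void kernel_simple(int *a, int n, int N, int x);  // forward declaration, assumed signature

__global__ void kernel_parent_manual(int *a, int n, int N) {
    dim3 block(BLOCKSIZE, 1, 1);
    dim3 grid((n + BLOCKSIZE - 1) / BLOCKSIZE, 1, 1);

    cudaStream_t s1;
    cudaStreamCreateWithFlags(&s1, cudaStreamNonBlocking);

    // Pack the kernel arguments at their natural alignment:
    // int* at offset 0, then three ints at offsets 8, 12 and 16.
    void *buf = cudaGetParameterBuffer(8, 20);
    *(int **)buf               = a;
    *(int *)((char *)buf +  8) = n;
    *(int *)((char *)buf + 12) = N;
    *(int *)((char *)buf + 16) = n;

    // This is the problematic part: taking the address of a __global__
    // function from device code triggers the same clang diagnostic.
    cudaLaunchDevice((void *)kernel_simple, buf, grid, block, /*sharedMem=*/0, s1);
}

nvcc accepts this kind of code when built with -rdc=true and linked against cudadevrt; with clang, both the <<<...>>> form and this explicit form hit the same diagnostic.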
If this error were disabled, would the NVPTX backend be capable of generating valid PTX for such device-side launches?
Can you provide pointers to where in the source code I should start looking, to see what already exists to support host-side launches?