Hi!
I'm looking into ways how to port our application to machines using AMD GPUs.
The way the application is currently set up for accelerated computing is the following:
1) LLVM IR modules are being built in memory during runtime
2) In case of NVIDIA GPUs each module is compiled with the NVPTX backend and the assembly file (PTX) extracted
3) The PTX is loaded with the NVIDIA kernel driver which JIT-compiles it to the actual GPU installed, and the kernel is launched on the GPU for execution.
Now, AMD doesn't seem to use an intermediate IR level comparable to PTX. As far as I understand the AMDGPU backend generates binary code (AMGCN) for the GPU kernels directly. This makes me wonder if there is any way to execute (launch) such a kernel after it has been compiled by the AMDGPU backend.
It will certainly not work in the typical HIP way using the ROCM utilities where a kernel is specified with the __global__ attribute like
__global__ void kernel(const T* in, T* out) {}
From the AMDGPU kernel's compilation at most a raw pointer is returned (and that is only in the case if the backend supports JIT, which it probably doesn't!?) otherwise it produces a library. Is it possible to 'dlload' such a (kernel) library into the address space and launch it?
Anyone has any idea how to launch such a kernel within the same program context/execution?
Thanks,
Frank