Does anyone know whether math functions are supported in AMD GPU kernels yet?
In the NVIDIA world, they provide the libdevice IR module, which can be linked into an existing module containing the kernel. In other words, they provide all math functions at the IR level. NVIDIA even claims that libdevice is actually device specific (compute capability).
I was wondering how that is done on the AMD side of things.
Brian, this seems like a good question for you.
There certainly is support; after all, AMD supports both OpenCL and HIP (a dialect of C++ very close to CUDA).
AMD device libraries (in bitcode form) are installed when ROCm ( https://rocm.github.io/ ) is installed.
AMD device libraries are mostly written in (OpenCL) C and are open source on GitHub at RadeonOpenCompute/ROCm-Device-Libs (ROCm Device Libraries). They are configured by linking in a number of tiny libraries that define global constants; these allow unwanted code, including branches, to be eliminated during post-bitcode-link optimization.
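To illustrate the idea of configuration through constants, here is a minimal host-side C++ sketch. It is not taken from the ROCm-Device-Libs source: the function name `device_fmax` and the flag `FiniteOnly` are hypothetical, and a template parameter stands in for the global constant that one of the tiny libraries would define at link time.

```cpp
#include <cmath>

// FiniteOnly is a hypothetical stand-in for a control constant that, in the
// real libraries, would be a global defined by a tiny separately linked library.
template <bool FiniteOnly>
float device_fmax(float a, float b) {
    // Once the constant is known, the optimizer can fold this condition.
    // With FiniteOnly == true the whole NaN-handling branch is dead code.
    if (!FiniteOnly) {
        if (std::isnan(a)) return b;
        if (std::isnan(b)) return a;
    }
    return a > b ? a : b;
}
```

Instantiating `device_fmax<true>` corresponds to linking the "finite only" variant of the constant, and `device_fmax<false>` to linking the variant that keeps the NaN checks.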
Thanks! So, support for the math functions seems to be there. That's good news.
This brings me to the 2nd point that I need to figure out in order for our application to have a chance to run on AMD GPUs.
The thing I am looking for (and have not been able to find so far) is AMD's equivalent of NVIDIA's driver interface. I mean -lcuda as opposed to the runtime, -lcudart. Our application dynamically (!) loads a GPU ISA kernel and launches it. In the NVIDIA world there's a function called "cuModuleLoadData" that loads a module given as PTX and returns a CUmodule, from which "cuModuleGetFunction" retrieves a CUfunction for the kernel. From what I have seen so far on the AMD side, it looks as if all compilers, namely HCC and the AMDGPU backend, target GPU ISA directly. That wouldn't be a problem as long as the generated kernels can be dynamically loaded afterwards.
Is there some library for AMD, similar to the NVIDIA driver interface, that lets the user load an external kernel, say a kernel that was compiled by the AMDGPU backend?
If you want to do CUDA-like programming or use CUDA-like facilities on AMD, then I'd recommend familiarizing yourself with HIP. The code, including tests and examples, and the documentation are at https://github.com/ROCm-Developer-Tools/HIP . There's also a porting guide with pointers to other documentation: HIP Porting Guide (ROCm 4.5.0 documentation).
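For the dynamic-loading case specifically, HIP has a module API that mirrors the CUDA driver API: `hipModuleLoadData`, `hipModuleGetFunction` and `hipModuleLaunchKernel`. Below is a minimal sketch, assuming a code object has already been built for the installed GPU. The helper names `read_file` and `load_and_launch` are made up for illustration. The HIP part is guarded with `__has_include` so that the file still compiles on a machine without ROCm.

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Only pull in HIP when its headers are installed.
#if defined(__has_include)
#if __has_include(<hip/hip_runtime.h>)
#include <hip/hip_runtime.h>
#define HAVE_HIP 1
#endif
#endif

// Read a whole file into memory; returns an empty vector if it cannot be opened.
std::vector<char> read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

#ifdef HAVE_HIP
// Load a code object from memory, look up one kernel by name, launch it as a
// single block of `threads` threads, and wait for completion.
// `kernel_args` holds one pointer per kernel argument (nullptr if there are none).
bool load_and_launch(const std::vector<char>& image, const char* kernel_name,
                     unsigned threads, void** kernel_args) {
    hipModule_t module = nullptr;
    hipFunction_t kernel = nullptr;

    hipError_t err = hipInit(0);
    if (err == hipSuccess) err = hipModuleLoadData(&module, image.data());
    if (err == hipSuccess) err = hipModuleGetFunction(&kernel, module, kernel_name);
    if (err == hipSuccess)
        err = hipModuleLaunchKernel(kernel, 1, 1, 1, threads, 1, 1,
                                    0, nullptr, kernel_args, nullptr);
    if (err == hipSuccess) err = hipDeviceSynchronize();

    if (err != hipSuccess)
        std::fprintf(stderr, "HIP error: %s\n", hipGetErrorString(err));
    if (module) (void)hipModuleUnload(module);
    return err == hipSuccess;
}
#endif
```

A caller would do something like `load_and_launch(read_file("kernel.hsaco"), "my_kernel", 64, args)`, where the file and kernel names are placeholders. Building with hipcc takes care of the platform defines and of linking the HIP runtime. Unlike PTX, the code object has to match the GPU target it is loaded on.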