OpenMP links libomptarget.so into the host executable, which then goes looking for offloading toolchains to dlopen.
This is very flexible, but comes at a price:
- PLT call overhead + inlining barrier
- Startup overhead looking for the toolchain
- Failing to dlopen will run the host fallback, which is not necessarily what one wants for debugging
For device toolchains that are available as static libraries, and for deployment of the toolchain to a system with known, fixed hardware (e.g. some HPC clusters), I’d like to be able to statically link everything. That should be faster and have fewer failure modes.
Particularly fun for amdgcn, because we’ll be able to statically link the userspace graphics driver into the host executable as well. No shared libraries necessary.
This involves a change to libomptarget (that I haven’t written yet) to fill in function pointers without dlopen, and some CMake logic/controls to specify which toolchains are dynamic/static/unavailable. It also means building libomptarget.a, if that isn’t yet implemented.
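To make the function pointer part concrete, here is a rough sketch of the shape I have in mind. None of these names exist in libomptarget: `PluginTable`, `load_plugin`, `hypothetical_plugin_number_of_devices` and the `SKETCH_DYNAMIC_PLUGIN` macro are all made up for illustration, and the real plugin interface has many more entry points. The idea is that the same table gets filled either through dlopen/dlsym or by taking the address of a statically linked symbol, with the choice made at build time.

```cpp
#include <cstdint>
#ifdef SKETCH_DYNAMIC_PLUGIN
#include <dlfcn.h>
#endif

// Hypothetical plugin entry point. In a real static build this would come
// from the plugin's static library; it is defined here so the sketch links.
extern "C" int32_t hypothetical_plugin_number_of_devices() { return 1; }

// Hypothetical table of plugin entry points held by the host runtime.
struct PluginTable {
  int32_t (*number_of_devices)() = nullptr;
};

#ifdef SKETCH_DYNAMIC_PLUGIN
// Dynamic path: find the plugin at runtime and look up each symbol by name.
bool load_plugin(PluginTable &T, const char *LibName) {
  void *Handle = dlopen(LibName, RTLD_NOW);
  if (!Handle)
    return false; // caller decides whether to fall back to the host
  T.number_of_devices = reinterpret_cast<int32_t (*)()>(
      dlsym(Handle, "hypothetical_plugin_number_of_devices"));
  return T.number_of_devices != nullptr;
}
#else
// Static path: the symbol is resolved at link time, so there is no search
// and no runtime lookup that can fail.
bool load_plugin(PluginTable &T, const char *) {
  T.number_of_devices = &hypothetical_plugin_number_of_devices;
  return true;
}
#endif
```

With the static path, a caller would declare a `PluginTable`, call `load_plugin` on it, and then call through `number_of_devices` as before; the rest of the runtime would not need to know which path filled the table.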
Are patches for this nominally welcome?