Runtime issues when using cufftXtSetCallback

Hello,

I’m trying to compile, run, and compare cuFFT CUDA code with both NVCC and LLVM Clang, specifically cuFFT with callbacks.

Following the example here: cuda-sample/simpleCUFFT_callback.cu at 05555eef0d49ebdde999f5430f185a225ef00dcd · zchee/cuda-sample · GitHub

Also see: cuFFT

Two caveats are that the system has to be Linux64 and the code must be statically linked against the cuFFT library. I think I have all of that.

I have been able to successfully compile and run the callback sample with NVCC. I have been able to successfully compile but not run the callback sample with LLVM Clang.

When I execute the binary built with LLVM Clang, I get CUDA error 718 (cudaErrorInvalidPc). If I comment out the two calls to cufftXtSetCallback, the code runs fine with LLVM Clang and no errors. The same code, callbacks included, runs fine with NVCC.

I’ve tried multiple versions of LLVM (8, 13, 15) with CUDA 11.4 and a GCC 11.2 toolchain.

Are cuFFT callbacks not supported with LLVM CUDA? Or is something unintended happening with static linking?

Let me know if you need more information.

Thanks!

Actually, I think I may have solved it. Posting the fix for future users looking to do the same.

The key component I was missing was compilation into relocatable device code, which is also a necessary step in statically linking cufft_static, see: https://developer.nvidia.com/blog/cuda-pro-tip-use-cufft-callbacks-custom-data-processing

In my nvcc example I was using the -dc flag, and after I added the appropriate flag for LLVM (-fgpu-rdc) I was able to get my example to run with CuFFT callbacks and LLVM.
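For anyone landing here later, this is a minimal sketch of the kind of code that needs the relocatable-device-code flags. It is not the sample itself: the callback name scaleOnLoad, the scale factor, and the file name cb.cu are made up for illustration. The cuFFT calls (cufftPlan1d, cufftXtSetCallback, cufftExecC2C) and cudaMemcpyFromSymbol are the real API. The build lines in the comment are assumptions based on the flags mentioned above and on the NVIDIA blog post. The device-link step for Clang is deliberately left out because it depends on your setup.

```cuda
// Build (assumed commands, adjust paths and arch for your system):
//   NVCC:  nvcc -dc -o cb.o cb.cu
//          nvcc -o app cb.o -lcufft_static -lculibos
//   Clang: clang++ -fgpu-rdc --cuda-gpu-arch=<your arch> -c cb.cu -o cb.o
//          (then a device-link step, e.g. via nvcc -dlink, before the final link)
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>
#include <cufftXt.h>

#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %d (%s) at line %d\n",        \
                    (int)err_, cudaGetErrorString(err_), __LINE__);   \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

#define CHECK_CUFFT(call)                                             \
    do {                                                              \
        cufftResult res_ = (call);                                    \
        if (res_ != CUFFT_SUCCESS) {                                  \
            fprintf(stderr, "cuFFT error %d at line %d\n",            \
                    (int)res_, __LINE__);                             \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical load callback: scales each element as cuFFT reads it.
__device__ cufftComplex scaleOnLoad(void *dataIn, size_t offset,
                                    void *callerInfo, void *sharedPtr) {
    float scale = *static_cast<float *>(callerInfo);
    cufftComplex v = static_cast<cufftComplex *>(dataIn)[offset];
    v.x *= scale;
    v.y *= scale;
    return v;
}

// Device-side function pointer; the host copies its value out below.
__device__ cufftCallbackLoadC d_scaleOnLoad = scaleOnLoad;

int main() {
    const int n = 1024;
    const float h_scale = 0.5f;  // illustrative value only

    cufftComplex *d_data = nullptr;
    float *d_scale = nullptr;
    CHECK_CUDA(cudaMalloc(&d_data, n * sizeof(cufftComplex)));
    CHECK_CUDA(cudaMemset(d_data, 0, n * sizeof(cufftComplex)));
    CHECK_CUDA(cudaMalloc(&d_scale, sizeof(float)));
    CHECK_CUDA(cudaMemcpy(d_scale, &h_scale, sizeof(float),
                          cudaMemcpyHostToDevice));

    cufftHandle plan;
    CHECK_CUFFT(cufftPlan1d(&plan, n, CUFFT_C2C, 1));

    // Fetch the device address of the callback so it can be handed to cuFFT.
    cufftCallbackLoadC h_cb = nullptr;
    CHECK_CUDA(cudaMemcpyFromSymbol(&h_cb, d_scaleOnLoad, sizeof(h_cb)));

    // Register the load callback; callerInfo is a device pointer to the scale.
    CHECK_CUFFT(cufftXtSetCallback(plan, (void **)&h_cb, CUFFT_CB_LD_COMPLEX,
                                   (void **)&d_scale));

    // Without relocatable device code, this is where I saw error 718.
    CHECK_CUFFT(cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD));
    CHECK_CUDA(cudaDeviceSynchronize());

    printf("FFT with load callback completed\n");

    CHECK_CUFFT(cufftDestroy(plan));
    CHECK_CUDA(cudaFree(d_scale));
    CHECK_CUDA(cudaFree(d_data));
    return 0;
}
```

Checking the return value of every call is worth the extra lines here, since the failure only shows up at execution time and not when the callback is registered.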

Note that linking can get a bit more complicated with device code. I am trying both NVCC (with the -dlink flag) and LLVM (see CUDA: Clang separable compilation (!5221) · Merge requests · CMake / CMake · GitLab for an example of how CMake does it with LLVM).

Yup. cuFFT is a bit unusual in this regard. It may be interesting to check whether compilation/linking using --offload-new-driver would work out of the box with it.