[GPUCC] link against libdevice

Yuanfeng_Peng1 · July 29, 2016, 1:27pm

Hi,

I was trying to compile scalarProd.cu (from the CUDA SDK) with the following command:

clang++ -I…/ -I/usr/local/cuda-7.0/samples/common/inc --cuda-gpu-arch=sm_50 scalarProd.cu

but ended up with the following error:

ptxas fatal : Unresolved extern function ‘__nv_mul24’

It seems to me that libdevice was not linked automatically. What flags do I need to pass to clang to link the code against libdevice?

Thanks!
Yuanfeng Peng

Chandler_Carruth · August 1, 2016, 4:24am

Directly CC-ing some folks who may be able to help.

Justin_Lebar · August 1, 2016, 5:04am

Hi, Yuanfeng.

What version of clang are you using? CUDA is only known to work at
tip of tree, so you must build clang yourself from source.

I suspect that's your problem, but if building from source doesn't fix
it, please attach the output of compiling with -v.

Regards,
-Justin

Yuanfeng_Peng1 · August 1, 2016, 5:59am

Hi Justin,

Thanks for your response! The clang and LLVM I'm using were built from source.

Below is the output of compiling with -v. Any suggestions would be appreciated!

clang version 3.9.0 (trunk 270145) (llvm/trunk 270133)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8.4
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.9.3
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64
Found CUDA installation: /usr/local/cuda
“/usr/local/bin/clang-3.9” -cc1 -triple nvptx64-nvidia-cuda -aux-triple x86_64-unknown-linux-gnu -S -disable-free -main-file-name scalarProd.cu -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -no-integrated-as -fcuda-is-device -target-cpu sm_50 -v -dwarf-column-info -debugger-tuning=gdb -resource-dir /usr/local/bin/…/lib/clang/3.9.0 -I …/ -I /usr/local/cuda-7.0/samples/common/inc -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/4.8/…/…/…/…/include/c++/4.8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/4.8/…/…/…/…/include/x86_64-linux-gnu/c++/4.8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/4.8/…/…/…/…/include/x86_64-linux-gnu/c++/4.8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/4.8/…/…/…/…/include/c++/4.8/backward -internal-isystem /usr/local/include -internal-isystem /usr/local/bin/…/lib/clang/3.9.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/4.8/…/…/…/…/include/c++/4.8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/4.8/…/…/…/…/include/x86_64-linux-gnu/c++/4.8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/4.8/…/…/…/…/include/x86_64-linux-gnu/c++/4.8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/4.8/…/…/…/…/include/c++/4.8/backward -internal-isystem /usr/local/cuda/include -include __clang_cuda_runtime_wrapper.h -fdeprecated-macro -fno-dwarf-directory-asm -fdebug-compilation-dir /mnt/wtf/workspace/cuda/gpu-race-detection/cuda-compressed-conflict-detection/scalarProd -ferror-limit 19 -fmessage-length 144 -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -o /tmp/scalarProd-32a530.s -x cuda scalarProd.cu
hooklib.so loading.
clang -cc1 version 3.9.0 based upon LLVM 3.9.0svn default target x86_64-unknown-linux-gnu
ignoring nonexistent directory “/include”
ignoring duplicate directory “/usr/lib/gcc/x86_64-linux-gnu/4.8/…/…/…/…/include/x86_64-linux-gnu/c++/4.8”
ignoring duplicate directory “/usr/lib/gcc/x86_64-linux-gnu/4.8/…/…/…/…/include/c++/4.8”
ignoring duplicate directory “/usr/lib/gcc/x86_64-linux-gnu/4.8/…/…/…/…/include/x86_64-linux-gnu/c++/4.8”
ignoring duplicate directory “/usr/lib/gcc/x86_64-linux-gnu/4.8/…/…/…/…/include/x86_64-linux-gnu/c++/4.8”
ignoring duplicate directory “/usr/lib/gcc/x86_64-linux-gnu/4.8/…/…/…/…/include/c++/4.8/backward”
ignoring duplicate directory “/usr/local/include”
ignoring duplicate directory “/usr/local/bin/…/lib/clang/3.9.0/include”
ignoring duplicate directory “/usr/include”
#include “…” search starts here:
#include <…> search starts here:
…
/usr/local/cuda-7.0/samples/common/inc
/usr/lib/gcc/x86_64-linux-gnu/4.8/…/…/…/…/include/c++/4.8
/usr/lib/gcc/x86_64-linux-gnu/4.8/…/…/…/…/include/x86_64-linux-gnu/c++/4.8
/usr/lib/gcc/x86_64-linux-gnu/4.8/…/…/…/…/include/c++/4.8/backward
/usr/local/include
/usr/local/bin/…/lib/clang/3.9.0/include
/usr/include
/usr/local/cuda/include
End of search list.

“/usr/local/cuda/bin/ptxas” -m64 -O0 --gpu-name sm_50 --output-file /tmp/scalarProd-181f7e.o /tmp/scalarProd-32a530.s
ptxas fatal : Unresolved extern function ‘__nv_mul24’
clang-3.9: error: ptxas command failed with exit code 255 (use -v to see invocation)

Thanks!
Yuanfeng

Justin_Lebar · August 1, 2016, 6:33am

OK, I see the problem. You were right that we weren't picking up libdevice.

CUDA 7.0 only ships with the following libdevice binaries (found in
/path/to/cuda/nvvm/libdevice):

libdevice.compute_20.10.bc
libdevice.compute_30.10.bc
libdevice.compute_35.10.bc

If you ask for sm_50 with CUDA 7.0, clang can't find a matching
libdevice binary, and it apparently gives up silently and continues
compiling your program. That's a bug that we should fix.
(If you want the current behavior, you should have to explicitly ask
clang not to use libdevice.)

I see that nvcc from CUDA 7.0 works (or at least builds without
error). I guess it uses the libdevice for compute_35. We could do
the same thing, although I'm not sure how to tell whether that's safe
in general. I'll look into this as well.

Anyway, if you build with CUDA 7.5 your problem should go away, because
CUDA 7.5 has a libdevice binary for compute_50. Just pass
--cuda-path=/path/to/cuda-7.5. Alternatively, you could keep
building with CUDA 7.0 and pass sm_35 as your GPU arch. clang always
embeds PTX in the binaries, so the result should still run on your
sm_50 card (although your machine will have to JIT-compile the PTX on
startup).

As a third alternative, you could symlink your
libdevice.compute_35.10.bc to libdevice.compute_50.10.bc, and...maybe
that would work? If you do that, please let me know how it goes, I am
curious.

Thank you very much for the bug report! If you like, I'll CC you on
any relevant changes; just create an account at
https://reviews.llvm.org (if necessary; I can't seem to find you) and
let me know your username.

Regards,
-Justin

Yuanfeng_Peng1 · August 1, 2016, 7:38am

Hi Justin,

Thanks for your help! I passed sm_30 as the target GPU arch and the compilation succeeded.

I was also curious whether the symlink solution would work, so I tried it too :p. The compilation succeeded, but the resulting binary crashed with the error '(8): illegal libdevice function'.

I would appreciate being kept posted about relevant changes; my username is yuanfeng.peng.

Thanks again!
Yuanfeng

Mueller-Roemer_Johan · August 1, 2016, 9:06am

According to the libdevice User's Guide in the CUDA Toolkit Documentation, compute capabilities above 3.7 should use libdevice.compute_30.XX.bc.

Artem-B · August 2, 2016, 11:27pm

As of r277542, clang should fix the problem:

- clang now picks the correct libdevice version.
- clang reports an error if the required libdevice library is not found.

See https://reviews.llvm.org/D23037 for details.

–Artem
