Target construct not offloading to GPU

Hello,

I've been trying to compile a program (source code attached) that offloads to an NVIDIA V100 GPU, using LLVM 7.0 and clang 7.0.

The program appears to compile successfully, yet nvprof reports "No kernels were profiled".
The application seems to be running on the CPU instead (the "top" command reports high CPU usage).

Compilation line that I used:
clang -v -o openmp_offload openmp_offload.c -O3 -fopenmp=libomp -fopenmp-targets="nvptx64-nvidia-cuda"

Output after executing the binary:
==74802== NVPROF is profiling process 74802, command: ./openmp_offload 10 10 10000 1
Number of processors: 160
Number of devices: 4
Default device: 0
Is initial device: 1
==74802== Profiling application: ./openmp_offload 10 10 10000 1
==74802== Profiling result:
No kernels were profiled.
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
      API calls:   99.99%  311.50ms         1  311.50ms  311.50ms  311.50ms  cuCtxCreate
                    0.00%  11.462us         4  2.8650us  1.1450us  6.2010us  cuDeviceGetPCIBusId
                    0.00%  5.4850us         5  1.0970us     387ns  3.7770us  cuDeviceGet
                    0.00%  4.8070us        12     400ns     232ns  1.0350us  cuDeviceGetAttribute
                    0.00%  1.4360us         3     478ns     384ns     640ns  cuDeviceGetCount
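For anyone who wants to script this check, here is a minimal sketch that tests a saved nvprof log for the "No kernels were profiled" line shown above. The function name kernels_profiled and the file name nvprof.log are made up for illustration; the log is assumed to have been captured by redirecting both stdout and stderr of the nvprof run to a file.

```shell
# Succeeds (exit status 0) when the saved profiler output does NOT contain
# the "No kernels were profiled" message, i.e. at least one kernel ran.
kernels_profiled() {
  if grep -q "No kernels were profiled" "$1"; then
    return 1
  fi
  return 0
}

# Hypothetical usage, assuming the log was captured beforehand with:
#   nvprof ./openmp_offload 10 10 10000 1 > nvprof.log 2>&1
if [ -f nvprof.log ]; then
  if kernels_profiled nvprof.log; then
    echo "kernels were profiled: offloading happened"
  else
    echo "no kernels profiled: the target region ran on the host"
  fi
fi
```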

When compiled with GCC, the application does the offloading to the GPU.

clang information:
$ clang -v
Version 6
Version >= 90 selected
libdevice.10.bc exists
clang version 7.0.0 (tags/RELEASE_700/final)
Target: powerpc64le-unknown-linux-gnu
Thread model: posix
InstalledDir: /gpfs/projects/bsc18/bsc18833/pkg/clang/7.0.0/bin
Found candidate GCC installation: /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0
Selected GCC installation: /home/user/pkg/gcc/8.2.0/lib/gcc/powerpc64le-unknown-linux-gnu/8.2.0
Candidate multilib: .;@m64
Selected multilib: .;@m64
Found CUDA installation: /usr/local/cuda-9.2, version 9.2

Hopefully somebody has an idea on what's going on here.
If you need any more information to find the issue, let me know.
Thank you.

Best,
-Cristobal

openmp_offload.c (1.72 KB)

Hi,

How did you build your compiler? If you didn't specify CLANG_OPENMP_NVPTX_DEFAULT_ARCH, Clang defaults to sm_35, which doesn't run on Volta (sm_70).
Can you post the output of

clang -v -o openmp_offload openmp_offload.c -O3 -fopenmp=libomp -fopenmp-targets="nvptx64-nvidia-cuda"

If it's indeed compiling for sm_35, can you try adding -Xopenmp-target -march=sm_70?
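A minimal sketch of that suggestion, reusing the flags from the original compile line and adding the explicit device architecture. The variable names ARCH, SRC and CMD are only for illustration, and the compile step is skipped unless clang and the source file are both present.

```shell
# Target architecture: sm_70 is Volta (V100), as discussed in this thread.
ARCH="sm_70"
SRC="openmp_offload.c"

# Same flags as the original compile line, plus -Xopenmp-target -march=<arch>.
set -- clang -O3 -fopenmp=libomp \
  -fopenmp-targets=nvptx64-nvidia-cuda \
  -Xopenmp-target "-march=${ARCH}" \
  -o openmp_offload "$SRC"

# Keep a printable copy so the command can be inspected before running it.
CMD="$*"
echo "$CMD"

# Compile only when the compiler and the source file are actually available.
if command -v clang >/dev/null 2>&1 && [ -f "$SRC" ]; then
  "$@"
fi
```

Note that this only changes the architecture the device code is compiled for; linking still needs a matching libomptarget bitcode library for that architecture.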

Regards,
Jonas

I compiled clang with the following line:
cmake .. -DCMAKE_C_COMPILER=${HOST_GCC}/bin/gcc -DCMAKE_CXX_COMPILER=${HOST_GCC}/bin/g++ -DGCC_INSTALL_PREFIX=${HOST_GCC} -DCMAKE_CXX_LINK_FLAGS="-L${HOST_GCC}/lib64 -Wl,-rpath,${HOST_GCC}/lib64" -DCMAKE_INSTALL_PREFIX=/gpfs/projects/bsc18/bsc18833/pkg/clang/7.0.0 -DGCC_INSTALL_PREFIX=${HOST_GCC}

Indeed, output with verbose confirms that clang is trying to compile with march=sm_35 (output is attached).
Also, trying to compile the program with
"-Xopenmp-target -march=sm_70"
fails with
"clang-7: error: nvlink command failed with exit code 255 (use -v to see invocation)" because of several undefined references (details in the attached file).

So, I'm trying to re-compile clang with CLANG_OPENMP_NVPTX_DEFAULT_ARCH but, still, clang is not generating the library 'libomptarget-nvptx-sm_70.bc'. Therefore, compilation doesn't complete.
Where should this library be? I have one bc file in "clang_src/test/Driver/Inputs/libomptarget/" but it's for sm_20 (libomptarget-nvptx-sm_20.bc).

This is how I'm trying to compile clang:
cmake .. -DCMAKE_C_COMPILER=${HOST_GCC}/bin/gcc -DCMAKE_CXX_COMPILER=${HOST_GCC}/bin/g++ -DGCC_INSTALL_PREFIX=${HOST_GCC} -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=70
Yet, in the compilation process, clang complains about the missing library for sm_70.

Do I need to pass some flag to LLVM too?

Best,
-Cristobal

clang_verbose.txt (9.78 KB)

Yes, now you need to pass LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=35,60,70 (or whatever you like to have) to get the runtime libraries.

So, I'm trying to re-compile clang with
CLANG_OPENMP_NVPTX_DEFAULT_ARCH but, still, clang is not generating
the library 'libomptarget-nvptx-sm_70.bc'. Therefore, compilation
doesn't complete.
Where should this library be? I have one bc file in
"clang_src/test/Driver/Inputs/libomptarget/" but it's for sm_20
(libomptarget-nvptx-sm_20.bc).

That's an empty file for testing. To get Bitcode libraries, you need to compile the OpenMP project using Clang.
(I've started putting together step-by-step instructions how to build LLVM/Clang 7.0 for OpenMP offloading, I'll send a link to the mailing list once ready.)
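Since the file under test/Driver/Inputs is only an empty placeholder, one way to look for real bitcode libraries is to search an install or build tree for non-empty files with that name pattern. This is a rough sketch: the function name find_bclibs and the PREFIX variable are hypothetical, and the thread does not state where the libraries end up, so the search simply walks whatever directory it is given.

```shell
# List non-empty libomptarget bitcode libraries below a directory.
# "-size +0" skips zero-length files such as the empty test placeholder.
find_bclibs() {
  find "$1" -type f -name 'libomptarget-nvptx-sm_*.bc' -size +0
}

# Hypothetical usage: PREFIX is assumed to point at the install or build tree.
if [ -d "${PREFIX:-}" ]; then
  find_bclibs "$PREFIX"
fi
```

If nothing is printed for the architecture in question (for example sm_70), the runtime libraries for it have not been built.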

This is how I'm trying to compile clang:
cmake .. -DCMAKE_C_COMPILER=${HOST_GCC}/bin/gcc
-DCMAKE_CXX_COMPILER=${HOST_GCC}/bin/g++
-DGCC_INSTALL_PREFIX=${HOST_GCC} -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=70

Nit: Should this be -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_70?

Both LLVM and clang complain about that flag:
CMake Warning:
Manually-specified variables were not used by the project:

    LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES

It seems that developers disabled this flag some time ago (https://github.com/clang-ykt/clang/issues/11).
I haven't found any other way to generate the runtime libraries, though.

Best,
-Cristobal

Cristobal, can you check whether you have LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY set instead? In the folder where you built the compiler, there is a CMakeCache.txt file. Can you search for LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY in there? If you see it, you should pass 70 to that variable.
Thanks,

–Doru
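The cache lookup described above can be sketched as follows. The helper name show_capability_vars is made up; the pattern matches both spellings mentioned in this thread (COMPUTE_CAPABILITY and COMPUTE_CAPABILITIES), and the path to CMakeCache.txt depends on the local build directory.

```shell
# Print any libomptarget compute-capability entries from a CMake cache file.
# Cache entries have the form NAME:TYPE=value, one per line.
show_capability_vars() {
  grep -E '^LIBOMPTARGET_NVPTX_COMPUTE_CAPABILIT(Y|IES)' "$1"
}

# Run from the build directory, if a cache file is present there.
if [ -f CMakeCache.txt ]; then
  show_capability_vars CMakeCache.txt || echo "no compute-capability variable found"
fi
```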

I was able to compile libomptarget with sm_70 support (LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES) thanks to Jonas Hahnfeld, who helped me a lot with this issue.

If the default compiler cannot produce bitcode (.bc) files, the build continues without them and without any further warning.
Setting the flag "LIBOMPTARGET_NVPTX_ENABLE_BCLIB=ON" forces the build to generate those libraries (and makes it fail if they cannot be generated).
In case it's helpful for somebody, I also needed to specify the following flags:
LIBOMPTARGET_NVPTX_BC_LINKER=llvm-link and LIBOMPTARGET_NVPTX_CUDA_COMPILER=clang
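Put together, a configure step for the OpenMP project with the flags named in this thread might look like the sketch below. The paths in OPENMP_SRC and PREFIX are hypothetical, and using clang as the CMake C/C++ compiler is an assumption based on the advice to build the OpenMP project with Clang. The command is printed first and only executed if cmake and the source tree are actually found.

```shell
# Hypothetical locations; adjust to the local checkout and install prefix.
OPENMP_SRC="${OPENMP_SRC:-$HOME/src/openmp}"
PREFIX="${PREFIX:-$HOME/pkg/clang/7.0.0}"

# LIBOMPTARGET_* flags are the ones reported in this thread.
set -- cmake "$OPENMP_SRC" \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DCMAKE_INSTALL_PREFIX="$PREFIX" \
  -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=70 \
  -DLIBOMPTARGET_NVPTX_ENABLE_BCLIB=ON \
  -DLIBOMPTARGET_NVPTX_BC_LINKER=llvm-link \
  -DLIBOMPTARGET_NVPTX_CUDA_COMPILER=clang

# Keep a printable copy so the command can be reviewed before running it.
CONFIGURE_CMD="$*"
echo "$CONFIGURE_CMD"

# Configure only when cmake and the OpenMP source tree are available.
if command -v cmake >/dev/null 2>&1 && [ -f "$OPENMP_SRC/CMakeLists.txt" ]; then
  "$@"
fi
```

Several capabilities can be listed at once, for example 35,60,70 as suggested earlier in the thread.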

Thanks to everybody

Best,
-Cristobal
