When will offloading to NVIDIA targets be available?

Hi,

today I've built the latest version of llvm-trunk. I used the following
commands.

svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
cd llvm/tools
svn co http://llvm.org/svn/llvm-project/cfe/trunk clang
svn co http://llvm.org/svn/llvm-project/polly/trunk polly
svn co http://llvm.org/svn/llvm-project/lldb/trunk lldb
svn co http://llvm.org/svn/llvm-project/lld/trunk lld
cd clang/tools
svn co http://llvm.org/svn/llvm-project/clang-tools-extra/trunk/ extra
cd ../../../projects
svn co http://llvm.org/svn/llvm-project/compiler-rt/trunk compiler-rt
svn co http://llvm.org/svn/llvm-project/openmp/trunk openmp
svn co https://github.com/clang-ykt/openmp libomptarget
cd ../..

set LLVM_VERSION=llvm-trunk
rm -r build
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr/local/${LLVM_VERSION} \
   -GNinja \
   -DLLVM_TARGETS_TO_BUILD:STRING="NVPTX;X86" \
   -DCMAKE_BUILD_TYPE:STRING="Release" \
   -DLLVM_PARALLEL_COMPILE_JOBS:STRING="4" \
   -DLLVM_PARALLEL_LINK_JOBS:STRING="4" \
   -DCMAKE_C_COMPILER:STRING="/usr/local/gcc-6.4.0/bin/gcc" \
   -DCMAKE_C_FLAGS:STRING="-m64 -I/usr/local/valgrind/include -I/usr/include/ncurses" \
   -DCMAKE_C_STANDARD_LIBRARIES="-lpthread" \
   -DCMAKE_CXX_COMPILER:STRING="/usr/local/gcc-6.4.0/bin/g++" \
   -DCMAKE_CXX_FLAGS:STRING="-m64 -I/usr/local/valgrind/include -I/usr/include/ncurses" \
   -DCMAKE_CXX_STANDARD_LIBRARIES="-lpthread" \
   -DCMAKE_EXE_LINKER_FLAGS:STRING="-m64" \
   -DLLVM_LIBDIR_SUFFIX:STRING="64" \
   -DLLVM_POLLY_LINK_INTO_TOOLS:BOOL=ON \
   -DLIBOMPTARGET_DEP_LIBELF_INCLUDE_DIR:STRING="/usr/local/elfutils-0.169/include" \
   -DLIBOMPTARGET_DEP_LIBELF_LIBRARIES:STRING="/usr/local/elfutils-0.169/lib64/libelf.so" \
   -DLIBOMPTARGET_DEP_LIBFFI_INCLUDE_DIR:STRING="/usr/include" \
   -DLIBOMPTARGET_DEP_LIBFFI_LIBRARIES:STRING="/usr/lib64/libffi.so" \
   -DCUDA_INCLUDE_DIRS:STRING="/usr/local/cuda/include" \
   -DCUDA_LIBRARIES:STRING="/usr/local/cuda/lib64/libcudart.so" \
   -DBUILD_SHARED_LIBS:BOOL=ON \
   -DOPENMP_ENABLE_LIBOMPTARGET:BOOL=On \
   ../llvm \
   |& tee log.cmake

Unfortunately I still have the same problems that I reported in Bug 34104
nearly two months ago when I try to offload to an NVIDIA target. I know that
OPENMP_ENABLE_LIBOMPTARGET isn't enabled by default at the moment.
Nevertheless, I would be grateful if somebody could tell me when offloading
to NVIDIA targets will be available. Does anybody know why I get a wrong
value for the number of devices if I use the CPU version?

loki introduction 107 clang -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu dot_prod_accelerator_OpenMP.c -lomptarget
loki introduction 108 a.out
Number of processors: 24
Number of devices: 4
Default device: 0
sum = 6.000000e+08

The output is wrong, because I have two six-core processors (24 hwthreads) and one NVIDIA GPU. gcc-7.1.0 reports correct values.

loki introduction 109 gcc -fopenmp dot_prod_accelerator_OpenMP.c
loki introduction 110 a.out
Number of processors: 24
Number of devices: 1
Default device: 0
sum = 6.000000e+08
loki introduction 111

Thank you very much for your help in advance.

Kind regards

Siegmar

Hi,

some more comments inline, in addition to what George replied. In general, the exact same discussion took place a few weeks ago: http://lists.llvm.org/pipermail/openmp-dev/2017-October/001850.html

[...]

Unfortunately I still have the same problems that I reported in Bug 34104
nearly two months ago when I try to offload to an NVIDIA target.

First, please use the "OpenMP" product and its component "Clang Compiler Support" for this kind of bug report. I think that's the place most people monitor...

I know that
OPENMP_ENABLE_LIBOMPTARGET isn't enabled by default at the moment.
Nevertheless, I would be grateful if somebody could tell me when offloading
to NVIDIA targets will be available. Does anybody know why I get a wrong
value for the number of devices if I use the CPU version?

Depends on what "wrong" means for you. In the view of the compiler and runtime library, they report the "right" values ;-)

loki introduction 107 clang -fopenmp
-fopenmp-targets=x86_64-pc-linux-gnu dot_prod_accelerator_OpenMP.c

This tells the compiler to generate code to offload to the x86 host. Hence, the compiled binary can only deal with that particular target which is the base assumption for the runtime library...

-lomptarget
loki introduction 108 a.out
Number of processors: 24
Number of devices: 4

... and that's why it reports 4 "artificial" devices here. This value is hard-coded and meant for debugging, because apparently you don't gain much from offloading to the host...
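
For anyone who wants to look at these numbers in isolation, a minimal sketch along the following lines should be enough. The source of dot_prod_accelerator_OpenMP.c is not shown in this thread, so the helper name print_device_summary and the exact output format are assumptions modelled on the output quoted above. The three omp_* calls are standard OpenMP API routines; the #else branch is only a fallback so the file still compiles without -fopenmp.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper, not taken from dot_prod_accelerator_OpenMP.c.
 * Prints what the OpenMP runtime reports and returns the device count. */
int print_device_summary(FILE *out)
{
#ifdef _OPENMP
    int procs   = omp_get_num_procs();      /* processors visible on the host */
    int devices = omp_get_num_devices();    /* target devices known to the runtime */
    int def_dev = omp_get_default_device(); /* default-device-var ICV */
#else
    /* Assumed fallback values when compiled without OpenMP support. */
    int procs = 1, devices = 0, def_dev = 0;
#endif
    fprintf(out, "Number of processors: %d\n", procs);
    fprintf(out, "Number of devices: %d\n", devices);
    fprintf(out, "Default device: %d\n", def_dev);
    return devices;
}
```

Calling print_device_summary(stdout) from main and compiling the file with the same clang and gcc command lines as above should make it easy to compare what each runtime reports.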

Default device: 0
sum = 6.000000e+08

The output is wrong, because I have two six-core processors (24
hwthreads) and one NVIDIA GPU. gcc-7.1.0 reports correct values.

loki introduction 109 gcc -fopenmp dot_prod_accelerator_OpenMP.c
loki introduction 110 a.out
Number of processors: 24
Number of devices: 1

Based on your general question, I assume that you built GCC to compile for Nvidia GPUs by default? There is a difference here: With Clang, you specify the target when compiling your application, while for GCC you enable a target when you build the compiler.
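
Since the device count alone does not say where a target region really runs, one way to check is to call omp_is_initial_device() inside the region and copy the result back. The following is only a sketch: ran_on_host is a made-up name, and no claim is made here about what the call returns for the "artificial" x86 devices mentioned above. Without -fopenmp the function simply reports the host.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if the target region executed on the
 * initial (host) device, 0 if it executed on some other device. */
int ran_on_host(void)
{
    int on_host = 1; /* assumed default when no offloading takes place */
#ifdef _OPENMP
    /* map(from:) copies the value written inside the region back to the host. */
    #pragma omp target map(from: on_host)
    {
        on_host = (omp_is_initial_device() != 0);
    }
#endif
    return on_host;
}
```

With Clang the result depends on the -fopenmp-targets value given when compiling the application; with GCC it depends on which offload targets the compiler itself was built with.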

I hope this answers most of your questions.
Jonas