Troubles with offloading in Clang 6.0 and trunk

Hi all,

What’s the best Linux distribution (or other build environment) to test the Clang/OpenMP offloading support? I’ve run into enough issues trying to get it working that I could probably file a half dozen bug reports, but I want to make sure I’m not doing something incredibly stupid before I go down that path…

I’ve tried both -fopenmp-targets=nvptx64-nvidia-cuda and -fopenmp-targets=x86_64-pc-linux-gnu, and end up hitting similar problems with both.

I’ve been using my own LLVM builds and can’t seem to make it cooperate. First off, -fopenmp-targets= doesn’t like seeing ar archives on the link line for some reason: clang-offload-bundler generates 0-byte .o files for the offload target types when pointed at “.a” files. So I have to point it to the object files that would have been contained inside the static libs. Doing that will at least make it compile and link, but that’s still not using the target offload support. I haven’t yet used offloading anywhere, but from what I understand, I need to wrap a bunch of places with “#pragma omp declare target” and add “#pragma omp target” to some “#pragma omp parallel” blocks, right?

Adding “#pragma omp target” causes other problems. With LLVM 6.0 (plus some cherry-picked commits[1] for CUDA-9.1 support and other stuff) it has symbol-resolution issues at link time (I’m not sure where these symbols are supposed to come from, either):

nvlink warning : Function ‘__omp_outlined___wrapper’ has address taken but no possible call to it
nvlink error : Undefined reference to ‘__kmpc_kernel_init’ in ‘/tmp/nbody_CPU_SOA-d2d669.cubin’
nvlink error : Undefined reference to ‘__kmpc_dispatch_init_8u’ in ‘/tmp/nbody_CPU_SOA-d2d669.cubin’
nvlink error : Undefined reference to ‘__kmpc_dispatch_next_8u’ in ‘/tmp/nbody_CPU_SOA-d2d669.cubin’
nvlink error : Undefined reference to ‘bodyBodyInteraction’ in ‘/tmp/nbody_CPU_SOA-d2d669.cubin’
nvlink error : Undefined reference to ‘__kmpc_global_thread_num’ in ‘/tmp/nbody_CPU_SOA-d2d669.cubin’
nvlink error : Undefined reference to ‘__kmpc_kernel_prepare_parallel’ in ‘/tmp/nbody_CPU_SOA-d2d669.cubin’
nvlink error : Undefined reference to ‘__kmpc_kernel_deinit’ in ‘/tmp/nbody_CPU_SOA-d2d669.cubin’
nvlink error : Undefined reference to ‘__kmpc_kernel_parallel’ in ‘/tmp/nbody_CPU_SOA-d2d669.cubin’
nvlink error : Undefined reference to ‘__kmpc_kernel_end_parallel’ in ‘/tmp/nbody_CPU_SOA-d2d669.cubin’
nvlink info : 192 bytes gmem
nvlink info : Function properties for ‘__omp_offloading_fe00_d0111c__ComputeGravitation_SOA_l63’:
nvlink info : used 94 registers, 112 stack, 0 bytes smem, 360 bytes cmem[0], 0 bytes lmem

The “bodyBodyInteraction” one is probably my fault somehow, though I don’t understand how. It’s a static function defined in a header, but wrapping it with “#pragma omp declare target” doesn’t seem to fix the undefined reference. So I don’t know what’s happening there.

Given the above link problems I attempted using trunk as well, and ended up with a segmentation fault in clang when compiling the .c file with the “#pragma omp target” directive in it:

“/home/steven/.apps/llvm-trunk/bin/clang-6.0” -cc1 -triple nvptx64-nvidia-cuda -aux-triple x86_64-unknown-linux-gnu -S -disable-free -disable-llvm-verifier -discard-value-names -main-file-name nbody_CPU_SOA.c -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -relaxed-aliasing -menable-no-infs -menable-no-nans -menable-unsafe-fp-math -fno-signed-zeros -mreassociate -freciprocal-math -fno-trapping-math -ffp-contract=fast -ffast-math -ffinite-math-only -no-integrated-as -fuse-init-array -mlink-cuda-bitcode /opt/cuda/nvvm/libdevice/libdevice.10.bc -target-feature +ptx60 -target-cpu sm_35 -dwarf-column-info -debugger-tuning=gdb -v -resource-dir /home/steven/.apps/llvm-trunk/lib/clang/7.0.0 -dependency-file nbody_CPU_SOA.d -MT nbody_CPU_SOA.o -idirafter /usr/lib/gcc/x86_64-pc-linux-gnu/7.3.1/include -D _GNU_SOURCE -D LIBTIME_STATIC -D USE_OPENMP -D HAVE_SIMD -D USE_LIBC11 -I …/subprojects/c11/include -I …/subprojects/time/include -I/opt/intel/composerxe/linux/ipp/include -I/opt/intel/composerxe/linux/mkl/include -internal-isystem /usr/local/include -internal-isystem /home/steven/.apps/llvm-trunk/lib/clang/7.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/local/include -internal-isystem /home/steven/.apps/llvm-trunk/lib/clang/7.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -Wall -Wdeclaration-after-statement -Wmissing-declarations -Wmissing-prototypes -Wno-declaration-after-statement -Wno-long-long -Wno-unknown-pragmas -Wold-style-definition -Wstrict-prototypes -std=gnu99 -fno-dwarf-directory-asm -fdebug-compilation-dir /home/steven/Development/nbody/src -ferror-limit 19 -fmessage-length 190 -fopenmp -pthread -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o /tmp/nbody_CPU_SOA-bcc597.s -x c nbody_CPU_SOA.c -fopenmp-is-device -fopenmp-host-ir-file-path /tmp/nbody_CPU_SOA-eb1725.bc
clang -cc1 version 7.0.0 based upon LLVM 7.0.0svn default target x86_64-unknown-linux-gnu
ignoring nonexistent directory “/include”
ignoring nonexistent directory “/include”
ignoring duplicate directory “/usr/local/include”
ignoring duplicate directory “/home/steven/.apps/llvm-trunk/lib/clang/7.0.0/include”
ignoring duplicate directory “/usr/include”
#include “…” search starts here:
#include <…> search starts here:
…/subprojects/c11/include
…/subprojects/time/include
/opt/intel/composerxe/linux/ipp/include
/opt/intel/composerxe/linux/mkl/include
/usr/local/include
/home/steven/.apps/llvm-trunk/lib/clang/7.0.0/include
/usr/include
/usr/lib/gcc/x86_64-pc-linux-gnu/7.3.1/include
End of search list.
#0 0x00007f5659cae3ba llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/home/steven/.apps/llvm-trunk/bin/…/lib/libLLVM-7.0svn.so+0x72f3ba)
#1 0x00007f5659cabe36 llvm::sys::RunSignalHandlers() (/home/steven/.apps/llvm-trunk/bin/…/lib/libLLVM-7.0svn.so+0x72ce36)
#2 0x00007f5659cac428 SignalHandler(int) (/home/steven/.apps/llvm-trunk/bin/…/lib/libLLVM-7.0svn.so+0x72d428)
#3 0x00007f565c39db90 __restore_rt (/usr/lib/libpthread.so.0+0x11b90)
#4 0x00007f565a9d9c3c (anonymous namespace)::CVPLatticeFunc::ComputeLatticeVal(llvm::PointerIntPair<llvm::Value*, 2u, (anonymous namespace)::IPOGrouping, llvm::PointerLikeTypeTraitsllvm::Value*, llvm::PointerIntPairInfo<llvm::Value*, 2u, llvm::PointerLikeTypeTraitsllvm::Value* > >) (/home/steven/.apps/llvm-trunk/bin/…/lib/libLLVM-7.0svn.so+0x145ac3c)
#5 0x00007f565a9dbb2e llvm::SparseSolver<llvm::PointerIntPair<llvm::Value*, 2u, (anonymous namespace)::IPOGrouping, llvm::PointerLikeTypeTraitsllvm::Value*, llvm::PointerIntPairInfo<llvm::Value*, 2u, llvm::PointerLikeTypeTraitsllvm::Value* > >, (anonymous namespace)::CVPLatticeVal, llvm::LatticeKeyInfo<llvm::PointerIntPair<llvm::Value*, 2u, (anonymous namespace)::IPOGrouping, llvm::PointerLikeTypeTraitsllvm::Value*, llvm::PointerIntPairInfo<llvm::Value*, 2u, llvm::PointerLikeTypeTraitsllvm::Value* > > > >::getValueState(llvm::PointerIntPair<llvm::Value*, 2u, (anonymous namespace)::IPOGrouping, llvm::PointerLikeTypeTraitsllvm::Value*, llvm::PointerIntPairInfo<llvm::Value*, 2u, llvm::PointerLikeTypeTraitsllvm::Value* > >) (/home/steven/.apps/llvm-trunk/bin/…/lib/libLLVM-7.0svn.so+0x145cb2e)
#6 0x00007f565a9dcbfd (anonymous namespace)::CVPLatticeFunc::visitCallSite(llvm::CallSite, llvm::DenseMap<llvm::PointerIntPair<llvm::Value*, 2u, (anonymous namespace)::IPOGrouping, llvm::PointerLikeTypeTraitsllvm::Value*, llvm::PointerIntPairInfo<llvm::Value*, 2u, llvm::PointerLikeTypeTraitsllvm::Value* > >, (anonymous namespace)::CVPLatticeVal, llvm::DenseMapInfo<llvm::PointerIntPair<llvm::Value*, 2u, (anonymous namespace)::IPOGrouping, llvm::PointerLikeTypeTraitsllvm::Value*, llvm::PointerIntPairInfo<llvm::Value*, 2u, llvm::PointerLikeTypeTraitsllvm::Value* > > >, llvm::detail::DenseMapPair<llvm::PointerIntPair<llvm::Value*, 2u, (anonymous namespace)::IPOGrouping, llvm::PointerLikeTypeTraitsllvm::Value*, llvm::PointerIntPairInfo<llvm::Value*, 2u, llvm::PointerLikeTypeTraitsllvm::Value* > >, (anonymous namespace)::CVPLatticeVal> >&, llvm::SparseSolver<llvm::PointerIntPair<llvm::Value*, 2u, (anonymous namespace)::IPOGrouping, llvm::PointerLikeTypeTraitsllvm::Value*, llvm::PointerIntPairInfo<llvm::Value*, 2u, llvm::PointerLikeTypeTraitsllvm::Value* > >, (anonymous namespace)::CVPLatticeVal, llvm::LatticeKeyInfo<llvm::PointerIntPair<llvm::Value*, 2u, (anonymous namespace)::IPOGrouping, llvm::PointerLikeTypeTraitsllvm::Value*, llvm::PointerIntPairInfo<llvm::Value*, 2u, llvm::PointerLikeTypeTraitsllvm::Value* > > > >&) (/home/steven/.apps/llvm-trunk/bin/…/lib/libLLVM-7.0svn.so+0x145dbfd)
#7 0x00007f565a9e138f (anonymous namespace)::CVPLatticeFunc::ComputeInstructionState(llvm::Instruction&, llvm::DenseMap<llvm::PointerIntPair<llvm::Value*, 2u, (anonymous namespace)::IPOGrouping, llvm::PointerLikeTypeTraitsllvm::Value*, llvm::PointerIntPairInfo<llvm::Value*, 2u, llvm::PointerLikeTypeTraitsllvm::Value* > >, (anonymous namespace)::CVPLatticeVal, llvm::DenseMapInfo<llvm::PointerIntPair<llvm::Value*, 2u, (anonymous namespace)::IPOGrouping, llvm::PointerLikeTypeTraitsllvm::Value*, llvm::PointerIntPairInfo<llvm::Value*, 2u, llvm::PointerLikeTypeTraitsllvm::Value* > > >, llvm::detail::DenseMapPair<llvm::PointerIntPair<llvm::Value*, 2u, (anonymous namespace)::IPOGrouping, llvm::PointerLikeTypeTraitsllvm::Value*, llvm::PointerIntPairInfo<llvm::Value*, 2u, llvm::PointerLikeTypeTraitsllvm::Value* > >, (anonymous namespace)::CVPLatticeVal> >&, llvm::SparseSolver<llvm::PointerIntPair<llvm::Value*, 2u, (anonymous namespace)::IPOGrouping, llvm::PointerLikeTypeTraitsllvm::Value*, llvm::PointerIntPairInfo<llvm::Value*, 2u, llvm::PointerLikeTypeTraitsllvm::Value* > >, (anonymous namespace)::CVPLatticeVal, llvm::LatticeKeyInfo<llvm::PointerIntPair<llvm::Value*, 2u, (anonymous namespace)::IPOGrouping, llvm::PointerLikeTypeTraitsllvm::Value*, llvm::PointerIntPairInfo<llvm::Value*, 2u, llvm::PointerLikeTypeTraitsllvm::Value* > > > >&) (/home/steven/.apps/llvm-trunk/bin/…/lib/libLLVM-7.0svn.so+0x146238f)
#8 0x00007f565a9e3584 llvm::SparseSolver<llvm::PointerIntPair<llvm::Value*, 2u, (anonymous namespace)::IPOGrouping, llvm::PointerLikeTypeTraitsllvm::Value*, llvm::PointerIntPairInfo<llvm::Value*, 2u, llvm::PointerLikeTypeTraitsllvm::Value* > >, (anonymous namespace)::CVPLatticeVal, llvm::LatticeKeyInfo<llvm::PointerIntPair<llvm::Value*, 2u, (anonymous namespace)::IPOGrouping, llvm::PointerLikeTypeTraitsllvm::Value*, llvm::PointerIntPairInfo<llvm::Value*, 2u, llvm::PointerLikeTypeTraitsllvm::Value* > > > >::visitInst(llvm::Instruction&) (/home/steven/.apps/llvm-trunk/bin/…/lib/libLLVM-7.0svn.so+0x1464584)
#9 0x00007f565a9e46a3 runCVP(llvm::Module&) (/home/steven/.apps/llvm-trunk/bin/…/lib/libLLVM-7.0svn.so+0x14656a3)
#10 0x00007f5659d9b705 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/steven/.apps/llvm-trunk/bin/…/lib/libLLVM-7.0svn.so+0x81c705)
#11 0x00000000007180f4 (anonymous namespace)::EmitAssemblyHelper::EmitAssembly(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_deletellvm::raw_pwrite_stream >) (/home/steven/.apps/llvm-trunk/bin/clang-6.0+0x7180f4)
#12 0x000000000071a718 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_deletellvm::raw_pwrite_stream >) (/home/steven/.apps/llvm-trunk/bin/clang-6.0+0x71a718)
#13 0x0000000000d4961c clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/home/steven/.apps/llvm-trunk/bin/clang-6.0+0xd4961c)
#14 0x0000000001417b89 clang::ParseAST(clang::Sema&, bool, bool) (/home/steven/.apps/llvm-trunk/bin/clang-6.0+0x1417b89)
#15 0x0000000000d48377 clang::CodeGenAction::ExecuteAction() (/home/steven/.apps/llvm-trunk/bin/clang-6.0+0xd48377)
#16 0x0000000000b5f9e6 clang::FrontendAction::Execute() (/home/steven/.apps/llvm-trunk/bin/clang-6.0+0xb5f9e6)
#17 0x0000000000b1fc1a clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/steven/.apps/llvm-trunk/bin/clang-6.0+0xb1fc1a)
#18 0x0000000000c17871 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/steven/.apps/llvm-trunk/bin/clang-6.0+0xc17871)
#19 0x00000000006b2c18 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/steven/.apps/llvm-trunk/bin/clang-6.0+0x6b2c18)
#20 0x00000000006a07df main (/home/steven/.apps/llvm-trunk/bin/clang-6.0+0x6a07df)
#21 0x00007f56588b39a7 __libc_start_main (/usr/lib/libc.so.6+0x219a7)
#22 0x00000000006aff1a _start (/home/steven/.apps/llvm-trunk/bin/clang-6.0+0x6aff1a)
Stack dump:
0. Program arguments: /home/steven/.apps/llvm-trunk/bin/clang-6.0 -cc1 -triple nvptx64-nvidia-cuda -aux-triple x86_64-unknown-linux-gnu -S -disable-free -disable-llvm-verifier -discard-value-names -main-file-name nbody_CPU_SOA.c -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -relaxed-aliasing -menable-no-infs -menable-no-nans -menable-unsafe-fp-math -fno-signed-zeros -mreassociate -freciprocal-math -fno-trapping-math -ffp-contract=fast -ffast-math -ffinite-math-only -no-integrated-as -fuse-init-array -mlink-cuda-bitcode /opt/cuda/nvvm/libdevice/libdevice.10.bc -target-feature +ptx60 -target-cpu sm_35 -dwarf-column-info -debugger-tuning=gdb -v -resource-dir /home/steven/.apps/llvm-trunk/lib/clang/7.0.0 -dependency-file nbody_CPU_SOA.d -MT nbody_CPU_SOA.o -idirafter /usr/lib/gcc/x86_64-pc-linux-gnu/7.3.1/include -D _GNU_SOURCE -D LIBTIME_STATIC -D USE_OPENMP -D HAVE_SIMD -D USE_LIBC11 -I …/subprojects/c11/include -I …/subprojects/time/include -I/opt/intel/composerxe/linux/ipp/include -I/opt/intel/composerxe/linux/mkl/include -internal-isystem /usr/local/include -internal-isystem /home/steven/.apps/llvm-trunk/lib/clang/7.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/local/include -internal-isystem /home/steven/.apps/llvm-trunk/lib/clang/7.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -Wall -Wdeclaration-after-statement -Wmissing-declarations -Wmissing-prototypes -Wno-declaration-after-statement -Wno-long-long -Wno-unknown-pragmas -Wold-style-definition -Wstrict-prototypes -std=gnu99 -fno-dwarf-directory-asm -fdebug-compilation-dir /home/steven/Development/nbody/src -ferror-limit 19 -fmessage-length 190 -fopenmp -pthread -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o /tmp/nbody_CPU_SOA-bcc597.s -x c nbody_CPU_SOA.c -fopenmp-is-device -fopenmp-host-ir-file-path /tmp/nbody_CPU_SOA-eb1725.bc

  1. parser at end of file
  2. Per-module optimization passes
  3. Running pass ‘Called Value Propagation’ on module ‘nbody_CPU_SOA.c’.
    clang-6.0: error: unable to execute command: Segmentation fault (core dumped)

If someone wants to repro this, you can build my version of n-body from here (branch “clang-openmp-offload-testing”):

https://github.com/tycho/nbody/tree/clang-openmp-offload-testing

I’m building with:

$ make V=1 CC=“clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -v”

  • Steven

[1] https://git.uplinklabs.net/steven/projects/llvm/clang.git/log/?h=release_60

Ah, I see. I tried x86_64-pc-linux-gnu as well under 6.0 and ended up with an undefined reference to bodyBodyInteraction, but it sounds like that could be attributed to the lack of “declare target” codegen. I guess I’ll wait a few releases and try this exercise again!