Too much shared data when compiling an offload-based program with -g

Here’s the full error:

clang++-15 -DENABLE_OMP -DENABLE_CUDA -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 --offload-arch=sm_75 -I/usr/include/mpich-x86_64 -Wall -Wextra -fPIE -std=c++17 -DENABLE_SENSEI \
-g -I/work/SENSEI/sensei-svtk-install//include `/work/SENSEI/sensei-svtk-install/bin/sensei_config --cflags` -I./ \
newton.cpp domain_decomp.o initialize_file.o patch_data.o communication.o initialize_random.o patch.o patch_force.o solver.o write_vtk.o command_line.o sensei_adaptor.o insitu.o \
stream_compact.a /usr/local/cuda-12.0/lib64//libcudart_static.a \
`/work/SENSEI/sensei-svtk-install/bin/sensei_config --libs` -Wl,-rpath=`/work/SENSEI/sensei-svtk-install/bin/sensei_config --python-dir` /work/SENSEI/sensei-svtk-install//lib64/libhamr.a -lm -lstdc++ -L/usr/lib64/mpich/lib -Wl,-rpath -Wl,/usr/lib64/mpich/lib -Wl,--enable-new-dtags -lmpi  \
-o newtonpp_clang15_omp
nvlink error   : Entry function '__omp_offloading_801_4ae29c0__ZN4hamr16openmp_allocatorIdvE8allocateIdEESt10shared_ptrIdEmPKT__l375' uses too much shared data (0xcf9c bytes, 0xc000 max)
nvlink error   : Entry function '__omp_offloading_801_4ac45f2__ZN4hamr16openmp_allocatorIhvE8allocateEmRKh_l344' uses too much shared data (0xcf9c bytes, 0xc000 max)
/bin/clang-linker-wrapper: error: 'nvlink' failed
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)

This only happens when I use -g for a debug build; the program compiles and runs with -O3. Any help on how to get a debug build going? I don’t need to step into the offload kernels, just the host code.

Generally, GPU builds have problems when compiled without optimizations because device resources such as shared memory are so scarce. That makes debugging difficult in cases like this.
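To put the numbers from the nvlink error in decimal, here is a small shell sketch. The variable names are made up for illustration; the two hex values are copied from the error message.

```shell
# Values copied from the nvlink error; variable names are illustrative only.
used=$((0xcf9c))    # shared data requested by each failing kernel
limit=$((0xc000))   # maximum reported by nvlink
over=$((used - limit))
echo "used=$used limit=$limit over=$over"
```

Each kernel asks for 53148 bytes against a 49152-byte (48 KiB) limit, so it is over by 3996 bytes.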

I think the following might work in your case

clang test.c -fopenmp --offload-arch=native -O3 -Xarch_host -g -Xarch_host -O0
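As a rough sketch of how that command is put together, the flags can be split into a part that applies to every compilation and a host-only part. The variable names below are hypothetical; only the flags come from the command above. The script just prints the command rather than running it.

```shell
# Hypothetical variable names; the flags are the ones from the command above.
COMMON_OPT="-O3"                          # default optimization level, host and device
HOST_DEBUG="-Xarch_host -g -Xarch_host -O0"  # -Xarch_host forwards the single argument after it
                                          # to the host compilation, so it is repeated per flag
CMD="clang test.c -fopenmp --offload-arch=native $COMMON_OPT $HOST_DEBUG"
echo "$CMD"
```

The idea is that the device code keeps -O3 and no debug info, while the host code gets -g and -O0.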

Addendum: I think the features I used above that allow you to do this are only present in Clang 16 and onwards.

This worked on Clang 15 as well. Thank you!!

Edit: to be clear, adding -Xarch_host -g -Xarch_host -O0 while keeping my other flags the same worked.

Surprised that works; I only fixed -Xarch_host on OpenMP this January. But if it’s working then there’s no problem.