I am working on an application that uses OpenMP target offloading to multiple GPUs. The code works as expected with NVC 22.5, GCC 13, and, to some extent, Clang 16. With Clang it runs fine on a machine with four AMD Instinct MI250X GPUs, but on a compute node with four NVIDIA Tesla V100 GPUs I get a SIGSEGV in .omp_outlined… xxx (). I am stuck debugging this because I do not know what that means. Does it mean that some internal OpenMP function segfaults?
Does anybody have a suggestion for what I could try to do to fix my application?
Unfortunately, I cannot share the code I am working on as I do not own the code.
I have Clang installed from commit 710a834c4c822c5c444fc9715785d23959f5c645 on both of the machines.
It sounds like the crash is in the function that the compiler generates for a parallel region. It is quite hard to figure out what is wrong without any information about the code. Did it show any CUDA error? Could you please share the last few lines of output showing the segmentation fault?
And if you backtrace, what does it show?
It should be able to show you where that parallel region function was called from, which would tell you much more about what is going on.
Can you compile your code with the “-g” flag? That should give you line numbers and function information that a debugger can use to show you where the SEGV happened.
(For now, just add the -g flag, no need to remove optimisation flags…)
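To make the point about generated parallel-region functions concrete, here is a rough sketch in C of what "outlining" means. The first function is what you would write yourself. The second pair is a hand-written stand-in for what a compiler conceptually does with the region body. The names region_args, outlined_region and sum_squares_outlined are hypothetical; the real generated function's name, signature and calling convention are compiler internals and will look different.

```c
#include <stddef.h>

/* What the user writes: a parallel loop inside an ordinary function. */
double sum_squares(const double *x, size_t n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * x[i];
    return sum;
}

/* Hypothetical stand-in for the captured variables of the region. */
struct region_args {
    const double *x;
    size_t n;
    double sum;
};

/* Hypothetical stand-in for the outlined function: the region body,
 * moved into its own function. This is still the user's code, so an
 * invalid pointer or index used here would fault inside this function. */
static void outlined_region(struct region_args *a) {
    for (size_t i = 0; i < a->n; ++i)
        a->sum += a->x[i] * a->x[i];
}

/* Serial illustration of the call site; a real OpenMP runtime would
 * invoke the outlined function from each thread of the team instead. */
double sum_squares_outlined(const double *x, size_t n) {
    struct region_args a = { x, n, 0.0 };
    outlined_region(&a);
    return a.sum;
}
```

Seen this way, a crash reported inside an outlined function usually points at the code in the region body rather than at the OpenMP runtime itself, and the backtrace frame just above it should be the function that contains the pragma.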
Thank you so much! I had already compiled the code with -g, but adding -gdwarf showed me what went wrong. It was indeed a stupid mistake with a macro that did not expand as I thought it would. Now I have no idea why the code worked with all the other compilers.
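For anyone landing here later: the actual macro was not posted, so the following is only a hypothetical illustration of one common way a macro can expand differently than intended. IDX_BAD, IDX_GOOD and the two wrapper functions are invented names, not taken from the original code.

```c
/* Hypothetical index macros; these are NOT the macros from the original code. */
#define IDX_BAD(i, j, n)  i * n + j              /* arguments not parenthesized */
#define IDX_GOOD(i, j, n) ((i) * (n) + (j))      /* arguments and result parenthesized */

/* IDX_BAD(i + 1, j, n) expands to: i + 1 * n + j */
int idx_bad(int i, int j, int n) {
    return IDX_BAD(i + 1, j, n);
}

/* IDX_GOOD(i + 1, j, n) expands to: ((i + 1) * (n) + (j)) */
int idx_good(int i, int j, int n) {
    return IDX_GOOD(i + 1, j, n);
}
```

With i = 1, j = 0, n = 4, the first version computes index 5 and the second computes index 8. A wrong index like this can read or write outside an array, and whether that actually crashes depends on memory layout, which is one possible reason such a bug might go unnoticed with some compilers. Running only the preprocessor (for example with the -E option) shows the expanded text directly.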