I’m trying to use the unified shared memory feature of the OpenMP offload implementation. My question is: do I need to declare the directive once in the program, or do I have to declare it every time before a target pragma? Also, is there any performance degradation if it’s used before each pragma?
This seems to be a question about the OpenMP specification rather than the LLVM implementation. It would therefore be better either to
- read the specification itself (available on the OpenMP site https://www.openmp.org/)
- look at a tutorial
- ask on Stack Overflow (tagging the question with an OpenMP tag)
Ah sorry, it worked with nvhpc, but the performance with clang was a bit off. I am not able to profile the clang-generated code with nvprof, so I was wondering if redeclaration was the issue.
I have checked the spec before, but it had no mention of this. Maybe nvhpc’s implementation is a bit different.
Which directive? Requires unified_shared_memory?
Assuming that one, then:
No, you don’t need to repeat it: once per translation unit (TU), before all target pragmas.
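To illustrate the placement described above, here is a minimal sketch, assuming the directive in question is `requires unified_shared_memory`. The function names `scale_on_device` and `sum_on_device` are hypothetical and only serve the example. The directive appears once at file scope, and the target regions after it need no per-region redeclaration and no map clauses for the pointer data.

```c
/* Stated once per translation unit, at file scope,
   before any target construct in this file. */
#pragma omp requires unified_shared_memory

/* Hypothetical helper: scales an array in place inside a target region.
   No map clause is given: under unified shared memory the device is
   expected to be able to dereference the host pointer directly. */
static void scale_on_device(double *a, int n, double factor) {
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < n; ++i) {
        a[i] *= factor;
    }
}

/* Hypothetical helper: a second target region in the same file.
   The requires directive is not repeated here. */
static double sum_on_device(const double *a, int n) {
    double s = 0.0;
    #pragma omp target teams distribute parallel for reduction(+ : s)
    for (int i = 0; i < n; ++i) {
        s += a[i];
    }
    return s;
}
```

If the file is built without OpenMP enabled, the pragmas are ignored and both functions simply run on the host, which makes it easy to check results before looking at offload performance.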
Hard to debug without information on the program and how you compile it.
FWIW, we do not optimize USM much. That said, I would recommend checking in general whether the OpenMP code is properly optimized before assuming it’s USM. See "OpenMP Optimization Remarks" in the LLVM/OpenMP 18.0.0git documentation.
Thank you. Yes, USM was not the issue; the same problem happens with nvhpc as well. After further profiling, I found that excess data transfer brings the performance down. The prime suspect is pinned host memory, though I have no idea how to allocate pinned host memory (zero copy) with OpenMP offload.