Clang OpenMP offload unified shared memory

I’m trying to use the unified shared memory feature of the OpenMP offload implementation. Do I need to declare the directive once in the program, or do I have to declare it before every target pragma? Also, is there any performance degradation if it is used before each pragma?

This seems to be a question about the OpenMP specification rather than the LLVM implementation. It would therefore be better either to

  1. read the specification itself (available on the OpenMP site)
  2. look at a tutorial
  3. ask on Stack Overflow (tagging the question with an OpenMP tag)

Ah sorry, it worked with nvhpc, but the performance with Clang was a bit off. I am not able to profile the Clang-generated code with nvprof, so I was wondering if redeclaration was the issue.

I have checked the spec before, but it makes no mention of this. Maybe nvhpc’s implementation is a bit different.

Which directive? Do you mean requires unified_shared_memory?
Assuming that one, then:

No, once per translation unit (TU), before all target pragmas.
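A minimal sketch of that placement, assuming a single C source file; the function names scale and sum are hypothetical and only illustrate where the directive goes relative to the target regions:

```c
/* Hypothetical translation unit: the requires directive appears once,
 * at file scope, before the first target construct. */
#include <stddef.h>

#pragma omp requires unified_shared_memory

/* Scales x in place. With unified shared memory the host pointer is
 * dereferenced directly inside the target region, so no map clause
 * is written for the array. */
void scale(double *x, size_t n, double a) {
    #pragma omp target teams distribute parallel for
    for (size_t i = 0; i < n; ++i)
        x[i] *= a;
}

/* A second target region in the same TU: the directive is NOT repeated. */
double sum(const double *x, size_t n) {
    double s = 0.0;
    /* The scalar reduction result is mapped explicitly so it is copied back. */
    #pragma omp target teams distribute parallel for reduction(+ : s) map(tofrom : s)
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}
```

With Clang this would typically be built with -fopenmp plus an -fopenmp-targets= option for the device in question (for example nvptx64-nvidia-cuda, which is an assumption about the hardware here). If the pragmas are ignored or no device is available, the same loops simply run on the host.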


Hard to debug w/o information on the program and how you compile it.
FWIW, we do not optimize USM much. That said, I would recommend first checking whether the OpenMP code is properly optimized in general before assuming USM is the cause. See OpenMP Optimization Remarks (LLVM/OpenMP 18.0.0git documentation).

Thank you. Yes, USM was not the issue; the same problem happens with nvhpc as well. After further profiling, I found that excess data transfer brings the performance down. The prime suspect is pinned host memory, though I have no idea how to allocate pinned host memory (zero copy) with OpenMP offload.