[OPENMP] USM pragma: more than a safety net, it's an operational mode.

OpenMP 5 has a “requires unified_shared_memory” (USM) pragma. The C and C++ syntax is “#pragma omp requires unified_shared_memory”. At a minimum, this USM pragma tells the compiler that the intended offloading target must be capable of unified shared memory. If the target does not have this capability, compilation and/or the runtime should fail gracefully. Either every source file compiled into an application must have the USM pragma or none may have it. Unified shared memory makes map clauses optional, so if your program does not have a complete set of map clauses, this safety net is important.
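As a minimal sketch of what relying on this safety net looks like, the C fragment below places the pragma at file scope and offloads a loop with no map clauses at all. The function name scale_in_place is an assumption made for illustration only; it is not taken from LLVM or any library.

```c
/* Illustrative sketch; scale_in_place is a hypothetical name. */

/* File-scope directive: the offload target must support unified shared memory. */
#pragma omp requires unified_shared_memory

/* Multiplies each of the n elements of data by factor, in place.
 * There is no map clause: with USM the host pointer is assumed to be
 * directly usable inside the target region. */
void scale_in_place(double *data, int n, double factor)
{
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < n; ++i)
        data[i] *= factor;
}
```

Without the pragma, this code would be missing a map clause for the array and would not be correct on a GPU with discrete memory.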

How does runtime operation differ in USM mode? How should map clauses be handled: should they be ignored, should they implement the same copy semantics as on a discrete-memory GPU, or should they be used to optimize memory management? In OpenMP, these decisions are up to the implementation. In LLVM, USM is more than just a compilation and runtime safety net: it defines a mode of operation that both the compiler and the runtime are aware of. In default mode, the map-derived GPU copy semantics are executed even if the GPU supports USM. In USM mode, the copy semantics are not executed.

Should programmers delete or avoid map clauses in a USM application? Absolutely NOT! There are two reasons to keep using map clauses in USM mode: performance and portability.

From a performance perspective, map clauses may trigger more than the implied copy semantics. They provide important information to the compiler and runtime about whether and how variables are accessed in the target region (on the GPU). This information allows the compiler or runtime to allocate and/or manage device memory in an optimal fashion.
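To make the kind of information a map clause carries concrete, here is a small sketch. The saxpy function is a hypothetical example, not code from LLVM: map(to:) says x is only read on the device, and map(tofrom:) says y is both read and written. What a given compiler or runtime actually does with these hints is implementation-defined.

```c
/* Illustrative sketch; saxpy is a hypothetical example function. */

/* Computes y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy(int n, float a, const float *x, float *y)
{
    /* x is read-only in the target region; y is read and then updated. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```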

There are two portability motivations for map clauses. The first is that the application remains portable to accelerators/GPUs that do not provide unified shared memory. The second is the ability to build and run your application in default mode on the same USM-capable GPU.
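One way to get both builds from a single source is to guard the pragma with a preprocessor macro while keeping a complete set of map clauses. The sketch below assumes a hypothetical build macro USE_USM and a hypothetical function sum_array; neither is an LLVM convention. Because all compilations in the application must agree, the macro would have to be defined for every source file or for none.

```c
/* Illustrative sketch; USE_USM and sum_array are hypothetical names. */

#ifdef USE_USM
/* USM mode: defined for every source file in the application, or for none. */
#pragma omp requires unified_shared_memory
#endif

/* Returns the sum of the first n elements of v. */
double sum_array(const double *v, int n)
{
    double total = 0.0;
    /* Complete map clauses keep this correct in default mode and on
     * targets without unified shared memory. */
    #pragma omp target teams distribute parallel for \
        map(to: v[0:n]) map(tofrom: total) reduction(+: total)
    for (int i = 0; i < n; ++i)
        total += v[i];
    return total;
}
```

Building once with the macro defined and once without gives a USM-mode binary and a default-mode binary to compare on the same GPU.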

GPU page migration is not necessary for correct OpenMP applications in default mode. Some GPUs may provide optimizations when GPU page migration is disabled. Furthermore, the runtime copy semantics of OpenMP map clauses may be more efficient than automatic page migration in USM mode. This could include directed prefetch constructs that may be more difficult to implement with the USM page subsystem.

Summary: In default OpenMP mode, host consistency is established at the boundaries of target regions via the copy semantics implied by map clauses. In USM mode, host consistency is established with page migration, which could be less efficient. A colleague reminded me of an old adage that I like to repeat: “In HPC, all paging is bad paging”. So for performance and portability, I strongly recommend that OpenMP programmers continue to use map clauses and test their applications in default mode (without the USM pragma).

Greg Rodgers

Opinions are my own, not my employer's.