Making OpenMP declare target static global variables externally visible

Recently there were some changes added in Clang to disallow target update for variables that are not externally visible. This includes any static global variables that are marked declare target. For example the following code is not allowed.

— foo.c —

#include<stdio.h>
#pragma omp declare target
static int aaa=1;
#pragma omp end declare target

void foo() {
   #pragma omp target update from (aaa)
   printf("aaa in foo: %d \n", aaa );
}

We think this case should be supported and that can be done by making the declare target static global variables externally visible. That, however, causes a symbol redefinition error if two variables with the same name exist in two different files. For example, if variable “aaa” also exists in boo.c.

–boo.c–

#include<stdio.h>

#pragma omp declare target
static int aaa = 2; 
#pragma omp end declare target

void boo() {
   #pragma omp target update from (aaa)
   printf("aaa in boo: %d \n", aaa );
}

In order to support this, we propose the following changes:

  • Mangle declare target global variable names on the device side. This will prevent any potential name conflicts.
  • On the host side, use the mangle name in the Offload entry table. This will allow the OpenMP host runtime to find the symbol.

Any comments/suggestions are welcome.

The changes I made turned this into an explicit rather than implicit failure as we didn’t previoulys support this, see the original issue. This occurs, as you said, because we cannot access these variables normally from the host. Variables with internal linkage will not have a symbol that we can read from the host, nor will variables with hidden visibility if the target image is a shared library. Personally, I think it’s perfectly reasonable to tell the user in this case to either remove the static keyword, or override the visibility for the offending variable. Supporting this means changing the normally understood semantics of these language features, which is something I was hoping to avoid as I started moving offloading semantics closer to the host’s. The OpenMP standard doesn’t explicitly allow or disallow this so it’s open to interpretation, but at the end of the day it’s the users that really dictate this stuff. There is precedent to do this from other languages and compilers. CUDA and the NVHPC compilers will do some sort of name mangling for this.

There are a couple problems with simply mangling the names though, and it’s probably impossible to get it truly unique as I’ve discovered through conversations with others.

  • If we generate a true random number, it would need to be used for both the host and device. This means each compilation job has some external dependency so we can’t run the host and device compilations independently. This also punts the problem to statistically impossible instead of actually impossible
  • If we use the file’s unique ID, then we could uniquely identify it on that filesystem, however this won’t work if the user compiles the same file multiple times.
  • If we use the file’s unique ID and add in some extra values from the compilation, this still would fail in the case of a non-static source tree like in the case of distributed compilation (e.g. mv /a/foo.c host1:/tmp/foo.c).

These are pretty extreme edge cases, so all the solutions I’ve seen from other compilers simply use the filename or path and ignore all the possible failure modes. Personally I’m not a fan of this sort of “good enough” implementation when the alternative is just not supporting it, but it should solve the majority of cases.

Clang’s CUDA support has some facilities to do this that we could inherit to support this. I would be begrudgingly fine implementing this if we needed to. Primarily because we can see all the uses within the TU and only do this name mangling if there’s any update clauses attached to it. That way we get to still use static variables on the device when creating device libraries.

However, I am staunchly against doing this for hidden visibility. If we decay hidden visibility to protected or default it would prevent us from being able to use any hidden variables (without making an exception for nohost regions). This is because in the following case, we can’t know if someone outside the TU will try to update that variable form the host. The hidden visibility means that if we’re targeting AMDGPU, which uses shared libraries for their images, we won’t be able to see the symbol to update it.

#pragma omp declare target
int __attribute__((visibility("hidden"))) aaa=1;
#pragma omp end declare target

Even though the NVIDIA toolchain already demotes hidden visibility to default visibility, we respect it completely in LLVM-IR and use it when performing LTO. If a variable has hidden visibility, that means we can eliminate it entirely from the module because there will be no external readers. We use hidden visibility in the OpenMP device runtime for this reason. Otherwise, all the protected or default globals would need to be exported when performing LTO, which is not ideal.


For the TL;DR, I think that externalizing static variables is a bad idea that shouldn’t have been introduced. However, there is precedent with other compilers and languages doing this and users may need it, so we might need to support it anyway. Name mangling these is non-trivial, but in the static case I’m okay with it only because we can ignore it unless the user actually updates it. I’m against making hidden variables visible because it prevents us from optimizing some cases.

Let me know what your thoughts are on this, I may have missed some things.

I agree that such things get dictated by the users. I think from the users point of view, they aren’t really accessing the static variables from a different TU and hence are expecting it to work, which I think is a reasonable expectation. We have seen such usage in our customers code and also noticed that OpenMP SPEC ACCEL code does target update on static globals.

You raised some interesting points regarding name mangling. Isn’t there already a precedence for something like this with kernel names? Would this be any different than that? In fact, this may be a much narrower case. We can do as you suggested by name mangling only if there is any target update. We may also need to do it for map with always clause and for cases where the address escapes. For example, if the function returns an address of a static global variable. Would this be reasonable?

We are okay with not changing the behavior for symbols with hidden visibility attribute. This way the users will have an option to explicitly keep the symbols hidden.

If people are using it, then we’ll need to support it. There is some support to do this for the CUDA side, but a perfect solution probably doesn’t exist. In my mind, we should generate something unique to the file and environment implicitly and use that. Then, if we still get a failure we should provide an option for the user to specify something else to make it work. I’m trying to get something like that landed for the CUDA case, then we should be able to re-use it here if so.

So this will require us to detect variables used in a target update and mark them for externalization when we generate the entries. This will at least let this be an opt-in so these static variables can still be optimized out in the majority of cases.

Some people have suggested using a separate reference variable as well, but I’m not sure if that will solve the problem since it will still keep us from optimizing these things out if they are externally visible, see Compiler Explorer.