The changes I made turned this into an explicit rather than implicit failure as we didn’t previoulys support this, see the original issue. This occurs, as you said, because we cannot access these variables normally from the host. Variables with internal linkage will not have a symbol that we can read from the host, nor will variables with hidden visibility if the target image is a shared library. Personally, I think it’s perfectly reasonable to tell the user in this case to either remove the
static keyword, or override the visibility for the offending variable. Supporting this means changing the normally understood semantics of these language features, which is something I was hoping to avoid as I started moving offloading semantics closer to the host’s. The OpenMP standard doesn’t explicitly allow or disallow this so it’s open to interpretation, but at the end of the day it’s the users that really dictate this stuff. There is precedent to do this from other languages and compilers. CUDA and the NVHPC compilers will do some sort of name mangling for this.
There are a couple problems with simply mangling the names though, and it’s probably impossible to get it truly unique as I’ve discovered through conversations with others.
- If we generate a true random number, it would need to be used for both the host and device. This means each compilation job has some external dependency so we can’t run the host and device compilations independently. This also punts the problem to statistically impossible instead of actually impossible
- If we use the file’s unique ID, then we could uniquely identify it on that filesystem, however this won’t work if the user compiles the same file multiple times.
- If we use the file’s unique ID and add in some extra values from the compilation, this still would fail in the case of a non-static source tree like in the case of distributed compilation (e.g.
mv /a/foo.c host1:/tmp/foo.c).
These are pretty extreme edge cases, so all the solutions I’ve seen from other compilers simply use the filename or path and ignore all the possible failure modes. Personally I’m not a fan of this sort of “good enough” implementation when the alternative is just not supporting it, but it should solve the majority of cases.
Clang’s CUDA support has some facilities to do this that we could inherit to support this. I would be begrudgingly fine implementing this if we needed to. Primarily because we can see all the uses within the TU and only do this name mangling if there’s any
update clauses attached to it. That way we get to still use static variables on the device when creating device libraries.
However, I am staunchly against doing this for
hidden visibility. If we decay
hidden visibility to
default it would prevent us from being able to use any
hidden variables (without making an exception for
nohost regions). This is because in the following case, we can’t know if someone outside the TU will try to update that variable form the host. The hidden visibility means that if we’re targeting AMDGPU, which uses shared libraries for their images, we won’t be able to see the symbol to update it.
#pragma omp declare target
int __attribute__((visibility("hidden"))) aaa=1;
#pragma omp end declare target
Even though the NVIDIA toolchain already demotes
hidden visibility to
default visibility, we respect it completely in LLVM-IR and use it when performing LTO. If a variable has
hidden visibility, that means we can eliminate it entirely from the module because there will be no external readers. We use
hidden visibility in the OpenMP device runtime for this reason. Otherwise, all the
default globals would need to be exported when performing LTO, which is not ideal.
For the TL;DR, I think that externalizing static variables is a bad idea that shouldn’t have been introduced. However, there is precedent with other compilers and languages doing this and users may need it, so we might need to support it anyway. Name mangling these is non-trivial, but in the static case I’m okay with it only because we can ignore it unless the user actually updates it. I’m against making
hidden variables visible because it prevents us from optimizing some cases.
Let me know what your thoughts are on this, I may have missed some things.