OpenMP defines __CUDA_ARCH__ for offloading, should it?

Discovered by accident while looking into a bug for Ron (cc’d).

OpenMP running on nvptx defines the __CUDA_ARCH__ macro. Do we think it should? OpenMP target offloading is somewhat implemented in terms of CUDA, but that seems incidental.

I’d like a __GPU_ARCH__ macro which expands to something useful for nvptx, amdgcn, and other targets, and I’d like __CUDA_ARCH__ not to be defined when compiling OpenMP offloading code.

Thoughts?

Jon

I'm not convinced. Even after we move the `cuda_wrapper` headers into a `gpu_wrapper` folder and make them generic, it is unclear to me that this will work better. We'll end up with these:

`__GPU_ARCH__ > 70 && __IS_AMDGCN__`

I would suggest we keep __CUDA_ARCH__ and introduce __AMDGCN_ARCH__ as needed.
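To make the per-vendor suggestion concrete, here is a minimal sketch of how user code might dispatch on separate macros. It assumes `__AMDGCN_ARCH__` exists as proposed in this thread (it is hypothetical here), and `device_kind` is an illustrative name, not an existing API. The `700` threshold relies on `__CUDA_ARCH__` being 700 for sm_70.

```cpp
#include <string>

// Each branch names the vendor whose version numbering it relies on, so no
// extra "which vendor is this?" macro is needed next to the comparison.
// __CUDA_ARCH__ is the existing NVIDIA macro; __AMDGCN_ARCH__ is the
// hypothetical macro proposed in this thread.
inline std::string device_kind() {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
  return "nvptx, sm_70 or newer";
#elif defined(__CUDA_ARCH__)
  return "nvptx, older than sm_70";
#elif defined(__AMDGCN_ARCH__)
  return "amdgcn";
#else
  return "host";
#endif
}
```

In a host-only compilation neither macro is defined, so the function falls through to the host branch.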

TBH, I also haven't understood what the problem actually is.

Discovered by accident while looking into a bug for Ron (cc'd).

OpenMP running on nvptx defines the __CUDA_ARCH__ macro. Do we think it should? OpenMP target offloading is somewhat implemented in terms of cuda but that seems incidental.

Shouldn't it do this only when compiling device code for NVIDIA architectures?

-Hal

Yes, I implicitly assumed that given that we only compile for NVIDIA right now.

FWIW, I don't assume __CUDA_ARCH__ to be present for non-NVIDIA targets.

Thanks, guys.

Defining __CUDA_ARCH__ on non-nvptx architectures is definitely bad. I think Ron has a test case that used __CUDA_ARCH__ to test for running on nvptx. I’m currently trying to stop amdgcn from defining it for OpenMP.
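For illustration only, a test of that kind might look roughly like the sketch below. This is not Ron's actual test case, and `on_nvptx_device` is a made-up name. It shows why the check gives a wrong answer if another offload target also defines the macro.

```cpp
// Returns 1 if the target region was compiled for a device that defines
// __CUDA_ARCH__, and 0 otherwise (including host fallback, or a build
// without OpenMP where the pragma is ignored).
int on_nvptx_device() {
  int result = 0;
#pragma omp target map(from: result)
  {
#ifdef __CUDA_ARCH__
    result = 1;  // intended meaning: "this region was compiled for nvptx"
#else
    result = 0;
#endif
  }
  return result;
}
```

If amdgcn device compilation also defined `__CUDA_ARCH__`, this function would return 1 there too, which is the failure mode described above.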

I would suggest we keep __CUDA_ARCH__ and introduce __AMDGCN_ARCH__ as needed.

That sounds better, with the proviso that nvptx shouldn’t define __AMDGCN_ARCH__ and amdgcn shouldn’t define __CUDA_ARCH__.
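The proviso could be written down as a compile-time check, sketched below. It again assumes the hypothetical `__AMDGCN_ARCH__` macro from this thread; the constant names are illustrative only.

```cpp
// Record which per-vendor macro, if any, this compilation defines.
#if defined(__CUDA_ARCH__)
constexpr bool compiling_for_nvptx = true;
#else
constexpr bool compiling_for_nvptx = false;
#endif

// __AMDGCN_ARCH__ is the hypothetical macro proposed in this thread.
#if defined(__AMDGCN_ARCH__)
constexpr bool compiling_for_amdgcn = true;
#else
constexpr bool compiling_for_amdgcn = false;
#endif

// The proviso: a single device compilation never sees both macros.
static_assert(!(compiling_for_nvptx && compiling_for_amdgcn),
              "__CUDA_ARCH__ and __AMDGCN_ARCH__ must not both be defined");
```

A header carrying this assertion would fail to compile on any target that defines both macros, which would have caught the amdgcn case mentioned earlier.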

We kind of conflate CUDA and nvptx. Perhaps the macro should be __NVPTX_ARCH__ instead of __CUDA_ARCH__?

There may be no problem here. Defining CUDA macros for OpenMP seemed weird, but I’ve since learned that CUDA intrinsics work within OpenMP target regions for nvptx, and that seems useful.

Thanks,

Jon