Unifying clang flags/language opts for offloading languages

I’m looking at the Clang frontend, especially to put up some POC patches that show how we can do fun things with LLVM/Offload. I want nice and clean patches, but there is an issue: the offloading languages (CUDA, OpenMP, HIP*, SYCL, …) often seem to simply copy flags and language options from one another and then rename them. That’s very unfortunate for interoperability and code sharing in Clang.
Let’s look at an example:

  def fcuda_is_device : Flag<["-"], "fcuda-is-device">;

  def fopenmp_is_target_device : Flag<["-"], "fopenmp-is-target-device">;
  def : Flag<["-"], "fopenmp-is-device">, Alias<fopenmp_is_target_device>;

  def fsycl_is_device : Flag<["-"], "fsycl-is-device">;

and

  LANGOPT(CUDAIsDevice      , 1, 0, "compiling for CUDA device")
  LANGOPT(OpenMPIsTargetDevice    , 1, 0, "Generate code only for OpenMP target device")
  LANGOPT(SYCLIsDevice      , 1, 0, "Generate code for SYCL device")

So, 1 concept, 4 flags, 3 language options.

Do we really need this? I understand the flags might by now be (semi-)permanent, but can we at least eliminate the duplicate language options?

Of course the solution to too many is another one:

  def offload_is_device : Flag<["-"], "offload-is-device">;

  LANGOPT(OffloadIsDevice, 1, 0, "Generate code only for an offload device.")

with all options aliasing this one and all LangOpts uses replaced by OffloadIsDevice.
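Concretely, the aliasing might look something like this in Options.td (a sketch, not tested tablegen):

  def offload_is_device : Flag<["-"], "offload-is-device">;

  // Existing spellings kept as compatibility aliases.
  def : Flag<["-"], "fcuda-is-device">, Alias<offload_is_device>;
  def : Flag<["-"], "fopenmp-is-target-device">, Alias<offload_is_device>;
  def : Flag<["-"], "fopenmp-is-device">, Alias<offload_is_device>;
  def : Flag<["-"], "fsycl-is-device">, Alias<offload_is_device>;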

There are more such opportunities but I wanted to see what people think before I start making patches.
Maybe I overlooked an intrinsic reason we need the complexity.

~ J

* HIP reuses a lot of CUDA flags, which is weird but somehow better I guess.


Thank you for raising this. I was just this week discussing similar concerns with colleagues. In my case, we were discussing the various options for specifying offload targets. I would also like to see some convergence in option names.

In the recent SYCL Driver enhancements RFC, we proposed adding the following options for SYCL:

  • -fsycl-targets=

The corresponding options for other offloading languages are spelled:

  • -fopenmp-targets=
  • --fopenmp-target-jit
  • --cuda-gpu-arch=
  • --offload-arch=

--cuda-gpu-arch is an alias of --offload-arch=.

The help text for --offload-arch= is:

HelpText<"Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). "
         "If 'native' is used the compiler will detect locally installed architectures. "
         "For HIP offloading, the device architecture can be followed by target ID features "
         "delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be specified more than once.">;

I think a unification makes sense, though there may still be a need or desire to allow language-specific architecture names and spellings. Perhaps it would make sense to introduce a new tblgen .td file that describes target architectures with a canonical name plus language-dependent name mappings and allowances.

My thought was to make them all aliases for the --offload-* versions. And maybe, just maybe, start to deprecate them. That said, I don’t think I understand your tablegen proposal. What would that (conceptually) look like for targets or is-device?

FWIW, --fopenmp-target-jit is not like the others: it doesn’t specify a target, just that the compiler should keep IR around to be JITed at runtime.

Sounds good to me.

The idea (not a proposal at this point) is that the set of recognized offload targets could be specified with something like the following (with additional names as appropriate for other offload languages).

class OffloadLanguage {}
def OpenMP : OffloadLanguage;
def CUDA : OffloadLanguage;
def HIP : OffloadLanguage;
def SYCL : OffloadLanguage;

class OffloadTargetName<OffloadLanguage Language, string Name> {}

class OffloadTarget<list<OffloadTargetName> Names> {}

def NVTuringPTX : OffloadTarget<[
  OffloadTargetName<CUDA, "compute_75">]>;
def NVTuring : OffloadTarget<[
  OffloadTargetName<CUDA, "sm_75">]>;
def AMDGCN : OffloadTarget<[
  OffloadTargetName<HIP, "compute_amdgcn">]>;
def AMDGFX900 : OffloadTarget<[
  OffloadTargetName<HIP, "gfx900">]>;

Yes, in that respect it is similar to a generic PTX or SPIR-V target that is JIT compiled at run-time.

What I want us to avoid is the strict coupling between targets and languages.
There is no reason for it and it’s actually problematic for some fun stuff we might want to do (see Showcasing LLVM/Offload as an example).

I agree, but backward compatibility will require preserving the names that are in use now. I would expect the tblgen example I provided to be adjusted to express a common name for all offload languages with aliases provided where needed for compatibility. My tblgen skills are limited; I’m sure someone else can better figure out how to express this than I can.
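For example, the earlier OffloadTarget class might grow a canonical, language-independent name, with the existing per-language spellings kept as compatibility aliases (a sketch; the canonical names here are made up):

class OffloadTarget<string CanonicalName, list<OffloadTargetName> Aliases> {}

def NVTuring : OffloadTarget<"nv_turing", [
  OffloadTargetName<CUDA, "sm_75">]>;
def AMDGFX900 : OffloadTarget<"amd_gfx900", [
  OffloadTargetName<HIP, "gfx900">]>;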

I’m happy w/ using tablegen or just manually doing the aliases. I guess the former will make it clearer in the future that we need a “generic option” which can have language specific aliases.
We should make this an issue on the project board for LLVM/Offload :wink:


Do people mix and match these technologies? My understanding is that these flags are orthogonal because it’s feasible someone may want to use OpenMP and SYCL (for example) within the same program.

I think there are (at least) three perspectives on this question:

  1. Compilation of a single TU that uses multiple offloading languages. I don’t know if this is something that people do.
  2. Linking TUs into a single program where subsets of the TUs use different offloading languages. I would expect this to happen.
  3. Invoking the compiler driver for multiple TUs where subsets of the TUs use different offloading languages but where they might or might not be combined into a single program. I don’t know if this is something that people do.

I don’t know to what extent Clang supports the above. If these are scenarios that are intended to be supported, then it seems distinct options for each offloading language are potentially beneficial. On the other hand, it might not be beneficial to, for example, allow distinct subsets of offloading targets to be specified for OpenMP vs SYCL for a single TU.

  1. This was never supported by Clang and it’s a hard sell, even though it has been requested by users in the past. I would, for now, consider it unsupported.
  2. This is happening in practice and supported fine with the new offload driver.
  3. I’m not sure if this happens and if this is causing problems we can reasonably disallow it. I have never heard this to be a problem or request.

Some flags mean the same thing and are not mixed (see above). -fXXX-is-device is a good example: we have it three times and use it for exactly the same thing; the only difference is that the different toolchains use different versions. We already unified --offload-arch= and --offload= (which isn’t properly hooked up yet but exists), which shows that we can unify flags and lang-opts. Obviously we have to look at each unification in detail, but even if we assume languages are mixed arbitrarily, I very much doubt the same TU can be both a SYCL device compilation and a non-SYCL non-device compilation. Either we target “a device” or we do not, IMHO.

I guess this is a bit less obvious to me. I can understand not wanting to mix devices in the same path (e.g., SYCL device code trying to offload to OpenMP device code), but it seems reasonable to me to mix devices in the same TU but not on the same path (e.g., SYCL device code performing one calculation while OpenMP device code performs another). The primary use case I see for this is when someone wants to dynamically transition between offloading technologies while still sharing internal helper code. It’s still targeting “a device” in that case, but it’s within a single TU.

Then again… expecting the user to split the TUs doesn’t seem like it’s imposing a huge hardship, so maybe that’s reasonable? They would need to expose static helper functions across TU boundaries in that case, which is unfortunate, but maybe okay.

I said we have to look at every unification in detail. Said differently: We obviously can’t just unify it all without thinking about it. I don’t think this is a point of contention :wink:

We do not support: 1 TU, 2 offloading models.
We allow 2 TUs, 2 offloading models, calling each other.
We generally allow 1 TU, 1 offloading model, and host OpenMP.

Now we can argue the first case might be useful, but even then “is_device” does not need to exist three times. Not that I have ever heard of anyone planning to support case 1. With LLVM/Offload we might be able to, but even then we would need to compile the TU once per offload model: SYCL, OpenMP, HIP, and CUDA have different rules, include paths, etc., and we can’t expect one to work with the driver mode of another. Anyway, with or without that support, one “is_device” flag is sufficient since we can check the language mode. The only reason you would need three is if you expected to mix “is_sycl_device” with CUDA or HIP. Note that “is_device” here just means we compile not for the host but for the offload target. My point is that “is_sycl_device” = “is_device” + SYCL, “is_cuda_device” = “is_device” + CUDA, etc. qualifies everything uniquely.
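To make the decomposition concrete, here is a minimal C++ sketch. The struct is a stand-in, not Clang’s actual LangOptions; only OffloadIsDevice is the unified option proposed above, the rest are illustrative:

```cpp
#include <cassert>

// Minimal stand-in for clang's LangOptions (illustrative, not the real thing).
struct LangOptions {
  bool OffloadIsDevice = false; // unified "compile for the offload device" bit
  bool CUDA = false;            // language-mode bits
  bool SYCL = false;
  bool OpenMP = false;
};

// "is_cuda_device" == "is_device" + CUDA mode, and likewise for the others:
// the language mode qualifies the single device bit uniquely.
inline bool isCUDADevice(const LangOptions &LO) {
  return LO.OffloadIsDevice && LO.CUDA;
}
inline bool isSYCLDevice(const LangOptions &LO) {
  return LO.OffloadIsDevice && LO.SYCL;
}
inline bool isOpenMPTargetDevice(const LangOptions &LO) {
  return LO.OffloadIsDevice && LO.OpenMP;
}
```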
Does that help?

Heh, I don’t contend it. :slight_smile:

Okay, good to know!

Agreed for our current support matrix; I was wondering more whether we expect to widen that support to allow 1 TU, 2 offloading models. The situation I was thinking about is folks transitioning from one technology to another and whether they can do:

if (I_Want_SYCL) {
  CallSYCLKernelToDoTheWork();
} else if (I_Want_CUDA) {
  CallCUDAKernelToDoTheWork();
} else ...

where the code all lives within one TU and can dynamically pick which offloading to use at runtime. I think I understand the answer to be: users can accomplish that by putting the SYCL offloading into one TU, the CUDA offloading into another TU, but they cannot both live in the same TU. If that’s a reasonably correct understanding, then I think that’s fine.

Yup, I think I understand better now. Thank you!


It is. We are :slight_smile:
