[RFC] GPU builtins runtime

I wish to introduce a new subdirectory under compiler-rt containing an abstraction over GPU offloading. It could be at the top level instead if people prefer.

The purpose is to provide a compiler-rt builtins-style library that can be used as an abstraction over GPU-specific intrinsics (see the closely related *Proposing llvm.gpu intrinsics*) and over the various vendor libraries for interacting with GPUs, such as CUDA, HSA, and OpenCL.

We have at least two in-tree projects, libc and the OpenMP runtime, that contain a large amount of complicated code doing useful things through a more-or-less tidy internal module that abstracts over details like the name of the vendor memory allocator. I believe MLIR has something similar for ROCm vs CUDA.

I think this is fundamentally uninteresting code. Every GPU vendor provides a means of finding a kernel by name; I can't remember the details of any of them offhand. It should be written one more time, in a single, adequately discoverable place, so that every project that wants to run code on GPUs and isn't particularly interested in exactly what the vendor API looks like goes through the same path.

This will factor GPU-specific code out of libc, make the libomptarget plugins look very similar to one another, and simplify the ad hoc GPU applications I write to test parts of our compiler infrastructure.

This is completely consistent with @jdoerfert's [RFC] Introducing `llvm-project/offload`: I want to separate the vendor API quirks from the language implementation requirements. This proposal is best viewed as a sane refactoring of existing code for easier reuse and greater testing, not as an alternative to that project. In particular, I'm interested in the GPU builtins runtime being sufficiently self-contained that building it without the rest of LLVM, for use in ad hoc out-of-tree projects, is feasible.