This RFC concerns generating the headers for the GPU port of the LLVM C Library; see "libc for GPUs" in the LLVM C Library documentation.
Background
We currently compile the libc source for the GPU as if it were standard C++, without using any existing offloading support. For example:
clang++ --target=amdgcn-amd-amdhsa -mcpu=gfx90a -c -fvisibility=hidden -nogpulib fputs.cpp
The GPU is a completely freestanding environment and has no existing C library. Including the system headers in a freestanding compilation like the one above causes problems because unsupported definitions are pulled in. For this reason we generate our own headers from the libc interface with the libc-hdrgen tool. Here is what the GPU port currently emits for the stdio.h header.
#include <__llvm-libc-common.h>
#include <llvm-libc-macros/file-seek-macros.h>
#include <llvm-libc-macros/stdio-macros.h>
#define EOF -1
#include <llvm-libc-types/FILE.h>
#include <llvm-libc-types/size_t.h>
__BEGIN_C_DECLS
int puts(const char *__restrict) __NOEXCEPT;
int fputs(const char *__restrict, FILE *__restrict) __NOEXCEPT;
extern FILE * stdout;
extern FILE * stderr;
__END_C_DECLS
This works very well for the freestanding target the libc uses for its internal testing. However, almost all users of this library will use it through an existing GPU offloading language such as OpenMP, CUDA, or HIP. These languages generally work by splitting a single source file into separate compilations, typically one for the CPU (host) and one for the GPU (device). For these targets to work we need to obey the following restrictions:
- The same headers need to be included from both the host and device.
  This is necessary because the compilation via an offloading language is single source. In general, we need both sides to agree on the values of macros, constants, etc., or else we will get strange divergent behavior.
- Objects present on the GPU must be marked on the GPU.
  This must be done on both the host and device sides for the offloading language. For example, if we declare that stderr is on the GPU only for the device compilation but not the host and try to use it, we will miscompile because the host and device sides do not agree on what is present on the GPU.
- Types cannot conflict with the system headers.
  The LLVM C Library is not a full replacement for the user’s system headers, so the host will need to include their own headers. This is problematic because we would then doubly declare types if combined with the LLVM C headers.
Currently, these offloading languages use existing wrappers around the system headers, which can be found at llvm-project/clang/lib/Headers on GitHub. For offloading languages we do not have the same issues with picking up the system headers because we can eagerly cull things that are not actually on the GPU, whereas a freestanding compilation must include everything. The offloading languages also pass definitions for the auxiliary triple (e.g. X86_64), which bypasses a lot of the failure modes.
Proposal
The proposal here is to provide a single header that is compatible both with the freestanding GPU target and with inclusion from an existing offloading language. This will allow us to precisely control the libc implementation when it is used internally, and to provide compatible headers when it is included from one of these existing languages. Taking the header example from above, we can transform it to provide the necessary utility depending on the compilation.
#if !defined(_OPENMP) && !defined(__CUDA__) && !defined(__HIP__)
#include <__llvm-libc-common.h>
#include <llvm-libc-macros/file-seek-macros.h>
#include <llvm-libc-macros/stdio-macros.h>
#define EOF -1
#include <llvm-libc-types/FILE.h>
#include <llvm-libc-types/size_t.h>
#else
#include_next <stdio.h>
#endif
#include <llvm-libc-macros/gpu-macros.h>
__BEGIN_C_DECLS
__BEGIN_OPENMP_DECLS
int puts(const char *__restrict) __NOEXCEPT __DEVICE;
int fputs(const char *__restrict, FILE *__restrict) __NOEXCEPT __DEVICE;
extern FILE * stdout __DEVICE;
extern FILE * stderr __DEVICE;
__END_OPENMP_DECLS
__END_C_DECLS
Where the additional header <llvm-libc-macros/gpu-macros.h> could look like the following:
#if defined(_OPENMP)
#define __BEGIN_OPENMP_DECLS _Pragma("omp begin declare target")
#define __END_OPENMP_DECLS _Pragma("omp end declare target")
#else
#define __BEGIN_OPENMP_DECLS
#define __END_OPENMP_DECLS
#endif
#if defined(__CUDA__) || defined(__HIP__)
#define __DEVICE __attribute__((device))
#else
#define __DEVICE
#endif
#if defined(_OPENMP) || defined(__CUDA__) || defined(__HIP__)
#undef __NOEXCEPT
#define __NOEXCEPT
#endif
This will allow us to use the headers as-is, with the types defined by the LLVM C library, when we are compiling without an offloading language, as we do currently. However, if we are compiling for OpenMP, CUDA, or HIP we will instead use #include_next to get the next header in the search path and obtain the system’s stdio.h, which will provide the types instead. We will then use the generated entrypoints to precisely define which system utilities are available on the GPU.
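To make the macro behavior concrete, here is a minimal sketch of how the wrapper macros would collapse in a plain host compilation. It assumes none of _OPENMP, __CUDA__, or __HIP__ is defined; every DEMO_ and demo_ name is a hypothetical stand-in for illustration and is not part of the libc or of libc-hdrgen.

```cpp
#include <cstdio>

// Hypothetical stand-ins for the macros proposed for gpu-macros.h.
#if defined(_OPENMP)
#define DEMO_BEGIN_OPENMP_DECLS _Pragma("omp begin declare target")
#define DEMO_END_OPENMP_DECLS _Pragma("omp end declare target")
#else
#define DEMO_BEGIN_OPENMP_DECLS
#define DEMO_END_OPENMP_DECLS
#endif

#if defined(__CUDA__) || defined(__HIP__)
#define DEMO_DEVICE __attribute__((device))
#else
#define DEMO_DEVICE
#endif

// Shape of a generated declaration. In a plain host compile the wrappers
// expand to nothing and this is an ordinary C declaration.
extern "C" {
DEMO_BEGIN_OPENMP_DECLS
int demo_puts(const char *) DEMO_DEVICE;
DEMO_END_OPENMP_DECLS
}

// Host-side definition so the sketch links; on a real GPU target the
// definition would come from the libc itself.
int demo_puts(const char *s) { return std::printf("%s\n", s) < 0 ? EOF : 0; }
```

Under an offloading language the same declaration would instead pick up the declare-target pragmas or the device attribute, which is what lets the host and device compilations agree on what lives on the GPU.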
This will then allow us to install these headers to the current include/gpu-none-llvm/ directory and prepend that search path when targeting the GPU. Long-term this will allow us to remove the wrapper headers in clang. We propose that this is done in a single header rather than as a separate platform for ease of use. If we were to generate separate headers we would need to run libc-hdrgen multiple times in a single build and disambiguate where they were installed.
What’s required
This will require some additions to the libc-hdrgen
tool. Most likely we will need an extra flag if we are operation in GPU mode to perform the wrapping. Because we cannot include any of the LLVM headers we will also need to change the interface. The proposed method is to take the existing headers and convert them to something like this.
#ifndef LLVM_LIBC_STDIO_H
#define LLVM_LIBC_STDIO_H
%%include(__llvm-libc-common.h)
%%include(llvm-libc-macros/file-seek-macros.h)
%%include(llvm-libc-macros/stdio-macros.h)
%%public_api()
#endif // LLVM_LIBC_STDIO_H
This would allow the normal targets to emit the #include as usual, while the GPU target could defer it until it wraps it in the GPU check.
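As a rough illustration of that deferral, the sketch below expands %%include(...) lines from a header template in two modes. It is not how libc-hdrgen is implemented: the function name expand_includes, its parameters, and the rule of wrapping each consecutive run of includes are all assumptions made for this example.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not part of libc-hdrgen. Expands "%%include(path)"
// lines; in GPU mode each consecutive run of includes is wrapped in the
// offloading-language check with an #include_next fallback.
std::string expand_includes(const std::string &tmpl,
                            const std::string &header_name, bool gpu_mode) {
  const std::string prefix = "%%include(";
  std::istringstream in(tmpl);
  std::string line, out;
  bool in_run = false; // true while inside a run of %%include lines

  // Closes the guard opened at the start of a run (GPU mode only).
  auto close_run = [&] {
    if (in_run && gpu_mode)
      out += "#else\n#include_next <" + header_name + ">\n#endif\n";
    in_run = false;
  };

  while (std::getline(in, line)) {
    bool is_include = line.size() > prefix.size() &&
                      line.compare(0, prefix.size(), prefix) == 0 &&
                      line.back() == ')';
    if (!is_include) {
      close_run();
      out += line + "\n"; // pass other lines through unchanged
      continue;
    }
    if (!in_run && gpu_mode)
      out += "#if !defined(_OPENMP) && !defined(__CUDA__) && "
             "!defined(__HIP__)\n";
    in_run = true;
    std::string path =
        line.substr(prefix.size(), line.size() - prefix.size() - 1);
    out += "#include <" + path + ">\n";
  }
  close_run();
  return out;
}
```

With gpu_mode set to false the includes are emitted directly, matching what the normal targets produce today; with it set to true the same template yields the guarded form shown in the proposal.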
Caveats
This will require some fine-tuning to work in general, which should be manageable in the platform file definitions. For example, the GNU libc provides isalnum as a macro. That means that in order to use our implementation from the libc library we need to #undef isalnum on the GPU. I believe these can be handled on a case-by-case basis.
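The macro conflict can be sketched in isolation as follows. The first definition stands in for a system header that provides the function as a macro; demo_isalnum is a hypothetical name chosen so the example does not depend on what any particular system's ctype.h does, and the ASCII-only implementation is an assumption for illustration.

```cpp
// Stand-in for a system header that provides the function as a macro.
#define demo_isalnum(c) (0)

// What the wrapper could do before declaring the GPU entrypoint. Without
// the #undef, the declaration below would be macro-expanded and fail to parse.
#ifdef demo_isalnum
#undef demo_isalnum
#endif

extern "C" int demo_isalnum(int c);

// Simple ASCII-only definition so the sketch links.
int demo_isalnum(int c) {
  return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
         (c >= 'A' && c <= 'Z');
}
```

After the #undef, calls resolve to the real function rather than the macro, which is the behavior needed for the GPU entrypoint to be used.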
Feedback would be appreciated. This is required to actually ship the GPU libc as a product and would allow us to simply generate our own headers for the GPU instead of using the clang wrappers, so I am eager to see this through.