llvm-libc for CUDA

Hello Hal,

You had asked me a question about nvptx on
https://reviews.llvm.org/D67867. I did some homework on that, and below
is what I have learned.

For CUDA/nvptx, a full libc might be largely irrelevant. However, I
learned from Art (copied on this email) that there is a desire to have
a single library of math functions that clang can rely on for the GPU.
So, even if libc as a whole is irrelevant there, a subset of the
libc might indeed become relevant for GPUs.

We want llvm-libc to expose a thin layer of C symbols over the
underlying C++ implementation library. My patch
(https://reviews.llvm.org/rL373764) showcased one way of doing this
for ELF using the section attribute followed by a post-processing
step. We might have to take a different approach for nvptx because
ELF-like sections and tooling might not be feasible or available
(there is no linking phase during GPU-side compilation for NVIDIA
GPUs). Art explained to me that device code undergoes whole-program
analysis by LLVM. Hence, we can provide an explicit C wrapper layer
over the C++ implementation library. If source-level wrappers are not
desirable, we can consider using IR-level aliases (will we have to
deal with mangled names?). The benefit is that, while it looks like a
normal C function call from the user's point of view, the
whole-program analysis performed by LLVM will eliminate the additional
wrapper call, preventing any performance hit.

Thanks,
Siva Chandra


We're currently using the wrappers in Clang headers (https://github.com/llvm-mirror/clang/blob/master/lib/Headers/__clang_cuda_device_functions.h), so this proposal should not make things worse.

The #1 item on my wish list for the standard library is to have libm available to clang/llvm as a bitcode library. That would make it possible to re-enable lowering to various library calls in LLVM when we target NVPTX and, possibly, avoid the rather precarious dependency on the binary libdevice bitcode blob which ships with the CUDA SDK.
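For what it's worth, a rough sketch of what building and using such a bitcode libm could look like on the command line. The file names are invented for this sketch, and `-mlink-builtin-bitcode` is the existing clang cc1 flag used to link libdevice-style bitcode into device compilation:

```shell
# Compile a hypothetical math-library subset (gpu_libm.c) to LLVM
# bitcode for the NVPTX target.
clang -target nvptx64-nvidia-cuda -emit-llvm -c gpu_libm.c -o gpu_libm.bc

# Link that bitcode library into the device-side compilation of a CUDA
# app, instead of depending on the binary libdevice blob from the SDK.
clang -x cuda app.cu -Xclang -mlink-builtin-bitcode -Xclang gpu_libm.bc -o app
```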

AMDGPU folks are also using bitcode libraries (https://github.com/RadeonOpenCompute/ROCm-Device-Libs/tree/master/ocml/src), so providing the standard math library as bitcode may benefit them, too.

--Artem


Thanks, Siva, Art. +1 to this. Across GPUs from several vendors, and other such platforms, I think that this will be very valuable. It's also not just math functions; although the math functions are likely the performance-sensitive cases, there are a lot of libc functions that we would like to have available to ease transitioning code to accelerators, for example, snprintf and qsort.

-Hal

