CUDA again: what is supported (and what is not)

Hi all,

I'm trying to take stock of the state of clang's CUDA support.

The information I found around the web is very fragmentary and sometimes
contradictory, so I decided to open this post to clarify, and perhaps help
others in the same situation.

*Question: is there some way to generate ASTs without touching the
front-end?*

This is more or less the information I found.

1) Looking at the official repositories (this mirror
<https://github.com/llvm-mirror/clang> for example) it seems that some
work is in progress, but of course far away from completion.

2) After a brief chat on the official IRC Channel
<http://irc.lc/oftc/clang/irctc@@@>, some users confirmed to me that
end-to-end compilation isn't supported, but that they internally have some
parser that runs quite well (?). They also pointed me to the D9506
<http://reviews.llvm.org/D9506>, D9507 <http://reviews.llvm.org/D9507>
and D9509 <http://reviews.llvm.org/D9509> patches.

3) Some projects claim that they can parse CUDA with clang. I'm referring in
particular to CU2CL <http://chrec.cs.vt.edu/cu2cl/> (cited in this
discussion
<http://clang-developers.42468.n3.nabble.com/Parsing-CUDA-file-to-AST-td4038287.html>
too).

4) It is not clear to me whether some other tool, like libnvvm, could be
used to at least generate LLVM IR from CUDA source (and then maybe
compile it on x86?). (Or maybe only from PTX? Is that the so-called NVVM
IR?)

Any answer about a possible solution would be much appreciated!

Thanks in advance,
Luca

Hi,

> Hi all,
>
> I’m trying to take stock of the state of clang’s CUDA support.

TL;DR version: you can use clang -cc1 to compile some CUDA code that does not use Nvidia’s CUDA headers.

> The information I found around the web is very fragmentary and sometimes
> contradictory, so I decided to open this post to clarify, and perhaps help
> others in the same situation.
>
> Question: is there some way to generate ASTs without touching the
> front-end?

CUDA code that does not use the CUDA headers should be compilable. The device side will compile all the way down to PTX. The host side can generate the appropriate glue to initialize and launch kernels. So the answer is a qualified “yes”.

Here’s a trivial example of device-side compilation. Add -ast-dump if you want to see the AST.

echo '__attribute__((global)) void kernel(void) { }' | clang -cc1 -x cuda -fcuda-is-device -triple nvptx64-unknown-cuda -S -

The driver does not know much about CUDA yet; that is something D9509 is intended to help with. For now, though, you have to run the host and device compilations manually with cc1.

> This is more or less the information I found.

> 1. Looking at the official repositories (this mirror
>    <https://github.com/llvm-mirror/clang> for example) it seems that some
>    work is in progress, but of course far away from completion.

One can hope it’s not that far from the point where it’s usable. I’ve been digging in that direction and I’m starting to see a glimpse of light at the end of the tunnel. I have a rough set of changes that can compile and successfully run some of the examples that come with CUDA 7.0.

> 2. After a brief chat on the official IRC Channel
>    <http://irc.lc/oftc/clang/irctc@@@>, some users confirmed to me that
>    end-to-end compilation isn’t supported, but that they internally have some
>    parser that runs quite well (?). They also pointed me to the D9506
>    <http://reviews.llvm.org/D9506>, D9507 <http://reviews.llvm.org/D9507>
>    and D9509 <http://reviews.llvm.org/D9509> patches.

Yup. End-to-end compilation is not here yet. D9509 will get the driver to handle the CUDA compilation pipeline, but there are other missing pieces.

> 3. Some projects claim that they can parse CUDA with clang. I’m referring in
>    particular to CU2CL <http://chrec.cs.vt.edu/cu2cl/> (cited in this
>    discussion
>    <http://clang-developers.42468.n3.nabble.com/Parsing-CUDA-file-to-AST-td4038287.html>
>    too).

Syntax-wise, CUDA is pretty much C++ plus the triple-bracket kernel launch (kernel<<<grid, block>>>(args)). The rest boils down to a few attributes and builtin variables that can be implemented or faked in an include file, so parsing a bare-bones CUDA source file is not particularly challenging. So yes, it is doable.
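To make the "faked in an include file" idea concrete, here is a minimal sketch of such a shim for a plain host C++ compiler. Everything in it is an illustrative assumption, not the header that clang or NVIDIA ships: the qualifiers are defined away, the builtin index variables are ordinary globals of a hypothetical Idx3 type, and a hypothetical launch_1d helper stands in for the triple-bracket launch, which cannot be faked with macros. In clang's CUDA mode the qualifiers would instead map to real attributes, for example __attribute__((global)).

```cpp
#include <cassert>
#include <vector>

// Hypothetical shim: the CUDA qualifiers expand to nothing, so a kernel
// becomes an ordinary host function. (In clang's CUDA mode one would write
// e.g. `#define __global__ __attribute__((global))` instead.)
#define __global__
#define __device__
#define __host__

// Stand-ins for the CUDA builtin index variables. Assumption: plain globals
// are enough for single-threaded emulation on the host.
struct Idx3 { unsigned x, y, z; };
static Idx3 threadIdx = {0, 0, 0};
static Idx3 blockIdx  = {0, 0, 0};
static Idx3 blockDim  = {1, 1, 1};
static Idx3 gridDim   = {1, 1, 1};

// A bare-bones kernel written exactly as it would be in CUDA source.
__global__ void saxpy(float a, const float *x, float *y, unsigned n) {
  unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    y[i] = a * x[i] + y[i];
}

// Hypothetical replacement for `kernel<<<grid, block>>>(args...)`:
// runs every (block, thread) pair sequentially on the host.
template <typename Kernel, typename... Args>
void launch_1d(unsigned grid, unsigned block, Kernel kernel, Args... args) {
  gridDim = {grid, 1, 1};
  blockDim = {block, 1, 1};
  for (unsigned b = 0; b < grid; ++b) {
    for (unsigned t = 0; t < block; ++t) {
      blockIdx = {b, 0, 0};
      threadIdx = {t, 0, 0};
      kernel(args...);
    }
  }
}

// Small driver: computes a*x + y with 4 emulated threads per block.
std::vector<float> run_saxpy_demo(float a, const std::vector<float> &x,
                                  std::vector<float> y) {
  assert(x.size() == y.size());
  const unsigned n = static_cast<unsigned>(x.size());
  const unsigned block = 4;
  const unsigned grid = (n + block - 1) / block;  // round up
  launch_1d(grid, block, saxpy, a, x.data(), y.data(), n);
  return y;
}
```

Calling run_saxpy_demo(2.0f, x, y) returns a vector holding 2*x[i] + y[i] for each element. The point is only that the kernel body itself needs nothing beyond a handful of macros and variables to be parsed; the launch syntax is the one piece that needs real front-end support.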

> 4. It is not clear to me whether some other tool, like libnvvm, could be
>    used to at least generate LLVM IR from CUDA source (and then maybe
>    compile it on x86?). (Or maybe only from PTX? Is that the so-called NVVM
>    IR?)

If I understand it correctly, libnvvm provides GPU-specific optimizations at the IR level. That is, the front-end (clang) would generate IR, libnvvm would optimize it, and then the back-end (LLVM) would generate PTX. As far as I can tell, it never sees CUDA source and thus can’t help you.

> Any answer about a possible solution would be much appreciated!

The bottom line is that if you can live without the CUDA headers, clang is somewhat usable right now.

–Artem