Compiling to NVPTX

I’m in the process of writing a library and giving a talk about writing compilers using LLVM (llvm-c) and Clojure. As part of the talk, I’d like to show an example of a program running on CUDA.

Are there any papers, tutorials, or examples on writing a custom frontend for NVPTX? For instance, I’m trying to figure out how to get access to “global” variables like blockIdx. I know that libc won’t be accessible, so I’m probably just going to demonstrate an image blur filter written in a custom-built programming language.

I was going to try simply taking my LLVM module and having LLVM write out the object file using the nvptx triple. But to be honest, I’m just shooting in the dark at this point.

Any information would be greatly appreciated.


You caught me right as I’m leaving the office, so this reply will be brief until I have time to write something longer…

Generally, the only conventions you need to follow for simple CUDA kernels are to use ptx_kernel as the calling convention and to use address space 1 for pointers into global memory. The generated PTX can be consumed by the Driver API (see the CUDA samples for how to load and execute PTX kernels with the Driver API).

In the near future, I hope to put together a bit of a tutorial on this. I am well aware that there is a lack of information here, and I hope to correct that soon!