PTX generation examples?

I have an app that uses LLVM API calls from C++ to generate IR and JIT it for x86 (for subsequent live execution). I'm still using the old JIT, for what it's worth.

I want to modify it (for prototype/experimental purposes for now) to JIT PTX (into a big string buffer?).

Docs are sketchy. I can wade through it and figure it out by trial and error, but would be so very happy if somebody could point me to code or docs addressing all the issues I'll face: how to find out at runtime if the libLLVM I'm linked against can generate PTX, how to change the initialization or JIT commands to request PTX rather than x86, anything I need to know about differences in the IR I should present, etc.

Any pointers would be greatly appreciated, thanks.

You'll have to switch to MCJIT for this purpose. Legacy JIT doesn't emit PTX.


OK, fine – an example of MCJIT that sets up for PTX JIT would also be helpful.

There is no MCJIT support for PTX at the moment (mainly because PTX does not have a binary format, and is not machine code per se).

To generate PTX at run-time, you just set up a standard codegen pass manager like you would for an off-line compiler. The output will be a string buffer that contains the PTX, which you can load into the CUDA runtime.
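A minimal sketch of that setup, assuming LLVM 3.3/3.4-era APIs (the era this thread discusses); `emitPTX` is a hypothetical helper name, and the "sm_20" CPU string is illustrative, not prescribed by the thread:

```cpp
#include "llvm/IR/Module.h"
#include "llvm/PassManager.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/FormattedStream.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"

using namespace llvm;

// Call once at startup before using any target:
//   InitializeAllTargets();
//   InitializeAllTargetMCs();
//   InitializeAllAsmPrinters();

std::string emitPTX(Module &M) {
  // The NVPTX back-end is selected by the target triple; the module's
  // own triple should be set to match (e.g. "nvptx64-nvidia-cuda").
  std::string Err;
  const Target *Tgt = TargetRegistry::lookupTarget("nvptx64-nvidia-cuda", Err);
  if (!Tgt)
    report_fatal_error("NVPTX target not available: " + Err);

  // "sm_20" picks the PTX target architecture; use your real GPU arch.
  TargetMachine *TM = Tgt->createTargetMachine(
      "nvptx64-nvidia-cuda", "sm_20", "", TargetOptions());

  std::string PTX;
  raw_string_ostream OS(PTX);
  formatted_raw_ostream FOS(OS);

  PassManager PM;
  // CGFT_AssemblyFile is the key: for NVPTX, "assembly" *is* the PTX text.
  if (TM->addPassesToEmitFile(PM, FOS, CGFT_AssemblyFile))
    report_fatal_error("NVPTX back-end cannot emit this file type");
  PM.run(M);
  FOS.flush();
  return PTX;
}
```

The returned string is the complete PTX module, ready to hand to the CUDA driver.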

As for determining if PTX support is compiled into the LLVM binary you are using, you could register all targets and then check if you can create a Target for the "nvptx" or "nvptx64" triple:


std::string Err;
const Target *Tgt = TargetRegistry::lookupTarget("nvptx64", Err);
if (Tgt) {
  // nvptx target is available
} else {
  // nvptx target is not available
}

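Once you have the PTX text, loading it into the CUDA runtime (mentioned above) goes through the driver API. A rough sketch, with error checking dropped for brevity; the kernel name "my_kernel" is a placeholder for whatever .entry your IR actually defines:

```cpp
#include <cuda.h>
#include <string>

void launchFromPTX(const std::string &PTX) {
  CUdevice Dev;
  CUcontext Ctx;
  CUmodule Mod;
  CUfunction Fn;

  cuInit(0);
  cuDeviceGet(&Dev, 0);
  cuCtxCreate(&Ctx, 0, Dev);

  // The driver JIT-compiles the NUL-terminated PTX text for this device.
  cuModuleLoadData(&Mod, PTX.c_str());

  // Look up an entry point by name; placeholder kernel with no parameters.
  cuModuleGetFunction(&Fn, Mod, "my_kernel");
  cuLaunchKernel(Fn, /*grid*/ 1, 1, 1, /*block*/ 1, 1, 1,
                 /*sharedMem*/ 0, /*stream*/ 0,
                 /*kernelParams*/ 0, /*extra*/ 0);
  cuCtxSynchronize();
}
```

In real code you would check every CUresult return value rather than ignoring them.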
More information about the PTX target can be found at:

Ah, that's helpful. I knew that I'd need to end up with PTX as text, not a true binary, but I would have figured that it would come out of MCJIT. Thanks for helping to steer me away from the wrong trail.

OK, one more question: Can anybody clarify the pros and cons of generating the PTX through the standard LLVM distro, versus using the "libNVVM" that comes with the CUDA SDK?

– lg

The NVPTX target in upstream LLVM is basically the same NVPTX target from
libNVVM, ported to upstream LLVM with a couple of proprietary features removed.

One thing to consider is that libNVVM is based on LLVM 3.0 and is only
IR-compatible up to LLVM 3.2. So if you use the LLVM 3.3, 3.4, or trunk
libraries to generate IR, it will not be compatible with libNVVM due to
differences in the IR and bitcode formats. Even dumping the IR to text
first will not work because of the new attributes syntax.

With libNVVM, you get the same compiler middle-end and back-end as the
shipped nvcc compiler, but you're limited to LLVM 3.0-3.2. With upstream
LLVM, the target handles new IR features and can take advantage of
improvements to the LLVM core optimizers. If you're already invested in
LLVM 3.3+, I would recommend sticking with it and using the upstream NVPTX
target. The libdevice math library that ships with the CUDA toolkit is
fully compatible with upstream LLVM, not just libNVVM.

FYI: I'm the maintainer of the NVPTX target in LLVM. Feel free to contact
me with any questions you have.