Emitting IR in older formats (for NVVM)

This question is specifically motivated by the practical constraints of NVVM, but I don’t know anywhere better to ask (hopefully, e.g., @jholewinski is still following), and I believe it concerns general LLVM issues:

NVIDIA’s libNVVM is built on LLVM 3.2. This means its bitcode and LL text parsers are from that generation. Its interface calls for adding modules as either bitcode blobs or LL text buffers. LLVM’s bitcode and assembly formats have never been intended to maintain strong cross-version compatibility. As a result, a compiler built on a more recent LLVM struggles mightily to emit even simple IR for NVVM to compile.

Specifically, I find 3.4 (the official Ubuntu package) generates both bitcode streams and LL text which are incompatible with the 3.2 parser in the current NVVM release (6.5). I’m hardly surprised that the binary format changes, but simple examples generally manage fine in LL text. Nontrivial modules, however, do not. Specifically, the first issue I notice is that #i attribute references (as opposed to inline attributes) appear to be a recent addition to the assembly syntax; they are always used by the assembly serializer (when, e.g., streaming out a Module) for any function attributes in generated assembly, but cannot be parsed by the NVVM/3.2 parser. This seems to be the main compatibility issue, but it’s hard to tell without first eliminating it and then proceeding further.

So that’s the challenge.

The obvious questions are:

  1. Narrowly, is it possible to coerce the standard LLVM bitcode or text writers to emit more conservative/backwards-compatible output (specifically with an eye towards NVVM/3.2)? Am I just going to have to resort to brittle string rewrites on the generated text to inline all #attr values?

  2. More generally, is there another accepted way to create NVVM programs from LLVM-based compilers which use versions more recent than 3.2? I can’t imagine I’m the first one to run into this—3.2 is fairly old at this point, and NVVM seems wedded to its interface for stability reasons.
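A minimal sketch of the string-rewrite approach from question 1, assuming the input is LL text as printed by LLVM 3.4 and that every attribute group is defined on a single `attributes #N = { ... }` line. The function name `inline_attribute_groups` is hypothetical. String attributes such as `"key"="value"` are dropped rather than inlined, on the assumption that an older parser rejects them as well. This is exactly the brittle kind of rewrite the question worries about: it does not parse the IR, so a `#N` inside a string constant or comment would also be rewritten.

```python
import re

# One attribute group definition, e.g.: attributes #0 = { nounwind readnone }
GROUP_DEF = re.compile(r'^attributes\s+#(\d+)\s*=\s*\{(.*)\}\s*$')
# A group reference preceded by whitespace, e.g. the " #0" in "define void @f() #0 {"
GROUP_REF = re.compile(r'(?<=\s)#(\d+)\b')
# String attributes, e.g. "no-frame-pointer-elim"="true" or a bare "key"
STRING_ATTR = re.compile(r'"[^"]*"(?:="[^"]*")?')


def inline_attribute_groups(ll_text):
    """Replace #N references with the keyword attributes of group N."""
    groups = {}
    body = []
    # Pass 1: collect group definitions (they usually appear at the end
    # of the module) and drop those lines from the output.
    for line in ll_text.splitlines():
        m = GROUP_DEF.match(line.strip())
        if m:
            keywords = STRING_ATTR.sub('', m.group(2))
            groups[m.group(1)] = ' '.join(keywords.split())
        else:
            body.append(line)

    # Pass 2: expand every known reference; unknown ones are left untouched.
    def expand(match):
        return groups.get(match.group(1), match.group(0))

    return '\n'.join(GROUP_REF.sub(expand, line) for line in body) + '\n'


if __name__ == '__main__':
    example = (
        'define void @f() #0 {\n'
        '  ret void\n'
        '}\n'
        '\n'
        'attributes #0 = { nounwind "no-frame-pointer-elim"="true" }\n'
    )
    print(inline_attribute_groups(example))
```

Whether the result is then accepted by the 3.2-era parser depends on what else changed in the syntax; this only removes the attribute-group obstacle.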

Many thanks.
-jrk

Hi Jonathan,

I am using NVVM in my PhD thesis work, called PACXX, which I presented at the LLVM-HPC workshop at SC14. I encountered the same problem about a year ago. I’m using clang 3.5 to generate IR from C++14 code and had to get rid of the attributes, as you pointed out. I have a little opt pass that runs after code generation, and one of its tasks is to remove all attributes. However, it seems that libNVVM does not care about attributes at all. Only a few attributes are supported (see section 3.11 of the NVVM IR specification), basically those controlling inlining. But I can’t look inside libNVVM, so this is a wild guess, and clarification of how far function attributes influence PTX generation would be nice.

Regarding your questions:

  1. If there is a way to create IR that is backward compatible with 3.2, then I really want to hear about it. However, I doubt there is one. When I started using NVVM I read somewhere that backward compatibility has never been a design goal for LLVM IR.

Cheers,
Michael Haidl

Dear Jonathan,

the following link may help:

https://github.com/KhronosGroup/SPIR-Tools/tree/master/spir-encoder/llvm_3.5_spir_encoder

From the documentation:

spir-encoder

Since SPIR can be (easily) transformed to NVVM IR, this helps a lot, at least for me.

Thank you Tobias.

-MH

Yes, this is an issue we are keenly aware of. LLVM IR is not a stable storage format, and NVVM is therefore tied to a particular revision of the IR. You may want to give the 7.0 RC a try; this updates libnvvm to LLVM 3.4. There may still be incompatibilities with 3.5 or 3.6svn, but it should be better.

The SPIR converter looks promising. Since it doesn’t actually interpret the IR, things like address space mappings should transfer fine.

Another possibility is to skip libnvvm altogether and use LLVM’s NVPTX target. This is of course harder since you have to configure the passes yourself instead of just calling a few C functions, but it does give you more control over the optimization pipeline and gives you full visibility into the compiler. Unfortunately, there are some NVVM-specific optimizations missing upstream that we are not able to contribute back.
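As a rough illustration of the NVPTX route, here is a sketch that drives the stock `opt` and `llc` command-line tools instead of configuring the passes through the C++ API. It assumes both tools are on the PATH and were built with the NVPTX target enabled; the helper names, the `sm_35` default, and the intermediate file name are illustrative assumptions, not part of any NVIDIA or LLVM interface.

```python
import subprocess


def nvptx_commands(ll_path, ptx_path, sm='sm_35', opt_level='-O3'):
    """Build the command lines for an opt + llc pipeline targeting NVPTX."""
    opt_bc = ll_path + '.opt.bc'  # hypothetical name for the optimized bitcode
    return [
        # Run the generic LLVM optimization pipeline first.
        ['opt', opt_level, ll_path, '-o', opt_bc],
        # Then lower the optimized module to PTX with the NVPTX backend.
        ['llc', '-march=nvptx64', '-mcpu=' + sm, opt_bc, '-o', ptx_path],
    ]


def compile_to_ptx(ll_path, ptx_path, **kwargs):
    """Run the pipeline; raises CalledProcessError if a tool fails."""
    for cmd in nvptx_commands(ll_path, ptx_path, **kwargs):
        subprocess.check_call(cmd)


if __name__ == '__main__':
    # Print the commands rather than running them, so this works without LLVM.
    for cmd in nvptx_commands('kernel.ll', 'kernel.ptx'):
        print(' '.join(cmd))
```

This only exercises the upstream pipeline; the NVVM-specific optimizations mentioned above are not part of it.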

Thanks, all.

I didn’t realize a 7.0 RC was public and changed to 3.4—I will go down that road for now, though I’ll probably also look into integrating variants of the SPIR converter in the future.

> Another possibility is to skip libnvvm altogether and use LLVM’s NVPTX target. This is of course harder since you have to configure the passes yourself instead of just calling a few C functions, but it does give you more control over the optimization pipeline and gives you full visibility into the compiler. Unfortunately, there are some NVVM-specific optimizations missing upstream that we are not able to contribute back.

I appreciate the suggestion, but this is actually for a project which has been using NVPTX for years. The problem is that we see (dramatically) better performance from our OpenCL backend (via CL C source kernels) on NVIDIA hardware simply because kernels actually get optimized through the full NVCC-style stack. Based on related experience in other projects, I expect NVVM to provide a similar or greater bump over NVPTX. In short, I’d love to stick with NVPTX and the open source stack in LLVM trunk, but until it provides competitive performance on real programs (which, in my experience so far, it ~never does), it’s unfortunately not a strong alternative.

Do you have some examples you can share? I would be interested in taking a look to see what we can do to help improve the performance of NVPTX, especially relative to equivalent OpenCL code.

Hi Justin,

We are working on a project that aims to bring OpenMP support to systems comprising PPC and NVIDIA GPUs, using the LLVM backends. We are currently working on the OpenMP version of clang maintained on GitHub (http://clang-omp.github.io/), which currently uses the 3.5 IR level, but we are in the process of starting to contribute our changes to trunk. We came across the exact same issue described here. I see you mentioned not being able to contribute some NVVM-specific changes back. Could you elaborate on that? Is that due to the confidential nature of the code, or a lack of time to implement and maintain these changes? If it is the latter, we have people on our team who are interested in helping with that.

Many thanks,
Samuel