bitcode versioning

Hi all,

I am implementing an LLVM IR interpreter and have the following problem: I want to support execution of bitcode files targeted towards different LLVM versions. For example, a user of the interpreter should be able to compile a C file with the latest version of Clang, a Fortran file with Dragonegg (targeting LLVM 3.3), and a Haskell file with GHC (targeting LLVM 3.5), and then just feed them to my interpreter without additional arguments.

Currently, my parser expects the textual representation of a specific LLVM version. I could provide different parsers or parser configurations that support different versions, but there is no notion of a version field in the textual representation that I could use to determine which parser to use. In any case, relying on the textual format is not a good idea in the long term, because it comes with no backward compatibility guarantees.

Hence, I want to replace the textual format parser with a parser for bitcode, which would also be able to parse the files of my example. But how should I treat bitcode files of major upcoming releases, e.g., of LLVM 4.1? I found a version ID in the bitcode wrapper format, but the documentation states that the ID is currently always 0. Is there a policy that specifies when the ID will be updated? Without having such a policy in place, I would just postpone the problem I currently have with the textual format parser.
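As a rough illustration of the first step such a front end might take, the sketch below distinguishes wrapped bitcode, raw bitcode, and everything else by looking at the leading bytes. It assumes the magic numbers given in the LLVM bitcode format documentation ('BC' 0xC0DE for a raw stream, 0x0B17C0DE for the wrapper, whose header is five little-endian 32-bit fields); the function name `classify_ir` and the returned dictionary layout are hypothetical.

```python
import struct

RAW_MAGIC = b"BC\xc0\xde"      # first four bytes of a raw bitcode stream
WRAPPER_MAGIC = 0x0B17C0DE     # first field of the bitcode wrapper header


def classify_ir(data: bytes) -> dict:
    """Classify a buffer as wrapped bitcode, raw bitcode, or something else.

    Hypothetical helper: it only inspects the leading bytes and does not
    validate the rest of the file.
    """
    if len(data) >= 20:
        # Wrapper header: Magic, Version, Offset, Size, CPUType (u32 LE each).
        magic, version, offset, size, _cputype = struct.unpack_from("<5I", data, 0)
        if magic == WRAPPER_MAGIC:
            return {
                "kind": "wrapped-bitcode",
                "wrapper_version": version,  # documented as currently always 0
                "offset": offset,
                "size": size,
            }
    if data[:4] == RAW_MAGIC:
        return {"kind": "bitcode"}
    # Textual IR has no magic number, so it cannot be told apart from junk here.
    return {"kind": "text-or-unknown"}
```

This only tells the formats apart; it says nothing about which LLVM release produced the file, which is the actual question of this thread.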

I know that bitcode files have not been designed for storage and distribution. Still, I wondered if there is a good solution to my problem. Do I really have to let the user specify which version an LLVM IR file targets?

  • Manuel

The wrapper format is Darwin specific AFAIK. However, starting with 3.8 there will be another version block in the bitcode, which contains a string identifying the producer and an integer that will be bumped when needed (whatever that means).
Look for lib/Bitcode/Reader/BitcodeReader.cpp:llvm::getBitcodeProducerString() as a starting point.

Is there going to be a formal interface/API for this version-block information? I have had to "extend" the IR and bitcode representations several times to address absences/limitations in the handling of various vector types, in particular FP16 vector types; and it would be really useful if I had a "standard" way of doing this, and identifying that my dialect was different.

Thanks,

  MartinO - Movidius

What kind of API would you expect? The Bitcode Reader exposes an API to get the information in this block. It is up to the client to interpret it.

Our internal use case is to parse the version string and identify bitcode generated by an Apple released LLVM. If the version is “from the future” the bitcode can be rejected (we’ll do it during LTO).

Hi Mehdi and my apologies for the delay in responding - the day job got in the way :)

Our target is still out-of-tree so my reasons for extending the IR would be eliminated if we were a proper part of LLVM, which I would like to do when the time is right for us.

My extensions are quite simple really, and I expect that they will be wanted in the TRUNK sometime anyway.

At the moment I only have one remaining change which is to add 'v16f16' to the set of IR types. Previously I had several other FP16 vector types added, but over the past few iterations of LLVM my changes have been gradually made redundant because others have added them formally to the source. I expect that 'v16f16' will go this way too allowing me to have an unaltered IR.

But the problem I have faced with making the changes is that my LLVM cannot accept the BC produced by another version (and vice versa), not even the official version, because the placement of the types in the enumeration is very particular and changes the indices for all the subsequent values.

I had often thought it would be helpful if the BC (and LL for that matter) had a version resource of some kind, that would allow me to see that the incoming IR was produced by the official unchanged LLVM, and then I could have placed a translation in the loader that would remap the indices to the ones expected by my back-end.
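The index shift and the remapping described above can be sketched as follows. The two type lists are hypothetical and much shorter than any real enumeration; they only show how one type inserted mid-list moves every later index, and how a name-based table translates indices from the unmodified producer to the extended one.

```python
# Hypothetical enumerations: names and order are illustrative only.
OFFICIAL = ["v4f16", "v8f16", "v1f32", "v2f32"]
EXTENDED = ["v4f16", "v8f16", "v16f16", "v1f32", "v2f32"]  # v16f16 inserted


def build_remap(src, dst):
    """Map each index in `src` to the index of the same type name in `dst`.

    Names missing from `dst` are left out, so a lookup failure can be
    diagnosed instead of silently picking the wrong type.
    """
    dst_index = {name: i for i, name in enumerate(dst)}
    return {i: dst_index[name] for i, name in enumerate(src) if name in dst_index}


# Indices at or after the insertion point shift by one.
remap = build_remap(OFFICIAL, EXTENDED)
```

Such a table is only safe to apply once the loader knows which enumeration the producer used, which is why a version marker in the file matters.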

When you proposed the addition of a version resource, I was thinking that rather than each target adding parsing code for it, it would be better and more transparent for it to appear as a "Version Resource Object" that I could query for simple things like:

  o Get the major number
  o Get the minor number
  o Get the patch number
  o Is it extended? and if "yes":
     - Get the vendor ID (could be a string)
     - Get the vendor specific extension number

And this is really what I mean by an API - essentially a simple object representing the version information. For IR production/emission, there would need to be a 'setter' interface too.
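A minimal sketch of such a "Version Resource Object", purely to make the list above concrete. Nothing like this exists in LLVM as far as this thread establishes; the class name, field names, and the `with_vendor` helper standing in for the 'setter' are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class VersionResource:
    """Hypothetical value object for the version information of an IR file."""

    major: int
    minor: int
    patch: int
    vendor_id: Optional[str] = None         # set only for extended dialects
    vendor_extension: Optional[int] = None  # vendor specific extension number

    @property
    def is_extended(self) -> bool:
        # "Is it extended?" is answered by the presence of a vendor ID.
        return self.vendor_id is not None

    def with_vendor(self, vendor_id: str, extension: int) -> "VersionResource":
        # 'Setter' for emission: returns a new object, the original is unchanged.
        return replace(self, vendor_id=vendor_id, vendor_extension=extension)
```

For example, `VersionResource(3, 8, 0).with_vendor("acme", 1)` would describe a vendor dialect derived from the 3.8.0 release, where "acme" is a made-up vendor ID.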

This would allow me to make my extensions, yet be in a position to more robustly accept BC or LL from other sources. In particular I should be able to remap IR coming from a well-known point-release of LLVM, and also be able to detect, diagnose and reject input from sources I don't recognise (at the moment it just causes a crash).

From my experience of developing an out-of-tree LLVM backend, I am painfully aware of the downsides of not being "in-tree", and while I expect that I will eventually be able to contribute our work, I am also aware that other out-of-tree developers will run into similar problems in the future, and a formal version resource would greatly help.

Thanks,

  MartinO - Movidius Ltd.

> Hi Mehdi and my apologies for the delay in responding - the day job got in the way :)
>
> Our target is still out-of-tree so my reasons for extending the IR would be eliminated if we were a proper part of LLVM, which I would like to do when the time is right for us.
>
> My extensions are quite simple really, and I expect that they will be wanted in the TRUNK sometime anyway.
>
> At the moment I only have one remaining change which is to add 'v16f16' to the set of IR types. Previously I had several other FP16 vector types added, but over the past few iterations of LLVM my changes have been gradually made redundant because others have added them formally to the source. I expect that 'v16f16' will go this way too allowing me to have an unaltered IR.
>
> But the problem I have faced with making the changes is that my LLVM cannot accept the BC produced by another version (and vice versa), not even the official version, because the placement of the types in the enumeration is very particular and changes the indices for all the subsequent values.
>
> I had often thought it would be helpful if the BC (and LL for that matter) had a version resource of some kind, that would allow me to see that the incoming IR was produced by the official unchanged LLVM, and then I could have placed a translation in the loader that would remap the indices to the ones expected by my back-end.
>
> When you proposed the addition of a version resource, I was thinking that rather than each target adding parsing code for it, it would be better and more transparent for it to appear as a "Version Resource Object" that I could query for simple things like:
>
> o Get the major number
> o Get the minor number
> o Get the patch number

This would force a specific model for the version, which we didn’t want.

> o Is it extended? and if "yes":
>     - Get the vendor ID (could be a string)
>     - Get the vendor specific extension number
>
> And this is really what I mean by an API - essentially a simple object representing the version information. For IR production/emission, there would need to be a 'setter' interface too.

This is what we do, but using the string only.
The “setter” is set at compile time (LLVM_VERSION probably); we patch the bitcode writer internally.

> This would allow me to make my extensions, yet be in a position to more robustly accept BC or LL from other sources. In particular I should be able to remap IR coming from a well-known point-release of LLVM, and also be able to detect, diagnose and reject input from sources I don't recognise (at the moment it just causes a crash).

The string content is predictable: it will begin with “LLVM3.8.0” or “LLVM3.9.0”, etc. So you should be able to do exactly what you want.
The bitcode produced by our binaries has a very different string, and we use this information to identify the producer as well.
What’s missing?
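A client-side check along these lines could look like the sketch below. It assumes only what is stated above, namely that strings from official releases begin with "LLVM" followed by a dotted version such as "LLVM3.8.0"; anything after that prefix is ignored. The function names and the `newest_supported` default are hypothetical, and obtaining the producer string from the bitcode reader is left out.

```python
import re

# Matches the prefix described in the thread, e.g. "LLVM3.8.0".
_PRODUCER_RE = re.compile(r"^LLVM(\d+)\.(\d+)\.(\d+)")


def parse_llvm_producer(producer: str):
    """Return (major, minor, patch) for an official-looking producer string,
    or None if the string does not start with the expected prefix."""
    m = _PRODUCER_RE.match(producer)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


def accept(producer: str, newest_supported=(3, 9, 0)) -> bool:
    """Reject unknown producers and bitcode "from the future"."""
    version = parse_llvm_producer(producer)
    if version is None:
        return False  # unrecognised producer: diagnose instead of crashing
    return version <= newest_supported
```

A vendor with its own producer string would add a second pattern for that string rather than reuse the "LLVM" prefix.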

Actually, I wasn't requesting additional functionality of a version resource, rather I was asking "if you were also providing an API to it". I guess the answer is that the proposal does not include a programmatic abstraction or API. This is not a problem, I can parse arbitrary strings easily enough.

>> o Get the major number
>> o Get the minor number
>> o Get the patch number
>
> This would force a specific model for the version, which we didn’t want.

These are just examples for illustrating my response, not intended as specific API requests - an API for the actual implemented version resource would of course provide its own notion of the content of the resource and would naturally derive from that implementation.

Thanks for clarifying this.

  MartinO