[PATCH] -emit-bitcode-version

Joe_Abbey · November 7, 2012, 11:51pm

Hello,

We have a tool which reads in bitcode, processes it, and re-emits it. We use this tool as a flexible way to integrate our tool into the Xcode, Android NDK, Chromium, and Linux build process.

The problem we face is that bitcode changes, and when it does… future versions can read it, but past versions are left in the lurch. For instance LLVM 3.2svn can BitcodeReader from LLVM 3.1, but LLVM 3.1 can’t BitcodeReader LLVM 3.2 (after r165739.) There was an element of this patch which would have helped enable bitcode compatibility (use-abs-operands), but alas it was not committed.

This patch is essentially those missing lines, but with a new purpose of providing a vehicle for bitcode compatibility. With this patch, I aim to enable clang-3.2 or beyond to produce bitcode that llc 3.1 can read. Or when LLVM-3.5 creates a new encoding… LLVM-3.3 might have a chance of still reading it by disabling that feature.

There are only two options right now:
0) which basically means absolute ids
1) which basically means relative ids.
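
To make the two options concrete, here is a minimal sketch of the difference between absolute and relative operand ids. It assumes that every value in a function has a sequential number and that a relative id is the distance from the current instruction back to its operand. The function names are hypothetical and are not taken from the attached patch.

```python
def encode_operand(inst_id: int, value_id: int, version: int) -> int:
    """Return the operand id as it would be written for the given version."""
    if version == 0:
        # Version 0: write the operand's absolute value number.
        return value_id
    if version == 1:
        # Version 1: write the distance back from the current instruction.
        return inst_id - value_id
    raise ValueError(f"unknown bitcode version: {version}")


def decode_operand(inst_id: int, encoded: int, version: int) -> int:
    """Recover the absolute value number from an encoded operand id."""
    if version == 0:
        return encoded
    if version == 1:
        return inst_id - encoded
    raise ValueError(f"unknown bitcode version: {version}")
```

For instruction 1000 using value 998, version 0 writes 998 and version 1 writes 2. A reader that only understands version 0 would take that 2 to be value number 2, which is why the older reader cannot consume the newer encoding.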

I’m more than happy to document and maintain this actively, as we’ve been far too long passively monitoring and shifting with the changes.

So there’s a bunch missing, and I’m trying to figure out how to create a clang option which would configure the CurrentVersion.

Traditional testing will be a challenge, since it requires the use of an older llvm to verify, but I’m also willing to brainstorm on this. Perhaps I can create a custom build-slave on my buildbots to verify this on an on-going basis.

Cheers,

Joe

patch.txt (4.95 KB)

Duncan_Sands · November 8, 2012, 8:31am

Hi Joe,

We have a tool which reads in bitcode, processes it, and re-emits it. We use
this tool as a flexible way to integrate our tool into the Xcode, Android NDK,
Chromium, and Linux build process.

The problem we face is that bitcode changes, and when it does… future versions
can read it, but past versions are left in the lurch. For instance LLVM 3.2svn
can BitcodeReader from LLVM 3.1, but LLVM 3.1 can't BitcodeReader LLVM 3.2
(after r165739.) There was an element of this patch which would have helped
enable bitcode compatibility (use-abs-operands), but alas it was not committed.

can't you use a combination of llvm-dis (from LLVM 3.2) and llvm-as (from LLVM
3.1) to convert the bitcode?

Ciao, Duncan.

Joe_Abbey · November 8, 2012, 3:09pm

I could for my immediate headache. My concern is that there will be future updates to the bitcode representation that may not work in this flow. That's impossible to predict, and llvm-dis -> .ll -> llvm-as is one path which certainly seems attractive. By providing an internal switch to emit bitcode, I think we save a step and have a method for future proofing.
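
The llvm-dis -> .ll -> llvm-as flow could be scripted roughly as follows. This is only a sketch: the binary names passed in (for example `llvm-dis-3.2` and `llvm-as-3.1`) are assumptions about how the two toolchains are installed, and the flow only works as long as the older llvm-as can still parse the textual IR that the newer llvm-dis prints.

```python
import subprocess
from pathlib import Path


def downgrade_commands(new_dis: str, old_as: str, src: Path, dst: Path):
    """Build the two commands: newer llvm-dis, then older llvm-as."""
    text_ir = dst.with_suffix(".ll")  # intermediate textual IR next to dst
    return [
        [new_dis, str(src), "-o", str(text_ir)],  # bitcode -> textual IR
        [old_as, str(text_ir), "-o", str(dst)],   # textual IR -> bitcode
    ]


def downgrade(new_dis: str, old_as: str, src: Path, dst: Path) -> None:
    """Run both steps, stopping at the first failure."""
    for cmd in downgrade_commands(new_dis, old_as, src, dst):
        subprocess.run(cmd, check=True)
```

Calling `downgrade("llvm-dis-3.2", "llvm-as-3.1", Path("in.bc"), Path("out.bc"))` would leave an intermediate `out.ll` beside the result, which is the extra step an internal writer switch would avoid.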

I'll be talking about this in a lightning talk, perhaps others have thoughts on this.

Joe

Chris_Lattner · November 9, 2012, 9:55pm

Hi Joe,

As you mentioned in your talk, I'm sympathetic to your desires, but highly skeptical of this approach - for a number of reasons. Off the top of my head:

1. This will (over time) accrete a ton of old gunk in the bitcode writer, and also slow down progress.
2. The use case for it is also very narrow (in contrast to having the *reader* handle old files, which many scenarios benefit from).
3. The open source project as a whole benefits from "forcing" users of LLVM to "stay up" on mainline… which this feature acts in opposition to.

-Chris

Joe_Abbey · November 9, 2012, 10:35pm

Hi Chris!

Thanks for the reply. The talk has been quite beneficial in connecting me with Nico Weber and Stephen Hines on this problem space. Thanks again for enabling the community with the developers' conference, and trying out lightning talks.

1. This will (over time) accrete a ton of old gunk in the bitcode writer, and also slow down progress.

Well there's actually three layers of compatibility, possibly more.

1. The LLVM API

The LLVM API is simply a cost I have to eat every now and then when moving from version n to n+1. And after our leap from 2.8 -> 3.0 the API layer has been generally stable. A couple refactors here and there in recent days, like Attributes, but nothing major. I'm trying to assess our internal usage and ponder an API layer which abstracts LLVM from our tool. Still weighing pros and cons on that.

2. IR

Since 3.0 it's been very stable, there was a blip in 3.1 to 3.2 where switch records changed but overall not the problem I faced with 2.8 to 3.0. I don't see any way to provide IR compatibility. If it happens to change we'll just have to roll with those punches. Pretty sure even an llvm-dis to llvm-as fails us.

3. Bitcode encoding

This one is a bit of a head scratcher. The encoding offered 0 new features, and was a relatively arbitrary change. Yeah it saves 15% size, but it's completely optional. And for a couple users with unique use-cases it breaks reading 3.2 written code for not a whole lot of gain.
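
A rough illustration of where a size saving of this kind can come from, assuming operand ids are written as variable-width (VBR) fields made of 6-bit chunks, where one bit of each chunk signals that another chunk follows. The helper name is hypothetical, and the sketch does not attempt to reproduce the 15% figure; it only shows that small relative ids need fewer chunks than large absolute ones.

```python
def vbr_bits(value: int, width: int = 6) -> int:
    """Bits needed to write a non-negative value as VBR chunks of `width` bits."""
    if value < 0:
        raise ValueError("this sketch handles only non-negative values")
    payload = width - 1  # one bit per chunk flags "more chunks follow"
    chunks = 1
    while value >= (1 << payload):
        value >>= payload
        chunks += 1
    return chunks * width
```

Under these assumptions an absolute id of 998 costs `vbr_bits(998)` = 12 bits, while a relative id of 2 for the same operand costs `vbr_bits(2)` = 6 bits.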

Turning this on and off doesn't add gunk or an unmaintainable path. Maybe in a few months we try something more comprehensive like Espindola's SLEB suggestion. What about the original use-abs-operands? Do you think that would more reasonably scope this?

And so, properly scoping my proposal, I see that I don't want IR compatibility as much as I want encoding options. So with that in mind maybe I need to go back to the drawing board and propose a BitcodeEncoder? BitcodeCompressor?

2. The use case for it is also very narrow (in contrast to having the *reader* handle old files, which many scenarios benefit from).

Yep can't argue that. And I really like that reader is backwards compatible.

3. The open source project as a whole benefits from "forcing" users of LLVM to "stay up" on mainline… which this feature acts in opposition to.

Ah, I hadn't considered that.

Thanks for taking the time to add your input to this. I'll go back to the drawing board with these points in mind.

Cheers

Joe

Chris_Lattner · November 10, 2012, 4:42pm

Thanks for the reply. The talk has been quite beneficial in connecting me with Nico Weber and Stephen Hines on this problem space. Thanks again for enabling the community with the developers' conference, and trying out lightning talks.

1. This will (over time) accrete a ton of old gunk in the bitcode writer, and also slow down progress.

Well there's actually three layers of compatibility, possibly more.

1. The LLVM API

The LLVM API is simply a cost I have to eat every now and then when moving from version n to n+1. And after our leap from 2.8 -> 3.0 the API layer has been generally stable. A couple refactors here and there in recent days, like Attributes, but nothing major. I'm trying to assess our internal usage and ponder an API layer which abstracts LLVM from our tool. Still weighing pros and cons on that.

Makes sense.

2. IR

Since 3.0 it's been very stable, there was a blip in 3.1 to 3.2 where switch records changed but overall not the problem I faced with 2.8 to 3.0. I don't see any way to provide IR compatibility. If it happens to change we'll just have to roll with those punches. Pretty sure even an llvm-dis to llvm-as fails us.

Yep, I do expect bitcode to settle down a bit now, except to support new IR features. I don't know about everyone else, but I'm personally done fiddling with the bitcode encoding just for the sake of it ;-). That was really the purpose of 3.0.

That said, there *are* new IR features being added. For example, the recent discussion on llvmdev about modeling "fast math" properly on floating point operations will require new fields in bitcode.

As you know, bitcode was intentionally designed to be extensible going forward, so this shouldn't be a problem. However, how would -emit-bitcode-version work if the IR is using some feature that cannot be encoded in the old format? Silently dropping the IR information doesn't seem like a good approach.
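
One way to avoid silently dropping information would be for the writer to refuse to emit an older version when the module uses something that version cannot encode. The sketch below is purely illustrative: the feature name, the version numbers in the table, and the function are all invented for this example and do not correspond to anything in the patch or in LLVM.

```python
class UnsupportedFeatureError(Exception):
    """Raised when the IR needs something the target version cannot encode."""


# Hypothetical table: the earliest bitcode version able to encode each feature.
MIN_VERSION = {
    "fast-math-flags": 2,  # invented version number, for illustration only
}


def check_emittable(features_used: set, target_version: int) -> None:
    """Fail loudly instead of silently dropping IR information."""
    too_new = sorted(
        f for f in features_used if MIN_VERSION.get(f, 0) > target_version
    )
    if too_new:
        raise UnsupportedFeatureError(
            f"version {target_version} cannot encode: {', '.join(too_new)}"
        )
```

With this table, `check_emittable({"fast-math-flags"}, 1)` raises an error, while the same call with a target version of 2 returns normally. The cost is that every new IR feature needs an entry in such a table, which is the maintenance burden raised earlier in the thread.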

3. Bitcode encoding

This one is a bit of a head scratcher. The encoding offered 0 new features, and was a relatively arbitrary change. Yeah it saves 15% size, but it's completely optional. And for a couple users with unique use-cases it breaks reading 3.2 written code for not a whole lot of gain.

Turning this on and off doesn't add gunk or an unmaintainable path. Maybe in a few months we try something more comprehensive like Espindola's SLEB suggestion. What about the original use-abs-operands? Do you think that would more reasonably scope this?

And so, properly scoping my proposal, I see that I don't want IR compatibility as much as I want encoding options. So with that in mind maybe I need to go back to the drawing board and propose a BitcodeEncoder? BitcodeCompressor?

I can see where you're coming from here; this does seem like something that could be opted into. However, this is the least common sort of change that we see in the bitcode files. Almost all of the churn we have is adding encodings for new IR features.

To be clear about this, I'm not strongly opposed to parameterizing the bitcode writer, but I want it to be done in a way that makes sense, and is likely to actually help going forward.

-Chris
