Using the unused "version" field in the bitcode wrapper (redux)

Hi all,

Doug Yung started a discussion earlier (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/077227.html) about using the unused “version” field in the bitcode wrapper, and I think there was some misunderstanding. I’d like to clarify the motivation.

The reason we want to add the version field is to easily identify “old” bitcode. It is only LLVM version granularity, but that is good enough for us. The obvious objection is that we already offer compatibility, so why do we need to know the bitcode version? There are really two reasons:

  1. In the short term, LLVM does theoretically provide compatibility. However, we at Sony are under a much stronger commitment to our customers than the open source project here, so until the test infrastructure is beefed up quite a bit to improve this confidence in the backwards compatibility promise, the version field is a quick unobtrusive way for us to behave correctly in the short term. Beefing up the compatibility testing is a separate discussion that everybody realizes is a much larger long-term endeavor; we at Sony are glad to help with that. As we prepare for our first official SDK release with LTO, where our customers are officially sanctioned to feed bitcode to our tools, a solution is needed though. I think that existing toolchains with LTO will also benefit from this in the near-term.

  2. In the long term, it will break across major releases. Currently I don’t think we have a way to identify this. This is a bit longer-term perspective, but it will eventually come up and this fixes it.

Also, is there a reason why the bitcode wrapper is Darwin-specific? Can we just always use it and simplify the code path?

– Sean Silva

Also, is there a reason why the bitcode wrapper is Darwin-specific?

Rafael gave me some of the backstory on this. Basically it is to work
around some buggy behavior in the Darwin ar. Adding that on the front of
the bitcode file just to get a version doesn't seem like a very clean thing
to do.

Doug, what other alternatives did you guys consider before settling on this?

As for #2 above, the non-universality of the wrapper makes using the
wrapper as a version indicator sort of a non-starter.

Looks like I've taken my second U-turn on this proposal :confused:

Hi Sean,

Rafael gave me some of the backstory on this. Basically it is to work around some buggy behavior in the Darwin ar. Adding that on the front of the bitcode file just to get a version doesn’t seem like a very clean thing to do.

Doug, what other alternatives did you guys consider before settling on this?

As for #2 above, the non-universality of the wrapper makes using the wrapper as a version indicator sort of a non-starter.

Using the bitcode wrapper was our main idea, as we discovered it fairly early on when looking for ways to store the version information. Originally we wanted to propose adding extra fields at the end of the bitcode wrapper, but we found that we only needed the version to be embedded by the compiler, and the wrapper already included an unused version field, so we were hoping to use that. We also liked that the existing bitcode wrapper was documented, implemented and already worked with most if not all of the existing LLVM tools.

One alternative that was suggested was that we could create a version section in the bitcode that the compiler would then embed the version into. But we did not want to take that approach because it would have required the linker to parse the bitcode just to extract the version. That functionality is not available (to my knowledge) in our proprietary linker. By using the bitcode wrapper, our linker is easily able to extract the version and find the bitcode without needing to understand bitcode at all.

The non-universality of the wrapper could be an issue, but we were going to work around that by forcing our compiler to always produce the bitcode wrapper for our target, so in our case that would not be an issue. If versioning of bitcode files is something that might be useful to other teams, it might make sense to just apply it on all bitcode files produced by the compiler. In essence, make it required instead of optional as it currently stands. The bitcode wrapper would only add 20 bytes to the size of the file and most LLVM tools should be able to handle them without any changes.

Douglas Yung
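
To make the "no bitcode knowledge needed" point concrete, here is a minimal sketch of reading the wrapper header. It assumes the layout given in the LLVM bitcode format documentation (five little-endian 32-bit fields: magic 0x0B17C0DE, version, offset, size, CPU type, 20 bytes in total). The function name and the version value 35 are hypothetical and chosen only for illustration; this is not the code of any linker mentioned in the thread.

```python
import struct

# Magic number that opens the bitcode wrapper header.
WRAPPER_MAGIC = 0x0B17C0DE
# Five little-endian 32-bit fields: magic, version, offset, size, cputype.
WRAPPER_HEADER = struct.Struct("<5I")


def read_wrapper_version(data: bytes):
    """Return (version, bitcode_bytes) for a wrapped file, or None if unwrapped."""
    if len(data) < WRAPPER_HEADER.size:
        return None
    magic, version, offset, size, _cputype = WRAPPER_HEADER.unpack_from(data)
    if magic != WRAPPER_MAGIC:
        return None  # raw bitcode without a wrapper, or not bitcode at all
    return version, data[offset:offset + size]


# Hypothetical wrapped file: the header followed by the raw bitcode magic bytes.
payload = b"BC\xc0\xde"
wrapped = WRAPPER_HEADER.pack(
    WRAPPER_MAGIC, 35, WRAPPER_HEADER.size, len(payload), 0
) + payload
print(read_wrapper_version(wrapped))
```

The `None` branch is where the non-universality concern shows up: a file emitted without the wrapper carries no version that a reader like this can see.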

Jumping late at these threads, but my opinion with my open source hat on is:

* We want to support backwards compatibility in the bitcode file. In
the past, we have not done a good job at it, but if someone wants to
do the extra testing, any issues found would be considered real bugs
and if reported are very likely to be fixed.

* The backwards compatibility for debug info is far less strict, but
we already have a version info for that. We currently drop it on
mismatch, but it would be trivial to have an option to error instead.

* When we get to 4.1 we might want to drop compatibility with some
older bitcodes from the 3.X era. If we then also want to be able to
reuse enum values (linkage for example), adding a marker/version when
we get to 4.0 might make sense, but it would include only the major
version.

* For the 4.1 transition we would have to detect any bitcode files, so
using the wrapper would not be sufficient, we would need to have the
version/marker somewhere in the main bitcode.

Thanks to Sean for getting me thinking about the 4.0 to 4.1 transition.

Jumping late at these threads, but my opinion with my open source hat on
is:

* We want to support backwards compatibility in the bitcode file. In
the past, we have not done a good job at it, but if someone wants to
do the extra testing, any issues found would be considered real bugs
and if reported are very likely to be fixed.

Sure.

* The backwards compatibility for debug info is far less strict, but
we already have a version info for that. We currently drop it on
mismatch, but it would be trivial to have an option to error instead.

A diagnostic that cites the version number from the info we're dropping
would be vastly superior to "ewww, old debug data, take it away!"

* When we get to 4.1 we might want to drop compatibility with some
older bitcodes from the 3.X era. If we then also want to be able to
reuse enum values (linkage for example), adding a marker/version when
we get to 4.0 might make sense, but it would include only the major
version.

Is there some compelling reason not to put a version number in now?
Assuming we'll remember to add it N years from now at 4.0 seems like
totally asking for trouble.

* For the 4.1 transition we would have to detect any bitcode files, so
using the wrapper would not be sufficient, we would need to have the
version/marker somewhere in the main bitcode.

Again, reporting the too-old version number in the diagnostic is best.
My fear is that currently the diagnostic is something like "corrupt
bitcode file" which is completely unacceptable.
--paulr

Jumping late at these threads, but my opinion with my open source hat on is:

* We want to support backwards compatibility in the bitcode file. In
the past, we have not done a good job at it, but if someone wants to
do the extra testing, any issues found would be considered real bugs
and if reported are very likely to be fixed.

I agree with Rafael on this. The hard part here is to do the testing: once problems are identified, it is straight-forward to do the auto-upgrading logic. If you don’t plan to do the auto-upgrade, then we’ve failed in our mission and using a version number doesn’t seem like much to feel good about.

* The backwards compatibility for debug info is far less strict, but
we already have a version info for that. We currently drop it on
mismatch, but it would be trivial to have an option to error instead.

Right. There is also work underway to improve debug information in a number of ways; these changes should also help improve its stability.

* When we get to 4.1 we might want to drop compatibility with some
older bitcodes from the 3.X era. If we then also want to be able to
reuse enum values (linkage for example), adding a marker/version when
we get to 4.0 might make sense, but it would include only the major
version.

Honestly, I think we should evaluate it when we get there. In the 2.x to 3.0 era, there was a huge benefit from dropping the 2.x compatibility junk, given that bitcode was new in the 2.x timeframe. I think it is good that we’re reserving the right to drop compatibility when 4.0 rolls around, but unless there is a major win, we should consider keeping 3.x support in the compiler.

-Chris

Hi,

The conversation has drifted slightly, so I wanted to bring it back to the version field in the bitcode wrapper.

Currently in the toolchain which we ship and support, we use a proprietary linker. That linker is unable to read bitcode files and we do not have any plans to enable it to as far as I’m aware. Because of this, we need a way of identifying the version of a bitcode file without needing to read/understand the bitcode itself. The bitcode wrapper is perfectly suited for this task as it is simple to parse without the linker needing to understand bitcode, is already defined and already includes a version field.

If we could get the compiler to use that version field that is already present, it would be a simple solution to our problem. So I guess my question is whether there is any objection to actually using a field which is already allocated for this kind of information?

Douglas Yung

Hi,

The conversation has drifted slightly, so I wanted to bring it back to the
version field in the bitcode wrapper.

Currently in the toolchain which we ship and support, we use a proprietary
linker. That linker is unable to read bitcode files and we do not have any
plans to enable it to as far as I’m aware. Because of this, we need a way
of identifying the version of a bitcode file without needing to
read/understand the bitcode itself.

You haven't established that you really need this. AFAIK Apple's linker
doesn't need this version information and they have shipped LTO for a while
now.

The bitcode wrapper is perfectly suited for this task as it is simple to
parse without the linker needing to understand bitcode, is already defined
and already includes a version field.

If we could get the compiler to use that version field that is already
present, it would be a simple solution to our problem.

What is this "problem"?

So I guess my question is whether there is any objection to actually using
a field which is already allocated for this kind of information?

That is not what is being discussed. We're discussing the forest and not
the trees; this patch implies a whole lot of stuff.

-- Sean Silva

  • The backwards compatibility for debug info is far less strict, but
    we already have a version info for that. We currently drop it on
    mismatch, but it would be trivial to have an option to error instead.

Right. There is also work underway to improve debug information in a number of ways; these changes should also help improve its stability.

From a textual perspective; I don’t know about a binary format perspective. But it can’t hurt either :slight_smile:

-eric

Currently in the toolchain which we ship and support, we use a proprietary
linker. That linker is unable to read bitcode files and we do not have any
plans to enable it to as far as I’m aware. Because of this, we need a way of
identifying the version of a bitcode file without needing to read/understand
the bitcode itself. The bitcode wrapper is perfectly suited for this task as
it is simple to parse without the linker needing to understand bitcode, is
already defined and already includes a version field.

There are a few issues in this statement:

First: it is very specific to the current state of a specific toolchain.

Second: even in the specific case mentioned, the conclusion that the
version must be readable without understanding bitcode seems a non
sequitur. The gold linker doesn't link with llvm and ld64 does so only
indirectly. The entire LLVM specific knowledge is placed in
LLVMgold.so or libLTO.dylib. Any linker that supports LTO with LLVM
will have a component or related tool that knows LLVM and could check
version information and raise the alarm. It must be able to report an
error because two GlobalVariables could not be merged, why is
producing an error on the version harder?

Third: even the conclusion that a version is needed seems a non
sequitur. The current policy gives us the *option* of dropping support
for old bitcode once we get to 4.1. That means that *if* we decide to
drop something, *and* that something cannot be easily identified by
the bitcode structure itself, *then* we will need to add a version
number in 4.0.

There can always be bugs, but we should just test for compatibility
and fix when issues are found.

If we could get the compiler to use that version field that is already
present, it would be a simple solution to our problem. So I guess my
question is whether there is any objection to actually using a field which
is already allocated for this kind of information?

As noted, the backwards compatibility guarantees that we offer require
all bitcode formats to be treated equally, so depending on the wrapper
for this is not a good idea.

Cheers,
Rafael

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu] On Behalf Of Sean Silva

You haven't established that you really need this. AFAIK Apple's linker
doesn't need this version information and they have shipped LTO for a
while now.

Does Apple support library/middleware providers shipping bitcode instead
of object code? That's the most nervous-making scenario for compatibility.
LTO by itself needs bitcode only as an ephemeral stage between source and
object; it's supporting bitcode as a long-lived on-disk format that keeps
us awake at night.

I acknowledge the compatibility promise but I've been whacked upside the
head too often by QA over the years to take an unverified promise at
face value. I would like worked examples and industry experience reports.
--paulr

I've been bitten by trying to do this sort of thing recently. The released LLVM 3.4 and 3.5, and the Xcode 5.x tools, generate debug info marked with the module flag:

     !49 = metadata !{i32 2, metadata !"Debug Info Version", i32 1}

The tools included with Xcode 6.x generate debug info marked as:

     !49 = metadata !{i32 2, metadata !"Debug Info Version", i32 600054001}

and will discard debug info marked with any other version.

I had originally hoped that our (Open Dylan) compiler could output bitcode that people could compile and link using the standard /usr/bin/clang installed with Xcode, but it's looking more like we may want to provide users with a build of vanilla LLVM/Clang for this platform instead.

Hopefully the debug info improvements currently under way will incorporate a compatibility guarantee that Apple-distributed tools will stick to as well.

-Peter-
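
The drop-on-mismatch behaviour described above, together with the earlier suggestions in this thread to cite the version in the diagnostic or to error instead, could be sketched roughly as follows. This is an illustration only and not LLVM's implementation; the function name, the `strict` parameter and the message wording are all hypothetical.

```python
def check_debug_info_version(found: int, supported: int, strict: bool = False):
    """Return (keep_debug_info, diagnostic) for a module's "Debug Info Version" flag."""
    if found == supported:
        return True, None
    message = (
        f"debug info version {found} does not match supported version {supported}"
    )
    if strict:
        # Hypothetical "error instead of drop" option.
        raise ValueError(message)
    # Default: drop the debug info, but say which version was seen.
    return False, "warning: dropping debug info: " + message


# The two version values quoted in the message above.
print(check_debug_info_version(1, 600054001))
```

Reporting both numbers lets a user tell "old debug data" apart from "produced by a differently versioned toolchain", which is the situation described above.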

I’ve been bitten by trying to do this sort of thing recently. The
released LLVM 3.4 and 3.5, and the Xcode 5.x tools, generate debug info
marked with the module flag:

     !49 = metadata !{i32 2, metadata !"Debug Info Version", i32 1}

The tools included with Xcode 6.x generate debug info marked as:

     !49 = metadata !{i32 2, metadata !"Debug Info Version", i32 600054001}

This is an apple bug. You should report it.

and will discard debug info marked with any other version.

This is correct behavior.

I had originally hoped that our (Open Dylan) compiler could output
bitcode that people could compile and link using the standard
/usr/bin/clang installed with Xcode, but it’s looking more like we may
want to provide users with a build of vanilla LLVM/Clang for this
platform instead.

Hopefully the debug info improvements currently under way will
incorporate a compatibility guarantee that Apple-distributed tools will
stick to as well.

Unlikely for now though I don’t speak for Apple.

-eric

I’ve been bitten by trying to do this sort of thing recently. The
released LLVM 3.4 and 3.5, and the Xcode 5.x tools, generate debug info
marked with the module flag:

     !49 = metadata !{i32 2, metadata !"Debug Info Version", i32 1}

The tools included with Xcode 6.x generate debug info marked as:

     !49 = metadata !{i32 2, metadata !"Debug Info Version", i32 600054001}

This is an apple bug. You should report it.

No, it is intentional. Because our releases are not aligned with any LLVM release, we’re using debug version numbers that are completely different from trunk.

and will discard debug info marked with any other version.

This is correct behavior.

I had originally hoped that our (Open Dylan) compiler could output
bitcode that people could compile and link using the standard
/usr/bin/clang installed with Xcode, but it’s looking more like we may
want to provide users with a build of vanilla LLVM/Clang for this
platform instead.

Hopefully the debug info improvements currently under way will
incorporate a compatibility guarantee that Apple-distributed tools will
stick to as well.

Unlikely for now though I don’t speak for Apple.

We would love to have our debug info be compatible with the open-source version. We would need a stable representation of debug info for that to be possible.

This is an apple bug. You should report it.

No, it is intentional. Because our releases are not aligned with any LLVM release, we’re using debug version numbers that are completely different from trunk.

I’m talking to adrian about it. It’s still a bug IMO, but it’s up to you how you want to version anything in your releases.

Unlikely for now though I don’t speak for Apple.

We would love to have our debug info be compatible with the open-source version. We would need a stable representation of debug info for that to be possible.

I disagree, but I’m also unwilling to debate it any further as I don’t have time.

-eric

I’ve been bitten by trying to do this sort of thing recently. The
released LLVM 3.4 and 3.5, and the Xcode 5.x tools, generate debug info
marked with the module flag:

     !49 = metadata !{i32 2, metadata !"Debug Info Version", i32 1}

The tools included with Xcode 6.x generate debug info marked as:

     !49 = metadata !{i32 2, metadata !"Debug Info Version", i32 600054001}

Apple clang uses a Debug Info Version that is derived from the clang version number (600.54.1 in this case). Apple clang’s release cycle is decoupled from the LLVM release cycle, so it is intentionally using a completely different range for the Debug Info Version number.

– adrian
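
As a worked example of how 600.54.1 could become 600054001: the thread does not state the formula, so the packing below (three decimal digits each for the minor and patch components) is an assumption that merely reproduces the quoted number. The function name is hypothetical.

```python
def encode_version(major: int, minor: int, patch: int) -> int:
    """Pack a three-part version into one integer (assumed layout: MMM mmm ppp)."""
    return major * 1_000_000 + minor * 1_000 + patch


print(encode_version(600, 54, 1))  # 600054001
```

Under this assumed scheme any such value is far above the small integers used on trunk (1 in the example above), which fits the stated intent of using a completely different range.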

Realized that might have sounded harsher than intended. I’m always willing to chat about it, but I understand your reasoning - it’s just not how I’d do it and I don’t think it’ll end up mattering for the project anyhow other than some bug reports about clang vs xcode. :slight_smile:

-eric

No.

-Chris

>> From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu] On Behalf Of Sean Silva
>>
>> You haven't established that you really need this. AFAIK Apple's linker
>> doesn't need this version information and they have shipped LTO for a
>> while now.
>
> Does Apple support library/middleware providers shipping bitcode instead
> of object code?

No.

Are there ever any plans to do so?
(this question also goes out to every other vendor that is shipping an LTO
toolchain or plans to. Chad?)

I'm just trying to figure out how much of a Sony-specific issue this is.
Our customers are very performance-hungry and so we would like to provide
the option to middleware vendors (e.g. a physics library). If there is a
strong general desire in the community to not support this use case, we
will want to factor this into our internal decisions.

-- Sean Silva