Clarification on the backward compatibility promises

A bit of history first:

Back when we transitioned from bytecode to bitcode (2.0) we had a tool
called llvm-upgrade which would read .ll files from 1.9 and output 2.0
format which could then be passed to llvm-as to produce bitcode.

The release notes for 2.3 note that llvm-upgrade was not supported any more.

During the 2.X development we tried to keep reading older bitcodes.
Once we got to 3.1, we dropped support for reading 2.X:

http://llvm.org/viewvc/llvm-project?view=revision&revision=145165

Since then, whenever the IR format changes, we make sure the reader
upgrades old bitcode as it reads it, and we mark those code paths with
something like "// TODO: Remove with llvm 4.0".

There is some support for parsing .ll files, but we have not been
nearly as systematic about it.

So, what IR do we promise to be compatible with?

From the above history, it looks like the working assumptions are

* There is no strong guarantee about .ll compatibility. We don't make
gratuitous changes and when desirable/requested we do keep support for
old syntax around for some time and in some cases issue warnings about
it. In summary: it is fuzzy, don't assume much about it.

* All of llvm's 3.X and 4.0 releases will be able to read and upgrade
bitcode produced by all preceding 3.X releases (except for bugs).

* Once 4.0 is released, trunk may drop support for reading bitcode
produced by 3.X. We may then decide to keep some support, but we don't
offer any promises.
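
The bitcode bullets above can be read as a small decision rule. The following is only a sketch of that reading: `Version`, `Support` and `bitcodeSupport` are hypothetical names invented for illustration and are not part of LLVM, and the .ll case is left out because no strong guarantee is proposed for it.

```cpp
// Hypothetical types for illustration; none of this is LLVM API.
struct Version {
  int Major;
  int Minor;
};

enum class Support {
  Promised,     // reader is expected to read and upgrade the file
  NoPromise,    // may work, but nothing is guaranteed
  NotApplicable // file was produced by a release newer than the reader
};

// Encodes the bitcode bullets above. Textual .ll input is deliberately
// not modeled, since no strong guarantee is proposed for it.
Support bitcodeSupport(Version Reader, Version Producer) {
  bool ProducerIsNewer =
      Producer.Major > Reader.Major ||
      (Producer.Major == Reader.Major && Producer.Minor > Reader.Minor);
  if (ProducerIsNewer)
    return Support::NotApplicable;

  // "All of llvm's 3.X and 4.0 releases" are the readers covered.
  bool ReaderCovered =
      Reader.Major == 3 || (Reader.Major == 4 && Reader.Minor == 0);
  if (ReaderCovered && Producer.Major == 3)
    return Support::Promised;

  // Everything else is outside what the bullets promise, for example
  // 3.X bitcode read by a post-4.0 trunk, or any 2.X bitcode.
  return Support::NoPromise;
}
```

Under this reading, a 4.0 reader given 3.4 bitcode yields `Support::Promised`, while a 4.1 reader given the same file yields `Support::NoPromise`.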

Do others agree that this is the case or at least that this would be a
reasonable balance?

In any case, we should probably document whatever we decide. Where
should that go? Sean suggested docs/DeveloperPolicy.html. Is everyone
OK with that?

Cheers,
Rafael

Hi Rafael,

> Do others agree that this is the case or at least that this would be a
> reasonable balance?

IMO it's easier to be compatible on .ll level, no? In case of binary
IR it's really easy to make incompatible changes. After all there are
no tests on IR binary compatibility, however the whole regression
testsuite can be viewed as a big test for .ll compatibility.

There are two more points here:

1. Actually we had much stronger policies wrt the bitcode
compatibility in minor releases. Something like x.y should be able to
read stuff from x.y-1, but x.y+2 is allowed not to read stuff there,
so the proper path is transition x.y-1 => x.y => x.y+2. Am I right?
2. Metadata compatibility. We already have precedent for introducing
incompatible changes to the metadata format within a release.
Should we use relaxed rules for metadata compatibility?

> In any case, we should probably document whatever we decide. Where
> should that go? Sean suggested docs/DeveloperPolicy.html. Is everyone
> OK with that?

+1

> Hi Rafael,
>
>> Do others agree that this is the case or at least that this would be a
>> reasonable balance?
>
> IMO it's easier to be compatible on .ll level, no?

That is not my experience with the bitcode format. The way the API is
structured makes it really easy to support backwards compatibility.

It also seems a lot more valuable from a user perspective to support
reading old .bc files. It means they can keep a library with IR for an
entire major LLVM release for example.

> In case of binary
> IR it's really easy to make incompatible changes. After all there are
> no tests on IR binary compatibility, however the whole regression
> testsuite can be viewed as a big test for .ll compatibility.

We do have tests that are done by checking in old versions of bitcode
files. We didn't use to be good about it, but I think we are now
fairly systematic about it any time we change the format.

> There are two more points here:
>
> 1. Actually we had much stronger policies wrt the bitcode
> compatibility in minor releases. Something like x.y should be able to
> read stuff from x.y-1, but x.y+2 is allowed not to read stuff there,
> so the proper path is transition x.y-1 => x.y => x.y+2. Am I right?

That doesn't match what we have in trunk right now. For example, we
changed how inline asm is stored in r163185 (Sep 5 2012), but we still
support reading the old one. This is one of the cases where we have a
FIXME about 4.0.

> 2. Metadata compatibility. We already have precedent for introducing
> incompatible changes to the metadata format within a release.
> Should we use relaxed rules for metadata compatibility?

I think we have a special case for debug metadata (and should document
that), but otherwise I think we should hold metadata to the same
standard as the rest of the IR.

>> In any case, we should probably document whatever we decide. Where
>> should that go? Sean suggested docs/DeveloperPolicy.html. Is everyone
>> OK with that?
>
> +1

Cheers,
Rafael

>> 2. Metadata compatibility. We already have precedent for introducing
>> incompatible changes to the metadata format within a release.
>> Should we use relaxed rules for metadata compatibility?
>
> I think we have a special case for debug metadata (and should document
> that), but otherwise I think we should hold metadata to the same
> standard as the rest of the IR.

The idea with metadata is that it can be removed and everything still
works. I'm definitely not ready to lock down the debug metadata format
and I really don't think we should for any of the other uses since
stripping already works. (Note, I don't consider function attributes
etc as metadata)

-eric

>> I think we have a special case for debug metadata (and should document
>> that), but otherwise I think we should hold metadata to the same
>> standard as the rest of the IR.
>
> The idea with metadata is that it can be removed and everything still
> works. I'm definitely not ready to lock down the debug metadata format
> and I really don't think we should for any of the other uses since
> stripping already works. (Note, I don't consider function attributes
> etc as metadata)

I think we mean the same thing in the end. Since metadata can be
dropped, one particular way in which upgrade can be done for an old
format is to drop it. This would miss any optimization that the old
format was intended to allow, but we don't promise to keep
optimizations.

What would not be OK is for us to change the format (for example, make
range take a closed interval) and start miscompiling IR that had the
old format in it.
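
A minimal sketch of "upgrade by dropping", under loudly stated assumptions: `Attachment`, `Instruction`, `CurrentVersion` and `upgradeMetadata` are hypothetical stand-ins, not LLVM classes, and the explicit `FormatVersion` tag is assumed purely so that the reader has some way to tell an old format from a new one. A stale attachment is removed instead of being reinterpreted, which costs an optimization but cannot cause a miscompile.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a metadata attachment; not an LLVM class.
struct Attachment {
  unsigned FormatVersion;        // assumed explicit tag, for illustration only
  std::vector<int64_t> Operands;
};

struct Instruction {
  std::map<std::string, Attachment> Metadata; // keyed by kind, e.g. "range"
};

// Hypothetical table of the format version this reader understands per kind.
const std::map<std::string, unsigned> CurrentVersion = {{"range", 2}};

// Drops attachments of a known kind whose format is not the current one.
// Kinds the reader does not know about are left untouched.
// Returns the number of attachments dropped.
unsigned upgradeMetadata(Instruction &I) {
  unsigned Dropped = 0;
  for (auto It = I.Metadata.begin(); It != I.Metadata.end();) {
    auto Known = CurrentVersion.find(It->first);
    bool Stale = Known != CurrentVersion.end() &&
                 Known->second != It->second.FormatVersion;
    if (Stale) {
      It = I.Metadata.erase(It); // lose the hint, keep the semantics
      ++Dropped;
    } else {
      ++It;
    }
  }
  return Dropped;
}
```

In this sketch, a "range" attachment tagged with version 1 would be erased, while an attachment of an unrelated kind would survive unchanged.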

Debug metadata is special only in that we accept more than a missed
optimization when it is dropped, and I agree it should remain like
that for now.

Cheers,
Rafael

>>> I think we have a special case for debug metadata (and should document
>>> that), but otherwise I think we should hold metadata to the same
>>> standard as the rest of the IR.
>>
>> The idea with metadata is that it can be removed and everything still
>> works. I'm definitely not ready to lock down the debug metadata format
>> and I really don't think we should for any of the other uses since
>> stripping already works. (Note, I don't consider function attributes
>> etc as metadata)
>
> I think we mean the same thing in the end. Since metadata can be
> dropped, one particular way in which upgrade can be done for an old
> format is to drop it. This would miss any optimization that the old
> format was intended to allow, but we don't promise to keep
> optimizations.
>
> What would not be OK is for us to change the format (for example, make
> range take a closed interval) and start miscompiling IR that had the
> old format in it.

*shrug* Before we lock down any metadata format like that I'd prefer
it was versioned.

> Debug metadata is special only in that we accept more than a missed
> optimization when it is dropped, and I agree it should remain like
> that for now.

:)

-eric

>> Hi Rafael,
>>
>>> Do others agree that this is the case or at least that this would be a
>>> reasonable balance?
>> IMO it's easier to be compatible on .ll level, no?
>
> That is not my experience with the bitcode format. The way the API is
> structured makes it really easy to support backwards compatibility.

Could you elaborate more on this?

My anecdotal experience is that the .ll is more stable. I remember last
summer that in multiple situations passing the old .bc (from the 3.1-based
or 3.2-based SCE compiler IIRC) to trunk would cause it to barf, while
passing the .ll file would not barf. I don't think I would be able to
reproduce this without a lot of work, so I'm just leaving this as an
anecdote.

-- Sean Silva

> Hi Rafael,
>
>> Do others agree that this is the case or at least that this would be a
>> reasonable balance?
> IMO it's easier to be compatible on .ll level, no? In case of binary
> IR it's really easy to make incompatible changes. After all there are
> no tests on IR binary compatibility, however the whole regression
> testsuite can be viewed as a big test for .ll compatibility.

This is a really good point. Theoretically (and to a good approximation in
practice), every feature of the IR is tested in the test suite in .ll form,
which means that a compatibility break at the .ll level will cause visible
churn. The same cannot be said for the bitcode.

-- Sean Silva

>> That is not my experience with the bitcode format. The way the API is
>> structured makes it really easy to support backwards compatibility.
>
> Could you elaborate more on this?
>
> My anecdotal experience is that the .ll is more stable. I remember last
> summer that in multiple situations passing the old .bc (from the 3.1-based
> or 3.2-based SCE compiler IIRC) to trunk would cause it to barf, while
> passing the .ll file would not barf. I don't think I would be able to
> reproduce this without a lot of work, so I'm just leaving this as an
> anecdote.

Also anecdote, but since each "item" is output as a variable length
record, it is easy to append new members to the end. When we want to
make a more fundamental change, we can add a new enum value (like we
did for assembly).
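
To illustrate both mechanisms, here is a rough sketch with made-up names: `CODE_ITEM_OLD`, `CODE_ITEM`, `Item` and `parseItem` are hypothetical and do not correspond to LLVM's real record codes or reader functions. A field appended to the end of a record is simply optional for the reader, and a more fundamental layout change gets its own enum value while the old one keeps being parsed.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical record codes and layouts; real bitcode uses its own enums.
enum RecordCode : unsigned {
  CODE_ITEM_OLD = 1, // original layout: [flags]
  CODE_ITEM = 2      // reworked layout: [kind, flags, align?]
};

using Record = std::vector<uint64_t>;

struct Item {
  uint64_t Kind = 0;
  uint64_t Flags = 0;
  uint64_t Align = 1; // default used when the trailing field is absent
};

// Returns false on a malformed or unknown record.
bool parseItem(unsigned Code, const Record &R, Item &Out) {
  Out = Item();
  switch (Code) {
  case CODE_ITEM_OLD:
    // TODO: Remove with the next major release (hypothetical marker).
    if (R.size() != 1)
      return false;
    Out.Flags = R[0]; // Kind keeps its default
    return true;
  case CODE_ITEM:
    if (R.size() < 2)
      return false;
    Out.Kind = R[0];
    Out.Flags = R[1];
    if (R.size() > 2) // field appended by a later writer
      Out.Align = R[2];
    return true;
  default:
    return false;
  }
}
```

With this shape, old files keep loading through the `CODE_ITEM_OLD` case, and a writer that predates the `align` field still produces records the newer reader accepts.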

Cheers,
Rafael

But we are also very liberal in changing the tests. I don't think we
should change that.

The main point is that it is probably not reasonable to ask users to
run old-llvm-dis | new-llvm-as when they upgrade. If we are to
provide backward compatibility, it is far more user friendly to do it
at the bitcode level. Once we have that, the .ll becomes only the human
interface to LLVM and we can make it as clean as possible.

Cheers,
Rafael

Well, I can only speak for myself, but in my two recent IR changes
(cmpxchg failure orderings and cmpxchg weak), I preserved bitcode
compatibility but not .ll. In both cases I was adding extra information
to the instruction, which meant the bitcode reading section just had
to insert a sensible default if that field wasn't present.

In the first case, I could have kept IR compatibility, but the second
(r210903) would have been rather difficult. It would involve examining
all uses of the cmpxchg to find out whether the type expected was
compatible with the new or the old version.

Cheers.

Tim.

>> >> Do others agree that this is the case or at least that this would be a
>> >> reasonable balance?
>> > IMO it's easier to be compatible on .ll level, no?
>>
>> That is not my experience with the bitcode format. The way the API is
>> structured makes it really easy to support backwards compatibility.
>
> Could you elaborate more on this?
>
> My anecdotal experience is that the .ll is more stable. I remember last
> summer that in multiple situations passing the old .bc (from the 3.1-based
> or 3.2-based SCE compiler IIRC) to trunk would cause it to barf, while
> passing the .ll file would not barf. I don't think I would be able to
> reproduce this without a lot of work, so I'm just leaving this as an
> anecdote.

> Well, I can only speak for myself, but in my two recent IR changes
> (cmpxchg failure orderings and cmpxchg weak), I preserved bitcode
> compatibility but not .ll. In both cases I was adding extra information
> to the instruction, which meant the bitcode reading section just had
> to insert a sensible default if that field wasn't present.
>
> In the first case, I could have kept IR compatibility, but the second
> (r210903) would have been rather difficult. It would involve examining
> all uses of the cmpxchg to find out whether the type expected was
> compatible with the new or the old version.

I briefly looked at r210903, and it seems like the reason that it was
easier to maintain the bitcode compatibility is that the size of the record
implicitly identifies whether the record was written before or after the
change. I.e., it would have been as though you had introduced a
"cmpxchgpossiblyweak" instruction that has the new return type, so that you
can tell which it is without extensive analysis.

At face value, this seems like a pretty compelling argument that the
bitcode is easier to version and keep compatible.
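
As a sketch of that observation (not the actual reader code from r210903): assume a hypothetical cmpxchg record laid out as `[success, failure?, weak?]`, where each IR change appended one field. The names `Ordering`, `CmpXchg`, `defaultFailureOrdering` and `parseCmpXchg`, the layout, and the choice of default are all assumptions made for illustration. The record size alone tells the reader which defaults to insert and whether the old return type is expected.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical encoding of memory orderings; not LLVM's actual values.
enum class Ordering : uint64_t { Monotonic, Acquire, Release, AcqRel, SeqCst };

struct CmpXchg {
  Ordering Success;
  Ordering Failure;
  bool Weak;
  bool ReturnsPair; // false: old-style result, just the loaded value
};

// Assumed default for old records: the success ordering with any release
// semantics removed, since a failed cmpxchg performs no store.
Ordering defaultFailureOrdering(Ordering Success) {
  switch (Success) {
  case Ordering::Release: return Ordering::Monotonic;
  case Ordering::AcqRel:  return Ordering::Acquire;
  default:                return Success;
  }
}

// Hypothetical layout: [success, failure?, weak?]. Each IR change appended
// one field, so the record size tells the reader which writer produced it.
bool parseCmpXchg(const std::vector<uint64_t> &R, CmpXchg &Out) {
  const uint64_t MaxOrdering = static_cast<uint64_t>(Ordering::SeqCst);
  if (R.empty() || R.size() > 3 || R[0] > MaxOrdering)
    return false;
  if (R.size() >= 2 && R[1] > MaxOrdering)
    return false;
  Out.Success = static_cast<Ordering>(R[0]);
  Out.Failure = R.size() >= 2 ? static_cast<Ordering>(R[1])
                              : defaultFailureOrdering(Out.Success);
  Out.Weak = R.size() >= 3 && R[2] != 0;
  Out.ReturnsPair = R.size() >= 3; // only the newest writers use the new type
  return true;
}
```

The textual form has no equivalent of `R.size()`: an old and a new `cmpxchg` line can look alike, which is why the parser would have to inspect the uses instead.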

-- Sean Silva

Does anyone have anything else to say about .bc/.ll compatibility? It is important to be clear to users about what compatibility we provide. I’d like to get consensus about this and put it in the docs somewhere.

– Sean Silva

To make this all a bit easier, how about making clang/llvm output a
version number at the beginning of the .ll or .bc file?
It would then be clear which version of clang/llvm wrote the file out.
You could then add warnings if clang/llvm thought they were
incompatible with the older version or not.

Kind Regards

James

>> Does anyone have anything else to say about .bc/.ll compatibility? It is
>> important to be clear to users about what compatibility we provide. I'd like
>> to get consensus about this and put it in the docs somewhere.
>>
>> -- Sean Silva
>
> To make this all a bit easier, how about making clang/llvm output a
> version number at the beginning of the .ll or .bc file?
> It would then be clear which version of clang/llvm wrote the file out.
> You could then add warnings if clang/llvm thought they were
> incompatible with the older version or not.

I don't think that would significantly help.

For .bc, the sizes of records already implicitly hold that information in a
more fine-grained and semantic way (not tied to versions, but rather
intrinsically to the change to the IR).
For .ll, it is too easy to end up with a file that has a nonexistent or
inconsistent version number (such as a hand-written file), so we wouldn't be
able to rely on it.

Anyway, that's sort of a separate discussion. For now, I'd like to focus
strictly on deciding what guarantees we offer.

-- Sean Silva

>> Hi Rafael,
>>
>>> Do others agree that this is the case or at least that this would be a
>>> reasonable balance?
>>
>> IMO it's easier to be compatible on .ll level, no?
>
> That is not my experience with the bitcode format. The way the API is
> structured makes it really easy to support backwards compatibility.
>
> It also seems a lot more valuable from a user perspective to support
> reading old .bc files. It means they can keep a library with IR for an
> entire major LLVM release for example.
>
>> In case of binary
>> IR it's really easy to make incompatible changes. After all there are
>> no tests on IR binary compatibility, however the whole regression
>> testsuite can be viewed as a big test for .ll compatibility.
>
> We do have tests that are done by checking in old versions of bitcode
> files. We didn't use to be good about it, but I think we are now
> fairly systematic about it any time we change the format.
>
>> There are two more points here:
>>
>> 1. Actually we had much stronger policies wrt the bitcode
>> compatibility in minor releases. Something like x.y should be able to
>> read stuff from x.y-1, but x.y+2 is allowed not to read stuff there,
>> so the proper path is transition x.y-1 => x.y => x.y+2. Am I right?
>
> That doesn't match what we have in trunk right now. For example, we
> changed how inline asm is stored in r163185 (Sep 5 2012), but we still
> support reading the old one. This is one of the cases where we have a
> FIXME about 4.0.

My understanding is that a newer version of LLVM should be able to read bitcode from older versions within the same major release, and that the .0 release of a new major version must be able to read the bitcode of the previous major release. I think this is the right policy. We haven’t done a good job enforcing it, but we should.

>> 2. Metadata compatibility. We already have precedent for introducing
>> incompatible changes to the metadata format within a release.
>> Should we use relaxed rules for metadata compatibility?
>
> I think we have a special case for debug metadata (and should document
> that), but otherwise I think we should hold metadata to the same
> standard as the rest of the IR.

I agree.

Evan

>>> 2. Metadata compatibility. We already have precedent for introducing
>>> incompatible changes to the metadata format within a release.
>>> Should we use relaxed rules for metadata compatibility?
>>
>> I think we have a special case for debug metadata (and should document
>> that), but otherwise I think we should hold metadata to the same
>> standard as the rest of the IR.
>
> The idea with metadata is that it can be removed and everything still
> works. I'm definitely not ready to lock down the debug metadata format
> and I really don't think we should for any of the other uses since
> stripping already works. (Note, I don't consider function attributes
> etc as metadata)

We may need to rethink this. If metadata is used only for optimization / codegen hints, then yes, I agree it can be dropped. But I suspect there is a need for metadata that’s *required* for correctness. As LLVM continues to gain clients beyond “just” compilers, we will need to be sensitive to their needs. I anticipate use of LLVM bitcode files as a persistent object format.

Evan

I think that metadata that's required for correctness should be baked
into the IR and not be metadata - so if there's something we need for
correctness we need to come up with an IR extension. See the recent
comdat work as an example.

Sadly, I agree with you that bitcode might be a persistent object
format - I'd personally want an LLVM 3.0 that came up with a better
format though.

-eric

That’s not really a practical suggestion for clients that aren’t essentially clang. The bar to changing the IR is (correctly) very high, essentially unreachable if the client is out-of-tree.

—Owen

Sure, but they likely have their own metadata format with their own
needs and can keep their own local patches for their out of tree
extensions right? As far as I know we don't have any metadata
extensions in tree that are required for any correctness. If so,
they've explicitly gone against the rules we set for metadata a long
time back:

http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html

Unless I'm missing your point completely of course :)

-eric