Clarification on the backward compatibility promises

My response, as someone with such an out-of-tree client: that's their problem. We should consider adding obvious (and merge-safe!) extension points, but we have no responsibility to preserve out-of-tree semantics which explicitly go against stated guidelines. Also, while the bar for changing the IR is high, I haven't actually seen that many well-thought-out proposals rejected. Nor have I seen many rejected without solid reasoning and a recommendation for a preferred approach.

Philip

2. Metadata compatibility. We already have precedents of introducing
incompatible changes into the metadata format within a release.
Should we use relaxed rules for metadata compatibility?

I think we have a special case for debug metadata (and should document
that), but otherwise I think we should hold metadata to the same
standard as the rest of the IR.

The idea with metadata is that it can be removed and everything still
works. I'm definitely not ready to lock down the debug metadata format
and I really don't think we should for any of the other uses since
stripping already works. (Note: I don't consider function attributes,
etc., to be metadata.)
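As a concrete illustration of the "removable" property (a minimal sketch, not from the thread): branch-weight profile metadata is a pure optimization hint, so a consumer that strips the `!prof` attachment is still left with a correct program:

```llvm
define i32 @select_path(i1 %cond) {
entry:
  ; The !prof attachment is an optimization hint; stripping it
  ; changes nothing about the program's semantics.
  br i1 %cond, label %hot, label %cold, !prof !0

hot:
  ret i32 1

cold:
  ret i32 0
}

!0 = !{!"branch_weights", i32 97, i32 3}
```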

We may need to rethink this. If metadata is used only as optimization /
codegen hints, then yes I agree they can be dropped. But I suspect there is
a need for metadata that’s *required* for correctness. As LLVM continues to
gain clients beyond “just” compilers, we will need to be sensitive to their
needs. I anticipate the use of LLVM bitcode files as a persistent object format.

I think that metadata that's required for correctness should be baked
into the IR and not be metadata - so if there's something we need for
correctness we need to come up with an IR extension. See the recent
comdat work as an example.
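For reference, the comdat support mentioned here is expressed directly in the IR's grammar rather than as a metadata attachment (a minimal sketch, using the syntax documented in the LangRef):

```llvm
; A comdat is a first-class IR construct with its own syntax,
; not a metadata node that a consumer is free to drop.
$inline_fn = comdat any

define void @inline_fn() comdat($inline_fn) {
  ret void
}
```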

That’s not really a practical suggestion for clients that aren’t essentially
clang. The bar to changing the IR is (correctly) very high, essentially
unreachable if the client is out-of-tree.

Sure, but they likely have their own metadata format with their own
needs, and can keep their own local patches for their out-of-tree
extensions, right? As far as I know we don't have any metadata
extensions in tree that are required for correctness. If any do
exist, they've explicitly gone against the rules we set for metadata
a long time back:

http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html

Unless I'm missing your point completely of course :)

I don’t disagree with this. I am only cautioning against taking an absolutely hardline attitude towards metadata compatibility. As long as we take a pragmatic approach by providing ways for clients to maintain backward compatibility, I don’t think anyone will have problems with the stated policy. We will need to be very careful if / when we propose fundamental changes.

Evan

I'm not inclined to change any metadata consumer just for kicks, and
if we want to change how metadata itself works then that should be a
more reasoned and careful change.

I.e. I see a user of metadata as something that we don't guarantee
compatibility for, but the metadata system itself I can see as a more
formal IR construct.

Make sense?

-eric

An addendum: versioning metadata once you've got a stable format would
probably be a good idea if you want to worry about future
compatibility. As an example, I can see the loop metadata getting a
version number that a consumer can easily read, where "I can't cope
with this version" can just mean the metadata is ignored.
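A sketch of what a versioned loop-metadata scheme might look like. Note that the `llvm.loop.version` node here is hypothetical, invented purely to illustrate the idea; only the surrounding `!llvm.loop` shape is real:

```llvm
for.body:
  ; ...loop body...
  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

!0 = distinct !{!0, !1, !2}
!1 = !{!"llvm.loop.version", i32 1}   ; hypothetical version tag
!2 = !{!"llvm.loop.unroll.disable"}
```

A consumer that finds a version it doesn't understand would simply skip the whole `!0` node, which is exactly the "ignore it" fallback described above.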

However, that's a maintainability/backward compatibility thing for
each metadata producer to deal with - I removed it from debug info
because we were changing so fast that it was a significant overhead.
That said, if someone wants to change a metadata format to be more
efficient or expressive then it should have a significantly lower bar
than an IR level change.

-eric

Yes, this sounds generally reasonable.

evan

Hi Rafael,

Do others agree that this is the case or at least that this would be a
reasonable balance?

IMO it's easier to be compatible at the .ll level, no?

That is not my experience with the bitcode format. The way the API is
structured makes it really easy to support backwards compatibility.

It also seems a lot more valuable from a user perspective to support
reading old .bc files. It means they can keep a library with IR for an
entire major LLVM release, for example.

In the case of binary IR it's really easy to make incompatible
changes. After all, there are no tests of binary IR compatibility,
whereas the whole regression test suite can be viewed as a big test
of .ll compatibility.

We do have tests that are done by checking in old versions of bitcode
files. We didn't use to be good about it, but I think we are now
fairly systematic about it any time we change the format.

There are two more points here:

1. Actually we had much stronger policies wrt bitcode compatibility
in minor releases: something like x.y should be able to read bitcode
from x.y-1, but x.y+2 is allowed not to read it, so the proper
upgrade path is x.y-1 => x.y => x.y+2. Am I right?

That doesn't match what we have in trunk right now. For example, we
changed how inline asm is stored in r163185 (Sep 5 2012), but we still
support reading the old one. This is one of the cases where we have a
FIXME about 4.0.

My understanding is that a newer version of LLVM should be able to read bitcode from any older version within the same major release, and that the .0 release of a new major version must be able to read the bitcode of the previous major release. I think this is the right policy. We haven't done a good job enforcing the policy, but we should.

That was also my understanding of our policy.

You are right that we need to do a better job of testing this. We have some ad-hoc tests for particular bitcode auto-upgrades but nothing systematic. These tests just read binary bitcode files, run them through llvm-dis, and check for the expected output. An initial step to improve the situation would be to write similar tests that attempt to exercise every IR feature, not just the things that we already know need to be upgraded from old bitcode. We would need to adopt the convention of extending these tests whenever something is added to the IR. Whenever we make a release, we would make a copy of this set of tests corresponding to that release, and we would continue to run those tests on trunk, e.g., reading the bitcode from the 3.5 release. The expected output for these tests might need to be updated occasionally.

This is only a small burden, since it's just one more spot to update on
an assembly change.

I would want to try to pack these systematic tests into as few files as possible, ideally only one, so that it is easier to make copies as part of the release process. I don’t know if that is feasible, especially for checking target-specific intrinsics, but it would be a nice goal.
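Such a test could follow the shape of the existing auto-upgrade tests: a checked-in binary bitcode file from the release, disassembled and verified with FileCheck. A rough sketch (the file name and the convention of checking in `%s.bc` alongside the source are assumptions here, for illustration):

```llvm
; RUN: llvm-dis < %s.bc | FileCheck %s
; compatibility-3.5.ll: one construct per IR feature; the .bc file
; is this file as assembled by the 3.5 release's llvm-as.

; CHECK: @g = global i32 0
@g = global i32 0

; CHECK: define i32 @add(i32 %a, i32 %b)
define i32 @add(i32 %a, i32 %b) {
  %s = add i32 %a, %b
  ret i32 %s
}
```

Extending the IR would then mean adding a CHECK line and a construct here, and cutting a release would mean freezing a copy of the file plus its bitcode.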

Any feedback on that proposal?

This makes sense to me.