Proposal: Managing ABI changes in libc++

In general, we try to avoid making changes to the ABI for libc++.
ABI changes can lead to subtle, hard-to-find bugs when part of a piece of software (a dylib or static library, say) is built to the old ABI and the rest to the new ABI. People have been burned in the past by inadvertent changes to the libc++ ABI (not to be confused with the libc++abi project).

Eric Fiselier has been working on a tool to detect ABI changes, so that (hopefully) all future changes will be intentional.

ABI-breaking changes can include things like:
  * Changes to structures (sizes, layout)
  * Addition/removal of virtual functions (vtable layouts)
  * Changes to template parameters (addition, removal)

Also, there are times that a change to the standard will mandate an ABI change. I tend to argue against those in the committee meetings, but I don’t always get my way.

In the LLVM community, there are two differing opinions about changes in libc++ that are ABI-breaking. Broadly speaking:

a) There are the people who ship libc++ in production systems, who say: Whoa! Don’t do that! Ever! (or at least “let us decide when”).

b) There are the people who use libc++ internally, who say: Is it faster? Does it work better? Do it!

=== Proposal ===

Goals:
1) Make the default be “ABI is stable” (modulo changes in the C++ standard)
2) Make it possible for people to propose (and use) ABI-breaking changes for libc++, and have them live in tree.
Note: This would make it possible, not trivial. We still want to avoid gratuitously changing the ABI.

Concrete steps:
1) Give each ABI-breaking change its own “enabling macro” that starts with “_LIBCPP_ABI_”

We have an example of this today. There are two different std::basic_string layouts defined in <string>, and the
second (ABI changing) one is controlled by the macro _LIBCPP_ALTERNATE_STRING_LAYOUT

Under my proposal, I would change this to _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT, and keep the old name as a synonym.

2) Create a global macro “_LIBCPP_ABI_UNSTABLE” which, when defined, turns on ALL of the _LIBCPP_ABI_* changes.

Adding a new, ABI-incompatible change to the library would consist of:
* Choosing an enabling macro name of the form _LIBCPP_ABI_XXXXXXX
* Wrapping the code in #ifdef _LIBCPP_ABI_XXXXXXX
* Enabling the macro if _LIBCPP_ABI_UNSTABLE is defined.
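
As a rough sketch of the three steps above: the macro names `MYLIB_ABI_UNSTABLE` and `MYLIB_ABI_COMPACT_PAIR` and the struct `flag_and_size` are hypothetical stand-ins (not real libc++ names) for `_LIBCPP_ABI_UNSTABLE`, one `_LIBCPP_ABI_XXXXXXX` macro, and a type whose layout changes.

```cpp
#include <cstddef>

// Hypothetical names; in libc++ this part would presumably live in <__config>.
// The umbrella macro switches on every individual ABI-breaking change.
#ifdef MYLIB_ABI_UNSTABLE
#  define MYLIB_ABI_COMPACT_PAIR
#endif

// The ABI-changing code is wrapped in its own enabling macro.
struct flag_and_size {
#ifdef MYLIB_ABI_COMPACT_PAIR
    std::size_t size;  // new layout: wide member first
    bool flag;
#else
    bool flag;         // old (stable) layout, used by default
    std::size_t size;
#endif
};

// Reports which configuration this translation unit was built with.
constexpr bool uses_unstable_abi() {
#ifdef MYLIB_ABI_COMPACT_PAIR
    return true;
#else
    return false;
#endif
}
```

With neither macro defined, the old layout is selected, which matches goal (1) of making the stable ABI the default.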

I think that this convention will make it possible for both camps ((a) and (b) above) to coexist in the same code base.

Comments?

— Marshall

P.S. There are currently three bugs/feature requests in the bug tracker that would require ABI changes to implement.

* Bug 17980 - Container<Incomplete Type> support
* Bug 19153 - Some iterators non-standard
* Bug 21715 - 128-bit integers printing not supported in stl implementation

Hi Marshall,

I agree this is a good idea...

a) There are the people who ship libc++ in production systems, who say: Whoa! Don’t do that! Ever! (or at least “let us decide when”).
b) There are the people who use libc++ internally, who say: Is it faster? Does it work better? Do it!

And there are those that accept each one on different occasions. :)

Concrete steps:
1) Give each ABI-breaking change its own "enabling macro”, that starts with “_LIBCPP_ABI_”

This sounds reasonable.

2) Create a global macro “_LIBCPP_ABI_UNSTABLE” which, when defined, turns on ALL of the _LIBCPP_ABI_* changes.

This one, though, can be problematic. What if group A wants ABI-breaking
changes X and Y, while group B wants Z? In the beginning, we could put
all of them under the same umbrella, but as requests come in, we could
end up with a set of sets, unions and intersections of macros, which
are never pretty. Brainstorming a bit, to avoid intersections and
separate sets, we could make it into a severity scale (like unstable
-> broken -> napalm). But I'm over-engineering, as usual.

Having said that, I think the simple all or nothing will be perfect
for 99% of the cases, so I think this could work very well in
practice. We just have to make sure we test both variations as often
as possible.

cheers,
--renato

Marshall,

I’m glad you are thinking about this and working through a design!

Have you considered using mangling to keep ABI changes separate? That is, mangle the class or namespace differently (perhaps __2 instead of __1) for new ABIs. That way no one can accidentally link two different ABIs.
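
One way to picture this suggestion is with an inline namespace selected by a version macro. This is only a sketch: `mylib`, `MYLIB_ABI_VERSION`, `v1`/`v2`, and `widget` are hypothetical names chosen for illustration.

```cpp
// Hypothetical version macro; defaults to the original ABI.
#ifndef MYLIB_ABI_VERSION
#  define MYLIB_ABI_VERSION 1
#endif

namespace mylib {
// The inline namespace name becomes part of every mangled symbol,
// so code built against v1 and v2 refers to different symbols.
#if MYLIB_ABI_VERSION >= 2
inline namespace v2 {
#else
inline namespace v1 {
#endif

struct widget {
    int value;
};

constexpr int abi_version() { return MYLIB_ABI_VERSION; }

}  // inline namespace v1 / v2
}  // namespace mylib
```

Users still write `mylib::widget`. A function that takes a `widget` in its signature mangles differently under each version, so mixing the two should surface as an unresolved symbol at link time rather than as silent misbehavior at run time.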

-Nick

Nick —

I’ve done some thinking along those lines, but that seems like a very big hammer for these kinds of changes.

I’ll puzzle some more.

— Marshall

My thought here is that if someone wants a subset of the ABI-breaking changes (but not all of them),
they can define the _LIBCPP_ABI_* flags individually in their build system.

Suppose we have, in <__config>:

#ifdef _LIBCPP_ABI_UNSTABLE
#define _LIBCPP_ABI_FEATURE1
#define _LIBCPP_ABI_FEATURE2
#define _LIBCPP_ABI_FEATURE3
#define _LIBCPP_ABI_FEATURE4
#endif

If someone wanted just FEATURE2 and FEATURE3, they could do:
  cmake <blah, blah> -D_LIBCPP_ABI_FEATURE2 -D_LIBCPP_ABI_FEATURE3

— Marshall

That makes a lot of sense. Leave the complexity to those that need it
and cover the 0% and 100% cases.

cheers,
--renato

This of course assumes that all the ABI-breaking changes are independent.

-- Marshall

I don't really like allowing users to pick and choose which ABI
changes to adopt. I personally don't see the use case for picking a
subset of ABI changes to adopt. If you can adopt one ABI change,
then why can't you adopt them all? Furthermore, as time goes on and
more ABI changes are committed this could lead to quite a mess.

I also don't like having a stable version and an unstable version of
the ABI. I'm afraid that by only introducing ABI changes into the
"unstable ABI configuration" they will never be adopted.

Instead I think we should version the ABI and introduce a series of
stable ABIs under different versions. This allows us some sort of
forward momentum and fewer configurations that need maintaining and
testing.
When a consumer can tolerate introducing an ABI break they can move to
the most recent ABI version. After a given amount of time ABI versions
would be deprecated.
The only ABI version that would be maintained indefinitely would be
the current one.
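
A versioned scheme along these lines could be sketched as follows. The names `MYLIB_ABI_VERSION` and `MYLIB_ABI_FEATURE1` through `MYLIB_ABI_FEATURE3` are hypothetical, as is the assignment of features to versions.

```cpp
// Hypothetical version macro; a vendor picks one number instead of
// choosing individual feature macros.
#ifndef MYLIB_ABI_VERSION
#  define MYLIB_ABI_VERSION 1
#endif

// Each ABI version enables everything from the versions before it.
#if MYLIB_ABI_VERSION >= 2
#  define MYLIB_ABI_FEATURE1
#  define MYLIB_ABI_FEATURE2
#endif
#if MYLIB_ABI_VERSION >= 3
#  define MYLIB_ABI_FEATURE3
#endif

// Counts the ABI-breaking features active in this configuration.
constexpr int enabled_feature_count() {
    return 0
#ifdef MYLIB_ABI_FEATURE1
        + 1
#endif
#ifdef MYLIB_ABI_FEATURE2
        + 1
#endif
#ifdef MYLIB_ABI_FEATURE3
        + 1
#endif
        ;
}
```

Because the feature sets are cumulative, there are only as many configurations to test as there are supported versions, rather than one per combination of macros.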

/Eric

This idea of ABI "version" pairs well with real-world packages. At some
point we'll in theory have many libc++ versions packaged and released. If,
at the time of the ABI breakage, the libc++.so.${N} is incremented, we have
some tangible way to relay that information to potential end users.

I'm personally against the #if hell which could result in multiple
universes (I'd rather put a "stable" piece of code in a maintenance branch,
but that's just my not-so-humble opinion.) It's like saying that the IR
should be stable and never change... This is more user facing and gets more
hand waving, but at the end of the day it's the same sort of problem.

Every time a set of ABI breaking changes happens.. bump the lib version, cut
a branch before the changes and move on with life... what's so wrong about
this? Changes can be ported forward/backward to the branch as the owners
want. Keep code complexity down..

Further - how do you QA libc++ when there's basically "2" correct ways to
build it? Have the buildbots build it twice and report both results?

For anyone who wants or demands stability.. I'd love to hear why a branch
won't work..

I don’t really like allowing users to pick and choose which ABI
changes to adopt. I personally don’t see the use case for picking a
subset of ABI changes to adopt. If you can adopt one ABI change,
then why can’t you adopt them all? Furthermore, as time goes on and
more ABI changes are committed this could lead to quite a mess.

It’s not the users that will be picking and choosing.
I’m thinking of the people that ship libc++ as part of their systems.
[ Mac OS, FreeBSD, Android, iOS, etc ]

Each of them has different requirements for ABI stability.

Then there are the people/organizations who build and use their own libc++ libraries internally. They may or may not have any requirements for ABI stability, but they’re probably more flexible than the people who ship OSes, whose change horizon is measured in years.

I also don’t like having a stable version and an unstable version of
the ABI. I’m afraid that by only introducing ABI changes into the
“unstable ABI configuration” they will never be adopted.

Yes. There will be vendors who never adopt anything that is an ABI change,
and I believe that supporting those people is important for libc++.

Instead I think we should version the ABI and introduce a series of
stable ABIs under different versions. This allows us some sort of
forward momentum and fewer configurations that need maintaining and
testing.
When a consumer can tolerate introducing an ABI break they can move to
the most recent ABI version. After a given amount of time ABI versions
would be deprecated.
The only ABI version that would be maintained indefinitely would be
the current one.

This idea of ABI “version” pairs well with real-world packages. At some point we’ll in theory have many libc++ versions packaged and released. If, at the time of the ABI breakage, the libc++.so.${N} is incremented, we have some tangible way to relay that information to potential end users.

I may be wrong, but I don’t see a use case (other than testing and/or libc++ development) for having multiple versions of libc++ on a system.
The goal is for it to be “the C++ standard library” - the one that you get from your system/distro vendor.

Changing the ABI of the libc++.dylib means that you have to rebuild every piece of code that links to that dylib. That’s a high bar.

What this proposal is about is providing those people (who distribute libc++) a method for managing the ABI that they ship.

I’m personally against the #if hell which could result in multiple universes (I’d rather put a “stable” piece of code in a maintenance branch, but that’s just my not-so-humble opinion.) It’s like saying that the IR should be stable and never change… This is more user facing and gets more hand waving, but at the end of the day it’s the same sort of problem.

Every time a set of ABI breaking changes happens… bump the lib version, cut a branch before the changes and move on with life… what’s so wrong about this? Changes can be ported forward/backward to the branch as the owners want. Keep code complexity down…

At the cost of pushing the complexity onto the code owners, making sure that every non-ABI breaking change goes to all branches.

Further - how do you QA libc++ when there’s basically “2” correct ways to build it? Have the buildbots build it twice and report both results?

That’s what I would expect. Build once w/o any ABI breaks (because that’s the most common use case), and then once with them all.

If someone wants a library with a subset of the changes, they can build and test that.
For example, Apple would build with _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT but no other changes for AArch64, because that’s what they ship today.

For anyone who wants or demands stability… I’d love to hear why a branch won’t work…

It’s not “a branch”, it’s N branches (2^N if you consider combinations).

— Marshall

First, sorry for delays. I do owe you feedback here though, and then I’ll go look at the patch. =]

To make this concrete, in FreeBSD we ship libc++ as part of the base system. This means that we *must* be able to preserve the ABI for five years. We are able to ship a new, ABI-breaking, version with each new major release (i.e. every 2 years), but we must then support that ABI for the next five. We are likely to end support (which really means security updates) for the 10.x series (which was the first to contain libc++) in early 2019.

Our lives would probably be made easier if we could build a single libc++, with different namespaces for the different ABI versions. We'd then be able to bump the .so version when we switched the headers to a new ABI, but old binaries would still work (you just wouldn't be able to mix them with new binaries in the same process if they passed standard library types across library boundaries - this would be a linker error though, and easy to diagnose). Building multiple .so versions would not be significant overhead though and we can just put the older one in a compat package.

For static linking, being able to build with -DLATEST_AND_GREATEST (or whatever) and not have any of the legacy compat code compiled in would probably be nice.

David

Only change is that I would add a warning for the old name.
Platforms should migrate to the new form, the warning tells them to.
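
A sketch of what such a deprecation path might look like, using hypothetical `MYLIB_` names in place of the old and new libc++ spellings. Note that `#warning` is a compiler extension before C++23, although GCC and Clang both accept it.

```cpp
// Hypothetical names standing in for the old and new macro spellings.
// The old name still works, but tells the platform to migrate.
#if defined(MYLIB_ALTERNATE_STRING_LAYOUT)
#  warning "MYLIB_ALTERNATE_STRING_LAYOUT is deprecated; define MYLIB_ABI_ALTERNATE_STRING_LAYOUT instead"
#  ifndef MYLIB_ABI_ALTERNATE_STRING_LAYOUT
#    define MYLIB_ABI_ALTERNATE_STRING_LAYOUT
#  endif
#endif

// Library code only ever tests the new spelling.
constexpr bool alternate_string_layout_enabled() {
#ifdef MYLIB_ABI_ALTERNATE_STRING_LAYOUT
    return true;
#else
    return false;
#endif
}
```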

Joerg

My apologies as well for the delay. Chandler’s response was quite excellent, so I’m going to piggyback on his comments.

First, sorry for delays. I do owe you feedback here though, and then I’ll go look at the patch. =]

I’m supportive of this initiative; my main concern is the social aspect: that we have a manageable way to discern different ABI configurations and that they are all testable.

This seems reasonable to me, and provides an option for vendors to curate the ABI based on their particular needs.

I like this goal as well. I’m concerned about the practicalities of being able to manage such divergences in the libc++ project in a coherent way, but I think it really comes down to the scale of the changes we are talking about as well as the time scale.

This makes sense to me.

I very much like this goal as well.

It seems like there is general agreement on the direction
this should go. In the hope of pushing this forward, I've uploaded a
very basic implementation to http://reviews.llvm.org/D11740. It does
not include any build system changes, because it's not clear if we
want, for example, to bump the soname each time the ABI version is increased.