I’ll leave any technical discussion on the actual issue to @ldionne and the others on the libc++ team.
The thing I need to figure out is how to handle this in the release context. Our policy is currently to avoid ABI breaks after final. And only issue a 17.1.0 if there is a big bad bug. Which this is. @tstellar pointed out that we never had a situation where the runtime library has had a ABI break and was unsure if that would necessitate a 17.1.0.
So I am searching guidance on how to handle this issue from the community. We basically have the following options as I see it;
Accept the patch and do nothing. Just release 17.0.4 as normal and maybe update the policy to say that our current ABI policy doesn’t affect the runtime libraries.
Deny the patch and create the Abi break in 18 instead. (Put the problem on the libc++ team instead)
release 17.0.4 without the patch and a 17.1.0 with the patch and continue doing two releases for the rest of 17.x
accept the patch and only release 17.1.0 and no 17.0.x releases. If you want the fixes you also need to accept the break.
What do people think and how should we reflect this in our policy going forward?
I wanted to give just a bit more context here. So normally, libc++ never intentionally breaks the ABI. We strive to maintain a 100% stable ABI and we’ve managed to do that quite well in the past, modulo some minor breaks that are basically impact-less. (We do have some ABI-impacting configuration switches for vendors, but those don’t affect the LLVM releases in any way).
This case is different. We implemented std::expected a while ago and we shipped it under -fexperimental-library in LLVM 16. For LLVM 17, we felt comfortable with making it non-experimental since there had been no major changes or issues found in it in the ~6 months it was “staging”. It was recently brought to our attention that std::expected had a major bug in it related to overwriting data contained in the tail padding of another object via [[no_unique_address]]. It’s a really complicated issue, feel free to go read the PRs for details.
What’s important here is that this bug is serious enough that we must fix it. std::expected doesn’t work in important cases otherwise and the symptom is absolutely not user-friendly (i.e. it’s not a clear crash). We have a few solutions to “fix” it, but at the end of the day we will have to break the ABI, there is no way around that. In terms of the libc++ approach here, we have the following options:
Make std::expected experimental again to buy more time. This will be a source break for anyone who started using it since we released LLVM 17.x. Then we can take our time to fix the bug in LLVM 18.
Fix the bug with a simple and not very risky fix but settle for a less efficient implementation of std::expected forever after.
Fix the bug with a more complicated fix (and hence a bit more risk), but get the performance for std::expected that we really want.
I am actually creating a bit of a false dilemma here – we could also in theory keep things as-is and only fix this in LLVM 18 (point 2 by Tobias above). However, I think that this is quite undesirable because it will make things harder for everyone, users and vendors first. This kind of bug is a lot easier to fix when people haven’t actually started relying too much on the type in their code and shipped e.g. APIs that include that type yet require ABI stability. I actually suspect that the number of actual issues we’ll cause if we make the fix and break the ABI soon in LLVM 17 is very small.
Apart from the fact that I would really like us not to ship std::expected in its current state in the final release of LLVM 17, I don’t have a preference for how this is handled release-wise.
When we have an ABI break in one of the development libraries (e.g. libLLVM.so, libclang.so), we bump the minor version and include it in the library SONAME. If we aren’t planning to change the SONAME of libcxx, then I don’t think a minor version bump is necessary.
It is worth noting that this isn’t an ABI break that affects libc++.dylib. It affects code using std::expected via the libc++ headers, which is where std::expected is defined.
Great to hear, that is the case for us too. I suspect many vendors would simply skip over the ABI break and never notice anything if we make the change quickly enough. It’s not very scientific though, it’s just my gut feeling.
I’m pretty sure that at this point there are some Gentoo users using libc++ but I don’t know whether the use of std::expected has hit them. This setup is considered experimental, so technically they should be ready for some breakage. Unfortunately, this also means that they tend to use new versions as soon as they’re released, so the majority of them is probably using 17.x already.
I don’t think minor version bump will be meaningful at all since it will effectively only affect toolchain libraries.
It sounds like maybe the best way forward for 17.x is to move the feature behind the experimental flag and just ignore it until 18.x? This way we don’t have to rush the feature and make sure that we get the ABI right. But if the team is confident that the ABI is 100% correct before we ship the last 17 release I am open to accepting the patch.
Alright folks, I have an important update. My recollection of events as outlined above was not correct: we never made std::expected experimental, we just enabled it by default immediately. I must have confused it with another feature we made non-experimental for the LLVM 17 release. This means that std::expected has been shipping (with the bug) since LLVM 16.
Will there be a way to compile for both ABIs using a compiler switch (like a macro definition), as GCC does for the dual std::string ABI in libstdc++? Otherwise for downstream packagers that would like to support both ABIs for compatibility this is going to be a nightmare.
Can we provide guidance to downstream OSes with default libc++ (FreeBSD, Apple, some more niche Linux distros, etc.) to disable std::expected in their libc++ copies before this becomes part of their stable releases?
Also, if we’re breaking the ABI, can we use ABI tags to make sure that accidental linking to the old ABI fails loudly at static or dynamic link time?
If more ABI breaks are required in the LLVM18 time frame, I think this will require careful planning. The libstdc++ string ABI change was painful, but ultimately managed ok downstream with ABI detection and tooling.
Exactly. This isn’t a “performance” or even a “slight conformance” difference. std::expected is broken in serious ways without the fix.
Yeah, let’s hope. To be fair though, this is a header-only type so this will become part of a library’s ABI if it uses the type at one of its own ABI boundaries. This might not be true on all systems, but from what I’ve seen system libraries that use C++ at ABI boundaries are not the most common thing, a lot of them use C.
The only formal communication channel we have with vendors is the @llvm/libcxx-vendors group on GitHub. I can open a GH issue describing this so I have a way of pinging them. Note that this has been shipping since LLVM 16, so some of the harm has already been done though.
Yes, I think that should be possible. TBH we’ll need to consider the situation as a whole very carefully and come up with a plan where we will ping the vendors. This is the first time I am aware of where we need to break the ABI in a non-trivial way. It hasn’t happened in the last 5 years at least. If you want to be notified, request to join the libcxx-vendors group on GitHub, that is our standard channel.
Agreed. I think the ABI breaks we will need in <ranges> will go mostly unnoticed. We’ll still make a plan and everything, but I expect that it will be mostly consequence-free since <ranges> are typically not used at ABI boundaries. The std::expected ABI break is more concerning.
FWIW I think it would be amazing to have a single channel for these things. Right now we have Discourse with the potentially-breaking tag and also the few vendors groups on GitHub. I don’t mind which one we make the canonical one, but it’d be great to have a single way of reaching out to vendors – I’d argue it’s even important in order to avoid unfortunate oversights.