Reducing overhead of compressed_pair

@philnik @rnk @cjdb
Spinning off from ⚙ D153991 [libcxx] replaces `__compressed_pair` with `[[no_unique_address]]` in `unique_ptr`, quoting @philnik:

A lot of the reduced symbols is probably the avoidance of __compressed_pair right now.

Would there be some kind of compiler-provided construct (assuming all non-compiler-provided options to reduce the overhead have already been exhausted) that could reduce the overhead of __compressed_pair across the board/all/many of the uses in libc++? Is it just no_unique_address? Or other features that’d be necessary to express to avoid the extra templates/etc that are currently used, including __compressed_pair?
(if it’s non-portability of no_unique_address, could it be used in the unstable ABI?)

The extra template instantiations, in aggregate, seem like the add-up to quite a bit (though I don’t have specific numbers) and I certainly come across them a lot when looking at DWARF in non-trivial programs.

I don’t know whether there is a better way to write __compressed_pair, but [[no_unique_address]] would definitely be the way to go here. We don’t have it in the unstable ABI because that would result in duplicated code in a lot of places, causing maintainability concerns. Another problem is that this would massively regress libc++ on windows in the unstable ABI, since our symbols would become much larger (e.g. 16 bytes for a unique_ptr).
Probably the simplest thing to do here is implementing [[msvc::no_unique_address]] and replace __compressed_pair with that. I wouldn’t be surprised if that were actually easier for someone familiar with the Microsoft ABI than trying to implement [[no_unique_address]] versions of everything in libc++, and we really want to implement it anyways for feature parity between clang-cl and cl.

I think replacing __compressed_pair<T, U> with

_LIBCPP_NO_UNIQUE_ADDRESS T t;
_LIBCPP_NO_UNIQUE_ADDRESS U u;

and

#if __is_clang_cl__ // not sure what the macro is
#  define _LIBCPP_NO_UNIQUE_ADDRESS [[msvc::no_unique_address]]
#else
#  define _LIBCPP_NO_UNIQUE_ADDRESS [[no_unique_address]]
#endif

should work, no? __compressed_pair aims to achieve the same thing as the above, so it should be possible to do the above.

What @philnik said is exactly what I was going to say.

Except for the bit about Windows. Adding 8 bytes to some types doesn’t seam “massive” to me, nor do I think it strikes the right balance for our users.

The amount of users on Windows , in my estimation, is tiny. We don’t have any actual numbers, but I would guess it’s much smaller than even 1%. So let me go on record and say:

I don’t think Windows should hold us back.

Also, Any solution here needs to get rid of compressed pair. Having it coexist is a maintenance burden we don’t want (and we’ve given it heavy consideration before).

1 Like

Modulo some ABI shenanigans, like final types (which should be solvable), yes. The only problem is that clang-cl doesn’t implement [[msvc::no_unique_address]] currently.

Seems like a lot to me when you just want a pointer.

We don’t have a lot of users on windows, but implementing [[msvc::no_unique_address]] can’t be that hard. Adding it seems to me like the way simpler approach than trying to fight the people who do use libc++ on windows. I don’t think the Chromium people would be very happy to trade regressed object size against a bit of debug improvement, especially if it’s not that hard to improve both.

Modulo some ABI shenanigans, like final types (which should be solvable), yes. The only problem is that clang-cl doesn’t implement [[msvc::no_unique_address]] currently.

Excellent, I’ll update the patch since that means we can have mostly everything in the primary template plus a couple of explicit reset specialisations.

Not sure what shenanigans you’re referring to, would you mind elaborating please?

We don’t have a lot of users on windows, but implementing [[msvc::no_unique_address]] can’t be that hard.

I can prioritise implementing this, but testing is likely going to stall this a bit (I always run into Windows complications).

Yeah, if implementing [[msvc::no_unique_address]] is the only compiler feature needed to remove __compressed_pair entirely - that sounds super awesome/worth giving a go & sounds like it’d reduce implementation/source complexity as well as compiler work (by reducing the number of template instantiations), ir gen (fewer functions), and dwarf (fewer types and functions)… sounds pretty great to me!
(please correct me if I’ve misunderstood anything here)

(I guess a side question: why do we need [[msvc::no_unique_address]] as distinct from [[no_unique_address]]… ah, think I’ve found it - the msvc variant has a slightly different behavior, so it’s a different ABI (it wouldn’t be sufficient to use [[no_unique_address]] for clang-cl and [[msvc::no_unique_address]] for real cl.exe) - fair enough then :slight_smile: )

[[no_unique_address]] is better at compressing some objects than __compressed_pair is. e.g. Compiler Explorer. Because of that we’ll have to add padding in some cases to the class.

That’s my understanding too.

On MSVC [[no_unique_address]] is unfortunately a no-op, so we have to use the msvc spelling to get anywhere. It sucks, but we can’t do anything about it.

Ah, yeah - eventually found MSVC C++20 and the /std:c++20 Switch - C++ Team Blog which discusses those issues. Thanks for the details!

There is a llvm::PointerUnion. Could there be:
__compressed_pair<Aligned16*, Aligned32*> smaller than __compressed_pair<EightByte, EightByte>?

I very much support the direction of using [[no_unique_address]], mostly for the benefits in terms of compile time.
The only thing we should be careful about is that [[no_unique_address]] does nothing on objects of the same type, so it’s slightly different from __compressed_pair afaict.

I want to echo @EricWF. It may be nice for clang to support msvc::no_unique_address (afaik the reason we don’t is that its ABI is not documented), but i don’t think we should condition this work on that support.
After all, MSVC knew that their users would not get the benefits of [[no_unique_address]], and decided it was okay, so surely it is!

So

  • using _LIBCPP_NO_UNIQUE_ADDRESS wherever it make sense
  • extending _LIBCPP_NO_UNIQUE_ADDRESS at a later date to support more compilers

Do seem like a great plan!

@zygoloid Had a patch to do that a while back: ⚙ D63744 In the libc++ unstable ABI, use [[no_unique_address]] instead of __compressed_pair when available.

There are three main concerns IMO:

  1. Maintainability. The patch linked above had serious maintainability issues. We can’t introduce that many #ifdefs in the middle of code logic to refer to different names based on the ABI macro. I think some refactoring in the affected classes might be able to resolve that concern.

  2. ABI stability concerns. If we want to apply these changes even in the stable ABI (which should be possible), we’ll need to insert some padding between members sometimes to replicate the layout that used to be produced by __compressed_pair. I had expanded on that in this comment of D63744.

  3. The lack of proper [[no_unique_address]] on Windows, but I think the path to fix that is well understood.

If we tackle those issues, I’m really happy to move forward. I think we all dislike __compressed_pair.

Is there a technical reason for this, or is it just because the standard says so?

The Standard
http://eel.is/c++draft/basic.memobj#intro.object-9

Two objects with overlapping lifetimes that are not bit-fields may have the
same address if one is nested within the other, or if at least one is a subobject of zero size and they are of different types; otherwise, they have distinct addresses and occupy disjoint bytes of storage

To try to summarize the platform support story so far:

  • All platforms that libc++ supports have long supported [[no_unique_address]], except MSVC and clang-cl.
  • MSVC failed to implement no_unique_address on the first go around (they ignored it), so now they feel they can’t implement it without breaking ABI compat.
  • Clang had to copy that logic into clang-cl to be ABI compatible.
  • MSVC has had [[msvc::no_unique_address]] for long enough that we can use it in libc++
  • Clang doesn’t have an implementation of [[msvc::no_unique_address]], and we need to take some care to ensure that ours is ABI compatible with MSVC.
  • The implementation is covered by https://github.com/llvm/llvm-project/issues/49358 .

So, I take it that the next step is to work on https://github.com/llvm/llvm-project/issues/49358 in clang.

I think we have previously established that Windows users of libc++ don’t prioritize ABI stability because libc++ is not the system standard library there. The major use case for an alternative standard library is to be faster or better than the standard one in some way, so performance and compile time should be prioritzed over ABI stability. So, in conclusion, I don’t think we should worry about waiting too long after fixing the clang issue to start using the new attribute in libc++. Maybe a single release, but not two.