RFC: adding builtin for deduplicating type lists

TLDR;

I propose the new __type_list_dedup builtin that removes duplicates from a list of types passed to it in an efficient manner. It saves a significant fraction of compile time in certain cases, which is the main motivation for it.
The draft pull request: [Clang] Add __type_list_dedup builtin to deduplicate types in templat… by ilya-biryukov · Pull Request #106730 · llvm/llvm-project · GitHub

Motivating example

We have been looking through the -ftime-trace profiles of our code lately and have found a few places that use template metaprogramming to manipulate type lists and one particular operation stood up: removing duplicates from a list of types.

The code is something like this (see Compiler Explorer for a full example):

...
template <class TL> struct Dedup;
template <> struct Dedup<TypeList<>> {
    using result = TypeList<>;
};
template <class T> struct Dedup<TypeList<T>> {
    using result = TypeList<T>;
};
template <class T, class ...Ts> struct Dedup<TypeList<T, Ts...>> {
    using tail =  Dedup<TypeList<Ts...>>::result;
    using result = If<Contains<tail, T>::result, 
                          tail, 
                          typename Prepend<tail, T>::result>::result;
};

This particular implementation requires O(N^2) template instantiations to deduplicate a list of N types, so the compile-time performance is abysmall for longer lists. A more sophisticated implementation is possible that relies on constexpr and std::sort, which is O(N * log N), but the constant factor is so high that on one of the particularly demanding files, we still end up spending 25% out of 3 minutes it takes to compile this file this list deduplication.

Proposal

In order to improve compile time in those situations and simplify writing this code, I propose to add a builtin template __type_list_dedup that deduplicates the types in the list in the compiler by using a hash map:

static_assert(__is_same(
  __type_list_dedup<TypeList, int, int, double, int>,
  TypeList<int, double>))

The concept of builtin templates is not new in Clang, the implementation is similar to the already existing __make_integer_seq and __type_pack_element .

I’ve implemented this builtin and measured that performance impact of list deduplication goes away almost entirely in the aforementioned example and reduce the compile time by 25% out of those 3 minutes.

The implementation creates a builtin template declaration with the following signature:

template <template<class...> class Template, class ...Types>
using __type_dedup_list = /*<builtin>*/;

and can be used with any templates or template aliases:

__type_dedup_list<std::tuple, int, int, double> x;
// same as std::tuple<int, double>

template <class ...Types> struct TypeList;
__type_dedup_list<TypeList, int, int, double> y;
// same as TypeList<int, double>

template <class ...Types>
using PairedWithInt = TypeList<std::pair<int, Types>...>;
__type_dedup_list<PairedWithInt, int, int, double> t;
// same as TypeList<pair<int, int>, pair<int, double>>

If any of the template arguments are dependent, the instantiation is postponed, similarly to __make_integer_seq and __type_pack_element.

Alternative considered: type trait

Another possibility would be to introduce a type trait that reinstiates the class passed to it:

template <class ...Types>
struct TypeList;
__dedup(TypeList<int, int, double>) x; //same as TypeList<int, double>.

The cons is that the implementation would be more complicated and require custom type checking:

__dedup(int) // not allowed, needs to be diagnosed.

and the semantics of type aliases is either unclear or even confusing:

template <class ...Types> using Twice = TypeList<Types..., Types...>;
__dedup(Twice<int, double, int>); 
// easy to expect
//   Twice<int, double> == TypeList<int, double, int double>;
// but type aliases must be expanded right away,  so should be equivalent to
//   __dedup(TypeList<int, double, int, int, double, int>)
//   == TypeList<int, double>.

The pro was that it would be a little more ergonomic in some cases.

I have finalized the implementation (apart from a few small things) in [Clang] Add __type_list_dedup builtin to deduplicate types in template arguments by ilya-biryukov · Pull Request #106730 · llvm/llvm-project · GitHub and sent it for review.

Thanks for the RFC.
Could you speak a bit more about the motivation for this feature?

It’s not the first time that I have been involved in discussions around similar features, and there is usually a desire for something that dedups but also sorts, such that the result is stable and can appear in mangled names, for example

I’m also curious whether such builtin would be immediately useful to, for example, libc++ such that it is beneficial to the whole community.

As for the interface, what you are proposing is fine given the current state of things.
The lack of sorting/stability would make this extension unsuitable for a P2300 implementation (which means eventually libc++ is going to ask for something slightly different to cover their needs for that)

However, features like reflection/type ordering could impact the design, and so could, packs of aliases - although there is no telling whether that would be standardardized. I have also heard that there is a desire to get more pack features in clang as compiler extensions, and should that pan out it might also affect your design.

So things are a bit in flux.

But maybe there is enough of a need for __type_list_dedup now that it justifies having it Clang in its current form, which is why I’m trying to understand the motivation better (the change are reasonably contained so ripping them out if we find a better direction would hopefully not be too much of an issue either).

(In the same train of thought, we should deprecate and remove __type_pack_element as there is now a language feature doing the same thing)

I definitely see the motivation for this, and I’m sympathetic for anything that causes O^2 instantiations for anything.

I am ALSO very sympathetic to sorting being at least an option here. I can see a use case for either. A version more similar to your type_trait version would be composible, which would perhaps be more useful?

I VERY much would like to see what the libc++ folks think about this, and if they want to modify the interface in any way. If a sorted list is necessary, I’d very much also like to see whether a separate sorted builtin would be acceptable.

As far as name, I think this should be __builtin_typelist_dedup or something like that. We might ALSO consider using unique instead of dedup to better reflect what the STL does.

For the interface, I don’t like the type-trait version as it sits, having to figure out the instantiation/diagnostics associated with that would be messy. I have a concept-of-an-idea of having this builtin produce a pack, such that you’d have a composible version that would be:

TypeList<__builtin_typelist_sort(__builtin_typelist_unique(SomePackOfTypes...)...)...>, but I haven’t thought a ton about implementability yet.

(In the same train of thought, we should deprecate and remove __type_list_dedup as there is now a language feature doing the same thing)

Not sure I get this? Did you mean a different builtin?

I meant __type_pack_element - I updated my previous message

To be clear, this is where I hope we land on. But that brings the issue of packs outside of templates (which we could solve by mandating that the argument to __builtin_typelist_unique is a pack expansion - which would probably be very reasonable for the time being).

It’s very possible that this is the only design issue we would have. Otherwise it does seems like some fancy pack slicing, and last time I checked, there wasn’t any major concern with the implementation of such feature, so it’s very possible that we could go in that direction today. And, if at all possible, I agree that we should. There is a much better chance that this design would hold the test of time and integrate cleanly with future features.
I should note that, however, the implementation of that direction will be more involved.

The only motivation is compile-time performance. There is nothing that can’t be done without this feature in the current C++, this builtin can be implented, but there is no efficient way to do it.

That’s not something that this particular extension was trying to address, but I would be happy to work on the necessary extensions too.
Could you provide some concrete pointers in the proposal that I should be looking at and what libc++ might need?
I am not familiar with P2300, so I am not sure which parts of it this change applies to.

Yeah, having a separate builtin for this was my first thought as well. I can see the motivation for the builtin that sorts the types, but it is also likely to be more costly and only needed when people do care about ABI stability (STL is a good example, although I don’t fully understand P2300 yet; but there are a lot of cases where people are not strictly dependent on the ABI stability and the order does not matter as much).

I am a little torn on the costs of implementing something that does not rely on
builtin templates. The current implementation also allows composing the operations, but needs some unwrapping machinery:

// Or use wrappers:
template <class>
struct UnpackAnd;

template <class...T>
struct UnpackAnd<TypeList<T...>> {
  using sort = __sort<TypeList, T...>;
  using dedup = __dedup<TypeList, T...>;
  using both = UnpackAnd<__dedup<TypeList, T...>>::sort;
};

Although it is more clunky than the suggested approach with experssion, it achieves the same outcomes with reasonable overhead. The proposed expression is only useful inside a single expression (or, rather, a single template argument) anyway as there is no way to “store” a list of types in a typedef or any other entity without a template.

The name is now __builtin_type_pack_dedup in the PR (to follow the __type_pack_index convention and also adding the __builtin prefix).
The reason I avoided unique is because std::unique removes consecutive equal elements, while this builtin removes all duplicates. So I picked a different name to make sure people don’t pattern-match it against std::unique that does a slightly different thing (although similar).

Yeah, I can definitely see the appeal in pursuing the other direction as it makes composition of multiple operations easier (see the UnpackAnd example above).

I also agree that the implementation will be more involved and I would rather stick with the mechanisms we have in Clang and it would be nice to get the


I wanted to summarize the points raised. Proposals are from me personally and are not agreed
  • There are two options for syntax:
    1. __dedup<TypeList, Ts...> via builtin template (currently implemented). It is easy to implement and allows equivalent
    2. __dedup(Ts)... . It composes better with other similar operations and therefore seems to be superior in terms of syntax; however, it is harder to implement (and I also expect things like tooling/Clangd to need an update in order to support it).
    • Proposal: land the builtin template, explore custom expression separately. The builtin templates are a proper mechanism to implement this in Clang and even if we have a different syntax for this eventually, sharing the code and implementation does not incur a maintenance burden to justify blocking this on landing an alternative mechanism not present in the compiler today (e.g. it would probably at least be useful to at least chat with the Standards committee after implementing a prototype because it seems like a potential future language extension, I believe this should take a while). Don’t let perfect be the enemy of good.
  • Lack of ABI stability with the new extension.
    • Proposal: design and prototype a separate sort builtin that uses stable identifiers. Not all places require ABI stability, but it’s very important in some cases (e.g. STL). Having two builtins seem to strike the right balance here by giving full control of performance trade-offs to users. Note that I expect only experts to use compiler builtins in the first place. The ABI stability should be documented (and the pitfalls should be documented in the initial dedup patch too).
  • Naming of the builtin.
    • Proposal: use __builtin_type_pack_dedup. I believe it should satisfy all constraints that people have raised.

The big open question that I still have is whether having __sort in addition to __dedup is enough to make libc++ implement P2300. @cor3ntin could you speak to that?

Could everyone in this thread comment on the points I made above? Are my proposals reasonable or do we want to do something differently?

I’m asking for use cases for a deduplicating-but-not-sorting trait. I fully agree that if such trait is widely used, there would be value in having a built-in.

I am not very concern about the performance cost of sorting here. but there would of course be an implementation cost, ordering types is somewhat novel. relying on mangling seems like the right approach.
If we had a better way to compose these things as per Erich and I suggestion, i think it would make sense to have two separate facility.

My preference would be

  • Go with option 2 __dedup(Ts)...
  • Pursue sort separately
  • Remove the existing __type_pack_index after some period of time (so that we don’t have multiple ways to do that in the language

I hope that __dedup(Ts)... could be implemented in such a way that it would be easier to add sort, and other such builtins

The big open question that I still have is whether having __sort in addition to __dedup is enough to make libc++ implement P2300. @cor3ntin could you speak to that?

Yes, that would be sufficient

I still kinda hate ‘dedup’ as a name, simply because of how opaque it seems.

I’m less concerned with compose-ability now that I’ve seen the UnpackAnd example, but usability is still a concern.

I’d like to see at least an exploration of the dedup(Ts)... implementation before we decide it is too hard, as I think the usability improvements of it are at least worth a bit of effort. Note that we can basically never remove these builtins, or we make the life of librarys that much harder, so getting it right the first time is valuable.

Side NOTE: I see we call this type_pack, but IS IT? And SHOULD IT? I find myself wondering if this is something that would be valuable/usable for NTTPs. It would be extra neat if it worked for BOTH at the same time.

Before accepting anything, I still want to hear from some of the libc++ folks (@ldionne and friends) on what sort of things they would like out of such a feature.

1 Like

We could support both but if we go with the dedup(Ts)... approach we would need both a Type and an Expr AST node, which I am not sure is worth it as a first iteration (I expect dedup(Ts)... to mirror the implementation strategy and cost of pack indexing). In the long term, long term we probably want it to support it for completeness (I think we should strive to have our compiler extensions well-rounded when possible). same for template-name, presumably

Ah, right, I was more talking about the first version of this, as proposed. It is taking a template + a list of arguments to de-dup, it seems to me that THAT version should work for more than just types (AND, would in fact be motivation for the version that produces a pack to not work, as it would then allow mixed type/nttps).

Okay, given the feedback so far, let me try the following next steps:

  • prototyping the __dedup(<Type>)... and see how complicated it ends up being.
  • prototype a __sort builtin based on name mangling.

I will report back when I have the results from those.

Yeah, this makes sense. We know from @cor3ntin that this should be enough to implement P2300. I guess an even broader question for @ldionne and libc++ folks is whether there are other extensions for working with variadic templates that would be useful for libc++ (e.g. for simplifying the code or improving the compile-time performance). While we are at it, we might as well implement more of these.

Could you clarify why ABI stability is considered a strict requirement? E.g. our internal codebase have quite a few template metaprogramming patterns manipulating long type lists and does not require ABI stability. We would be okay with deduplication without sorting as it stands in the current implementation.
The type trait that we have produced deterministic results, and as long as input types change, we will potentially get types in different order. But that’s already something that happens with the existing template metaprogramming facilities.

I can totally see how ABI is a strict requirement for libc++, but I’m not sure it generalizes to all C++ code being written.

Naming is hard :slight_smile: I don’t have strong feelings about the name and happy to explore other alternatives. I actually agree unique would’ve been a better name, but as I mentioned earlier, the divergence between what std::unique is doing and this builtin is doing made me stay away from it.

Are there other ideas on your mind?

Or at least make sure we have a good list so we can design them in a way that they are all usable together (when others are made). That is, not signing you up to implement all of them, but at least to make sure the design is cohesive.

Agreed on unique. No other ideas in my mind, just sort of lamenting and hoping someone comes by with the magic-extra-special-better-than-everything-else name :slight_smile:

No pretensions that it’s extra special, but distinct might fit the bill.

Pros:

  • Definition(s) should be suitable, e.g.: “distinguished as not being the same”
  • Does not imply sorting in common parlance
  • There’s no std::distinct
  • Outside of comments, it’s only used in 11 files in clang[1]

Cons(-ish):

  • it’s not a verb (but then, neither would be unique…); however, the proper form distinguish is unsuitable IMO.
  • it’s 3 characters longer than dedup

Overall, __type_list_distinct wouldn’t look offensive to my eyes… :upside_down_face:


  1. Github Search gets a bit confused with the regex (or a stale index) when displaying the search results; you won’t necessarily see the word “distinct” in the preview as it can be a few lines off, but it’s there if you go into the file. ↩︎

I fail to see at the moment a good justification for implementing multiple variants, with sorting and without sorting.

Per rationale, the OP motivating case should still be served by the sorted variant. I’d argue it’s even better, as we would avoid the possibility of forming multiple overloads that have the same meaning.

You can perhaps argue that the original motivating case does not have a strong need for it, but others still do, and in terms of performance, no matter if we sort or not, this should still be orders of magnitude faster than what can be accomplished in the language alone.

So I’d propose having just the sorted builtin.

I think the motivation is "sorting is hard (or at least novel) so if we can agree on a composable design we can separate the concerns/split the workload

Maybe I misspoke, but I definitely did not mean to say that the current use-case is definitely covered by sorting, it will require more investigation.

On top of that, I can imagine the use cases when keeping the original order of types matters and I don’t see why we should discontinue those.

In particular, it’s actually impossible to implement the example in the first post (see motivating example) with a builtin that sorts. Sometimes the final order does not matter (and it’s even preferable to have a stable order, e.g. in P2300), but sometimes it does matter and we need a separate builtin for removing duplicates.

Having two easily composable operations seems like a strictly better trade-off, also for the reasons mentioned by @cor3ntin:

I can definitely see cases where a dedup and not a sort is wanted. So yeah, I see the usage. Though the more I think about it, I think if we stick to the #1 version (the not-creates-a-pack), it needs a prefix/postfix list of types/NTTPs. Consider anything with an allocator, which we don’t want uniqued.

So I think the sort being a separate, composable task is better in nearly all ways (the exception being, as currently designed, requiring 1 additional instantiation to make it happen).

That said, I’ll disagree with Corentin: Sorting types in a stable way is NOT hard. We already canonicalize them, so finding some level of sorting said canonicalization is pretty easy. Alphabetically, etc. What is HARD is finding some sort that can be done reliably in all compilers, which is why I encouraged the paper to use mangling (and sadly, EWG encouraged otherwise).

CC @philnik for additional thoughts

Personally, I believe that a builtin like that can definitely be useful. It is also more useful if we have the sort and unique capabilities as composable building blocks instead of a single sort+unique builtin.

A tricky aspect of this is determining what to sort types on. While the mangling is definitely a property that we can sort on, there are other possibilities. We often want to sort based on properties like sizeof or alignof, but it’s not clear how a builtin would allow that. Another question is whether the sorting should be done in a way that’s consistent with e.g. std::typeid, which may or may not be implementable.

Given these difficulties, if the original problem was really just to “deduplicate types”, it may make sense to implement a builtin that does just that and punt on solving harder questions about type sorting.

Also, my vote on the syntax would be to stay as un-fancy as possible and use __builtin_type_list_unique<Template, int, int, long, char> (with whatever name we pick). I’d be a bit reluctant to introduce novel syntax just for a builtin. IMO there’s little value in making the builtin “cute” at the point of use, since it will be wrapped in a metaprogramming library 95% of the time.

Just my .02.

I think having a separate type sort builtin significantly enlarges the scope of the solution proposed, while having no clear motivation.

For example, should the sort builtin accept a user-defined predicate? Would it make sense to allow any kind of template arguments then, not just types?

Meanwhile, what P2300 needs is straightforward.

What we need is a way to canonicalize a set of types. Uniquing + any stable sorting is just a means to accomplish that, and splitting these concerns up can cause problems if the intermediate results shows up in the interface. It will cause us to create a bunch of extra intermediate types, which at best just hurts performance.