RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.

_sean_silva · November 13, 2020, 10:06pm

We’ve pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it’s ready for wider discussion: https://reviews.llvm.org/D90884

SVec is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this “on the stack, where a “small” number of elements are expected”.

Vec is a convenience alias for SmallVector<T, 0>. It lets us get the (little-known?) benefits of SmallVector even when it has no inline elements (see https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h). The recommendation in the patch is to use this when the SmallVector is on the heap.

A lot of this is boiled out from the discussion in https://groups.google.com/g/llvm-dev/c/q1OyHZy8KVc/m/1l_AasOLBAAJ?pli=1

The goals here are twofold:

convenience: not having to read/write “N”, or do an extra edit/recompile cycle if you forgot it
avoiding pathological cases: The choice of N is usually semi-arbitrary in our experience, and if one isn’t careful, can result in sizeof(SmallVector) becoming huge, especially in the case of nested SmallVectors. This patch avoids pathological cases in two ways:
A. SVec's heuristic keeps sizeof(SVec) bounded, which prevents pathological size amplifications like in SmallVector<struct {SmallVector<T, 4> a, b; }, 4>, where the small sizes effectively multiply together. Of course, users can always write SmallVector<T, N> explicitly to bypass this, but the need for that seems rare.
B. SmallVector<T, 0> feels “weird to write” for most folks, even though it is frequently the right choice. Vec mitigates that by “looking natural”.

I’m surfacing this as an RFC to get feedback on a couple higher-level points:

does everybody agree that SVec and Vec are useful to have?
get wider consensus around suggesting these as “defaults” (see my updates to ProgrammersManual.rst in the patch)
how much we want to bulk migrate code vs let it organically grow. Replacing SmallVector<T, 0> with Vec should be completely mechanical. Replacing SmallVector<T, N> for general N would be a lot more work.
of course: naming. SVector/Vector were floated in the patch as well and seem ok. SmallVec was rejected as it was a prefix of SmallVector (messes up autocomplete).

Looking forward to a world with fewer guessed SmallVector sizes,

– Sean Silva

dblaikie · November 16, 2020, 8:55pm

I will say I'm not a huge fan of adding even more names for things in
this fairly core/common use case (now we'll have even more vector
names to pick from) - can we use default template arguments so we can
write SmallVector<T> instead of SmallVector<T, N> and would that
address some of the use cases proposed here?

I think someone (JYKnight, perhaps) mentioned in the code review
(always difficult fragmenting the discussion between code review and
RFC, unfortunately - not sure there's a great solution to that - some
way to lock comments on a Phab review might be nice) that there are
cases where you do want a small inline buffer even when you're nested
inside another data structure and/or heap allocated (like tail
allocations).

Got any sense of the total value here? Major savings to be had?

(if SmallVector<T> can do what your proposed SVec<T> does, that leaves
the Vec<T> - could you expound on the benefits of SmallVector<T, 0>
over std::vector<T>? I guess the SmallVectorImpl generic algorithm
opportunities? Though that's rarely needed compared to ArrayRef.)
If SmallVector<T> would suffice then maybe Vec<T> could be
ZeroSmallVector<T>? Not sure.

mehdi_amini · November 16, 2020, 9:55pm

I will say I’m not a huge fan of adding even more names for things in
this fairly core/common use case (now we’ll have even more vector
names to pick from) - can we use default template arguments so we can
write SmallVector instead of SmallVector<T, N> and would that
address some of the use cases proposed here?

I won’t claim it is perfect, but the added names are a compromise over rounds of reviews with the folks in the revision. In particular I’m quite concerned that a default value for N on the SmallVector does not carry the intent the same way, and is too easy to miss in review (or while reading code). To me the drawbacks are outweighing the benefits too much.
Also, SmallVector<LargeType> would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code. An alternative was to reserve the default to only “small object” so that N isn’t zero, but there isn’t a platform independent way of doing that and keep the code portable I believe. So SVec<T> is really saying: “I am willing to pay for some limite inline storage if possible but I don’t have a N in mind”.

Finally the simple llvm::Vector case to replace SmallVector<T, 0> is because it is frequently preferable to std::vector but still isn’t readable or immediately intuitive and so is rarely used in practice (see https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h for the documented points on N=0).

I think someone (JYKnight, perhaps) mentioned in the code review
(always difficult fragmenting the discussion between code review and
RFC, unfortunately - not sure there’s a great solution to that - some
way to lock comments on a Phab review might be nice) that there are
cases where you do want a small inline buffer even when you’re nested
inside another data structure and/or heap allocated (like tail
allocations).

Yes: I think it is misleading to formulate anything about heap, I see a SmallVector inside a heap allocated object is akin to a trailing allocation optimization.

Got any sense of the total value here? Major savings to be had?

(if SmallVector can do what your proposed SVec does, that leaves
the Vec - could you expound on the benefits of SmallVector<T, 0>
over std::vector? I guess the SmallVectorImpl generic algorithm
opportunities? Though that’s rarely needed compared to ArrayRef.)
If SmallVector would suffice then maybe Vec could be
ZeroSmallVector? Not sure.

ZeroSmallVector does not really address your “more vector names to pick from” concerns, and it is longer than SmallVector<T, 0>: shouldn’t we aim for the “default case” to be the easiest to reach / most intuitive to pick? llvm::Vec looks like “just a vector”.

Cheers,

dblaikie · November 16, 2020, 10:12pm

I will say I'm not a huge fan of adding even more names for things in
this fairly core/common use case (now we'll have even more vector
names to pick from) - can we use default template arguments so we can
write SmallVector<T> instead of SmallVector<T, N> and would that
address some of the use cases proposed here?

I won't claim it is perfect, but the added names are a compromise over rounds of reviews with the folks in the revision. In particular I'm quite concerned that a default value for N on the SmallVector does not carry the intent the same way, and is too easy to miss in review (or while reading code).

I'm not quite following here - what sort of problems do you anticipate
by readers missing the default value for N?

To me the drawbacks are outweighing the benefits too much.
Also, `SmallVector<LargeType>` would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code.

Why would it be necessary to figure that out? If we generally feel the
right default inline storage happens to be zero in that case and that
SmallVector will pick a good default (including possibly 0) that seems
like a good thing to me. (why would we call out zero specially, if we
want to avoid calling out other values explicitly)

An alternative was to reserve the default to only "small object" so that N isn't zero, but there isn't a platform independent way of doing that and keep the code portable I believe. So `SVec<T>` is really saying: "I am willing to pay for some limite inline storage if possible but I don't have a N in mind".

That quoted statement still sounds like it could encapsulate the
possibility of 0 too, though. "limited inline storage if possible"
could be "and the answer to that constraint is zero in this case - but
if you happen to make the T smaller, it could become non-zero
organically/without the need to touch this code" (as the N could vary
organically between non-zero values as struct sizes change too)

Finally the simple `llvm::Vector` case to replace `SmallVector<T, 0>` is because it is frequently preferable to `std::vector` but still isn't readable or immediately intuitive and so is rarely used in practice (see LLVM Programmer’s Manual — LLVM 18.0.0git documentation for the documented points on N=0).

I'm not sure llvm::Vector/Vec would necessarily make it more often
used in practice. Maybe a bunch of cleanup and more specific wording
in the style guide that would ban std::vector might help more there.

Though I agree that it may be suitable to have a clear name for
SmallVector<T, 0> since the "Small" is slightly misleading (though not
entirely - it's important for the "SmallVectorImpl" benefits of
SmallVector - so renaming it to Vector and still using it via
SmallVectorImpl<T>& might also be confusing).

Got any sense of the total value here? Major savings to be had?

(if SmallVector<T> can do what your proposed SVec<T> does, that leaves
the Vec<T> - could you expound on the benefits of SmallVector<T, 0>
over std::vector<T>? I guess the SmallVectorImpl generic algorithm
opportunities? Though that's rarely needed compared to ArrayRef.)
If SmallVector<T> would suffice then maybe Vec<T> could be
ZeroSmallVector<T>? Not sure.

ZeroSmallVector<T> does not really address your "more vector names to pick from" concerns,

Somewhat - if SmallVector<T> can be used instead of SVec<T> then we
have one fewer name and less convention to base the Vec<T> on (since
it won't have the SVec<T> sibling), so picking a name closer to the
existing SmallVector might be more viable/helpful.

and it is longer than `SmallVector<T, 0>`: shouldn't we aim for the "default case" to be the easiest to reach / most intuitive to pick? `llvm::Vec` looks like "just a vector".

Somewhat - but Vec is an abbreviation that isn't really used otherwise
(if we consider SVec as well, I'd still push back on the introduction
of the abbreviation for both cases) and llvm::Vector would be only a
case separation away form a standard name (when considering the
unqualified name - which is likely to come up a fair bit, as we'll see
"Vector" used unqualified a lot).

I wouldn't entirely object to SmallVector<T> getting a smart default
buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
don't feel /super/ great about it, but yeah, if we're going to push
the Programmer's Manual encouragement further in terms of
reducing/removing use of std::vector, then maybe llvm::Vector isn't
the worst way to do that.

(it might be that this would benefit from being two separate
discussions, though - one on how to provide smart defaults for
SmallVector<T> and one on if/how to push std::vector deprecation in
favor of SmallVector<T, 0> (semantically speaking - no matter how it's
spelled) further along)

mehdi_amini · November 16, 2020, 10:43pm

I will say I’m not a huge fan of adding even more names for things in
this fairly core/common use case (now we’ll have even more vector
names to pick from) - can we use default template arguments so we can
write SmallVector instead of SmallVector<T, N> and would that
address some of the use cases proposed here?

I won’t claim it is perfect, but the added names are a compromise over rounds of reviews with the folks in the revision. In particular I’m quite concerned that a default value for N on the SmallVector does not carry the intent the same way, and is too easy to miss in review (or while reading code).

I’m not quite following here - what sort of problems do you anticipate
by readers missing the default value for N?

Reading code SmallVector<SomeType> does not express that it is intentional to leave of the value for N. It can easily be just forgotten, and it could easily be implicitly 0, and as a reviewer (or code reader later) I don’t know if this is was intentional or not. This is why I am quite opposed to “loosen” the current requirement on SmallVector: a different name for a different intent is better fitting to me.

To me the drawbacks are outweighing the benefits too much.
Also, SmallVector<LargeType> would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code.

Why would it be necessary to figure that out? If we generally feel the
right default inline storage happens to be zero in that case and that
SmallVector will pick a good default (including possibly 0) that seems
like a good thing to me. (why would we call out zero specially, if we
want to avoid calling out other values explicitly)

I think this is an API that is “easy to misuse”, I don’t see any advantage to it while it has drawbacks: why would we do that?

An alternative was to reserve the default to only “small object” so that N isn’t zero, but there isn’t a platform independent way of doing that and keep the code portable I believe. So SVec<T> is really saying: “I am willing to pay for some limite inline storage if possible but I don’t have a N in mind”.

That quoted statement still sounds like it could encapsulate the
possibility of 0 too, though. “limited inline storage if possible”
could be “and the answer to that constraint is zero in this case - but
if you happen to make the T smaller, it could become non-zero
organically/without the need to touch this code” (as the N could vary
organically between non-zero values as struct sizes change too)

Yes the quoted statement says exactly that! This is why I don’t like having this behavior on SmallVector: this is a different semantics / intent that the programmers have and this is something that I like being able to read differently.
SVec does not accept a N: if I read SVec there can’t be a possible accidental omission of N here. It has to be a choice between SmallVector<SomeType, 4> and SVec<SomeType> but not SmallVector<SomeType>.

Finally the simple llvm::Vector case to replace SmallVector<T, 0> is because it is frequently preferable to std::vector but still isn’t readable or immediately intuitive and so is rarely used in practice (see https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h for the documented points on N=0).

I’m not sure llvm::Vector/Vec would necessarily make it more often
used in practice. Maybe a bunch of cleanup and more specific wording
in the style guide that would ban std::vector might help more there.

Though I agree that it may be suitable to have a clear name for
SmallVector<T, 0> since the “Small” is slightly misleading (though not
entirely - it’s important for the “SmallVectorImpl” benefits of
SmallVector - so renaming it to Vector and still using it via
SmallVectorImpl& might also be confusing).

Got any sense of the total value here? Major savings to be had?

(if SmallVector can do what your proposed SVec does, that leaves
the Vec - could you expound on the benefits of SmallVector<T, 0>
over std::vector? I guess the SmallVectorImpl generic algorithm
opportunities? Though that’s rarely needed compared to ArrayRef.)
If SmallVector would suffice then maybe Vec could be
ZeroSmallVector? Not sure.

ZeroSmallVector does not really address your “more vector names to pick from” concerns,

Somewhat - if SmallVector can be used instead of SVec then we
have one fewer name and less convention to base the Vec on (since
it won’t have the SVec sibling), so picking a name closer to the
existing SmallVector might be more viable/helpful.

and it is longer than SmallVector<T, 0>: shouldn’t we aim for the “default case” to be the easiest to reach / most intuitive to pick? llvm::Vec looks like “just a vector”.

Somewhat - but Vec is an abbreviation that isn’t really used otherwise
(if we consider SVec as well, I’d still push back on the introduction
of the abbreviation for both cases)

One aspect of the naming is to avoid one not being a prefix of the other (IDE completion, etc.) or them being too close to differentiate.

and llvm::Vector would be only a
case separation away form a standard name (when considering the
unqualified name - which is likely to come up a fair bit, as we’ll see
“Vector” used unqualified a lot).

Note: this is why we converged with llvm::Vec and not llvm::Vector in the revision

I wouldn’t entirely object to SmallVector getting a smart default
buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
don’t feel /super/ great about it, but yeah, if we’re going to push
the Programmer’s Manual encouragement further in terms of
reducing/removing use of std::vector, then maybe llvm::Vector isn’t
the worst way to do that.

(it might be that this would benefit from being two separate
discussions, though - one on how to provide smart defaults for
SmallVector and one on if/how to push std::vector deprecation in
favor of SmallVector<T, 0> (semantically speaking - no matter how it’s
spelled) further along)

For historical purpose: the review was actually originally only adding a default for SmallVector and nothing else
We only converged to the current proposal after a few rounds of reviews trying to tradeoff amongst folks in the review.

dblaikie · November 17, 2020, 12:10am

>>
>> I will say I'm not a huge fan of adding even more names for things in
>> this fairly core/common use case (now we'll have even more vector
>> names to pick from) - can we use default template arguments so we can
>> write SmallVector<T> instead of SmallVector<T, N> and would that
>> address some of the use cases proposed here?
>
>
> I won't claim it is perfect, but the added names are a compromise over rounds of reviews with the folks in the revision. In particular I'm quite concerned that a default value for N on the SmallVector does not carry the intent the same way, and is too easy to miss in review (or while reading code).

I'm not quite following here - what sort of problems do you anticipate
by readers missing the default value for N?

Reading code `SmallVector<SomeType>` does not express that it is *intentional* to leave of the value for N. It can easily be just forgotten, and it could easily be implicitly `0`, and as a reviewer (or code reader later) I don't know if this is was intentional or not. This is why I am quite opposed to "loosen" the current requirement on SmallVector: a different name for a different intent is better fitting to me.

I don't know why it being implicitly zero is noteworthy - anymore than
it being implicitly 1 or 3, etc.

As for whether it's intentional: I think if we're moving in this
direction that's proposed, the idea is that by default we don't want
to be explicit about the N - so in general we won't be. And sometimes
someone will want to be explicit and add an N, but otherwise the
reasonable default is not to have an N. I think the intent is clear
enough - consider other default template parameters like customized
deleters on a unique_ptr: I don't wonder if someone intended to have a
customized deleter on every unique_ptr, one assumes that wasn't
required/intended unless it's added. It seems like the idea is for
non-explicit N to be a safe/common default, and explicit N to be
noteworthy/require some justification or scrutiny.

> To me the drawbacks are outweighing the benefits too much.
> Also, `SmallVector<LargeType>` would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code.

Why would it be necessary to figure that out? If we generally feel the
right default inline storage happens to be zero in that case and that
SmallVector will pick a good default (including possibly 0) that seems
like a good thing to me. (why would we call out zero specially, if we
want to avoid calling out other values explicitly)

I think this is an API that is "easy to misuse", I don't see any advantage to it while it has drawbacks: why would we do that?

What kind of misuse do you have in mind? To me it seems like it would
be consistent with the idea that we have built rules about what good
default inline sizes are - and there's no need for us to think about
it on every use, we just let the default do what it's meant to do.

> An alternative was to reserve the default to only "small object" so that N isn't zero, but there isn't a platform independent way of doing that and keep the code portable I believe. So `SVec<T>` is really saying: "I am willing to pay for some limite inline storage if possible but I don't have a N in mind".

That quoted statement still sounds like it could encapsulate the
possibility of 0 too, though. "limited inline storage if possible"
could be "and the answer to that constraint is zero in this case - but
if you happen to make the T smaller, it could become non-zero
organically/without the need to touch this code" (as the N could vary
organically between non-zero values as struct sizes change too)

Yes the quoted statement says exactly that!

Sorry, it seems we're jumbling up the two issues: Whether the type
name should be different when asking for implicit default inline
storage length and, separately, whether explicit zero length storage
should use a different name. The discussion above, I thought, was
about the latter, not the former - but your response seeems to be
about the former rather than the latter.

Two separate discussions/threads/patches may help keep the
communication more clear.

This is why I don't like having this behavior on SmallVector: this is a different semantics / intent that the programmers have and this is something that I like being able to read differently.
SVec does not accept a `N`: if I read SVec<Type> there can't be a possible accidental omission of N here. It has to be a choice between `SmallVector<SomeType, 4>` and `SVec<SomeType>` but not `SmallVector<SomeType>`.

To discuss this issue of whether accepting a default size versus
having an explicit size is a different semantic - as above, I think
it's worth considering/comparing this to other templates and their
default template arguments. Things like std::vector's customizable
allocators (you. don't have to use a different name for the container
when you specify a custom allocation scheme for std::vector - but we
don't wonder every time we see std::vector<T> whether the user meant
to customize the allocator - we accept the default is usualy intended
(as the default buffer size would be usually intended) unless it's
specified - if during review we think a custom allocator (or
non-default buffer size) might be merited, we could ask about it - we
probably would even if the user had written SVec<T> the same way we
ask about other data structures today "did you mean SmallVector<T,
3>"?), unique_ptr's custom deleters, etc.

> Finally the simple `llvm::Vector` case to replace `SmallVector<T, 0>` is because it is frequently preferable to `std::vector` but still isn't readable or immediately intuitive and so is rarely used in practice (see LLVM Programmer’s Manual — LLVM 18.0.0git documentation for the documented points on N=0).

I'm not sure llvm::Vector/Vec would necessarily make it more often
used in practice. Maybe a bunch of cleanup and more specific wording
in the style guide that would ban std::vector might help more there.

Though I agree that it may be suitable to have a clear name for
SmallVector<T, 0> since the "Small" is slightly misleading (though not
entirely - it's important for the "SmallVectorImpl" benefits of
SmallVector - so renaming it to Vector and still using it via
SmallVectorImpl<T>& might also be confusing).

>> Got any sense of the total value here? Major savings to be had?
>>
>> (if SmallVector<T> can do what your proposed SVec<T> does, that leaves
>> the Vec<T> - could you expound on the benefits of SmallVector<T, 0>
>> over std::vector<T>? I guess the SmallVectorImpl generic algorithm
>> opportunities? Though that's rarely needed compared to ArrayRef.)
>> If SmallVector<T> would suffice then maybe Vec<T> could be
>> ZeroSmallVector<T>? Not sure.
>
>
> ZeroSmallVector<T> does not really address your "more vector names to pick from" concerns,

Somewhat - if SmallVector<T> can be used instead of SVec<T> then we
have one fewer name and less convention to base the Vec<T> on (since
it won't have the SVec<T> sibling), so picking a name closer to the
existing SmallVector might be more viable/helpful.

> and it is longer than `SmallVector<T, 0>`: shouldn't we aim for the "default case" to be the easiest to reach / most intuitive to pick? `llvm::Vec` looks like "just a vector".

Somewhat - but Vec is an abbreviation that isn't really used otherwise
(if we consider SVec as well, I'd still push back on the introduction
of the abbreviation for both cases)

One aspect of the naming is to avoid one not being a prefix of the other (IDE completion, etc.) or them being too close to differentiate.

and llvm::Vector would be only a
case separation away form a standard name (when considering the
unqualified name - which is likely to come up a fair bit, as we'll see
"Vector" used unqualified a lot).

Note: this is why we converged with llvm::Vec and not llvm::Vector in the revision

Yeah - certainly a tricky set of tradeoffs (autocomplete, similarity
with standard names, etc). Perhaps a bit of a discussion of what other
libraries do around this (no doubt there are a bunch of C++ libraries
that have "here's the default vector you shuold use instead of
std::vector for these reasons" situations).

I wouldn't entirely object to SmallVector<T> getting a smart default
buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
don't feel /super/ great about it, but yeah, if we're going to push
the Programmer's Manual encouragement further in terms of
reducing/removing use of std::vector, then maybe llvm::Vector isn't
the worst way to do that.

(it might be that this would benefit from being two separate
discussions, though - one on how to provide smart defaults for
SmallVector<T> and one on if/how to push std::vector deprecation in
favor of SmallVector<T, 0> (semantically speaking - no matter how it's
spelled) further along)

For historical purpose: the review was actually originally only adding a default for SmallVector and nothing else
We only converged to the current proposal after a few rounds of reviews trying to tradeoff amongst folks in the review.

Perhaps you could summarize some of those discussions/decisions in
more detail here? As the RFC stands, these are my comments - I know
it's always a tradeoff of which audiences are brought in at what
stages (though often folks send ADT/Support changes my way during
review - I should probably just set up a Herald rule for these
things).

Sounds like maybe I am in agreement with the original version of the
review, then.

- Dave

Chris_Tetreault · November 17, 2020, 12:38am

I appreciate the work you’ve done on this. My impression is that the thought process that goes into picking the size parameter is something like “Uhh… Uhh… I guess 4 is good?”, so doing work to formalize this is good. I also think that people are likely reluctant to use SmallVector<T, 0>. That said, I don’t think adding new types is useful.

I think SmallVector should just get a default size that is computed as SVec’s default size parameter is. Honestly, I’m unconcerned that anybody would “forget” to populate the size parameter. I really feel like any value that isn’t backed by hard data is unlikely to be better than the result of calculateSVecInlineElements<T>(), and should be questioned in code review given the new heuristic you’ve added.

In short, I think that SmallVector should get a default size, and that the documentation should become something to the effect of “SmallVector should be preferred to std::vector unless you have a good reason to use the stl container. (api interop?). The default size for SmallVector has been chosen to be reasonable for most use cases. If you have evidence that shows the default is not optimal for your use case you can populate it with a more suitable value.”

Again, thanks for working on this. I’m looking forward to the day when I don’t have to pick a number out of my backside to put in that parameter.

Thanks,

Christopher Tetreault

mehdi_amini · November 17, 2020, 12:47am

I will say I’m not a huge fan of adding even more names for things in
this fairly core/common use case (now we’ll have even more vector
names to pick from) - can we use default template arguments so we can
write SmallVector instead of SmallVector<T, N> and would that
address some of the use cases proposed here?

I won’t claim it is perfect, but the added names are a compromise over rounds of reviews with the folks in the revision. In particular I’m quite concerned that a default value for N on the SmallVector does not carry the intent the same way, and is too easy to miss in review (or while reading code).

I’m not quite following here - what sort of problems do you anticipate
by readers missing the default value for N?

Reading code SmallVector<SomeType> does not express that it is intentional to leave of the value for N. It can easily be just forgotten, and it could easily be implicitly 0, and as a reviewer (or code reader later) I don’t know if this is was intentional or not. This is why I am quite opposed to “loosen” the current requirement on SmallVector: a different name for a different intent is better fitting to me.

I don’t know why it being implicitly zero is noteworthy - anymore than
it being implicitly 1 or 3, etc.

As for whether it’s intentional: I think if we’re moving in this
direction that’s proposed, the idea is that by default we don’t want
to be explicit about the N - so in general we won’t be. And sometimes
someone will want to be explicit and add an N, but otherwise the
reasonable default is not to have an N. I think the intent is clear
enough - consider other default template parameters like customized
deleters on a unique_ptr: I don’t wonder if someone intended to have a
customized deleter on every unique_ptr, one assumes that wasn’t
required/intended unless it’s added. It seems like the idea is for
non-explicit N to be a safe/common default, and explicit N to be
noteworthy/require some justification or scrutiny.

I get your point, but I don’t share the conclusion here: I don’t believe that universally not specifying the N on non-trivial types is a good default. It only makes sense to me for small types, hence it is error prone.

To me the drawbacks are outweighing the benefits too much.
Also, SmallVector<LargeType> would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code.

Why would it be necessary to figure that out? If we generally feel the
right default inline storage happens to be zero in that case and that
SmallVector will pick a good default (including possibly 0) that seems
like a good thing to me. (why would we call out zero specially, if we
want to avoid calling out other values explicitly)

I think this is an API that is “easy to misuse”, I don’t see any advantage to it while it has drawbacks: why would we do that?

What kind of misuse do you have in mind?

It does not convey the intent, it is too easy to misuse: i.e. you expect a small storage but you don’t have one. This is enough to me to outweigh the benefits of the direction and prefer the status quo.

To me it seems like it would
be consistent with the idea that we have built rules about what good
default inline sizes are - and there’s no need for us to think about
it on every use, we just let the default do what it’s meant to do.

An alternative was to reserve the default to only “small object” so that N isn’t zero, but there isn’t a platform independent way of doing that and keep the code portable I believe. So SVec<T> is really saying: “I am willing to pay for some limite inline storage if possible but I don’t have a N in mind”.

That quoted statement still sounds like it could encapsulate the
possibility of 0 too, though. “limited inline storage if possible”
could be “and the answer to that constraint is zero in this case - but
if you happen to make the T smaller, it could become non-zero
organically/without the need to touch this code” (as the N could vary
organically between non-zero values as struct sizes change too)

Yes the quoted statement says exactly that!

Sorry, it seems we’re jumbling up the two issues: Whether the type
name should be different when asking for implicit default inline
storage length and, separately, whether explicit zero length storage
should use a different name. The discussion above, I thought, was
about the latter, not the former - but your response seeems to be
about the former rather than the latter.

You’re answering my quote about SVec, which is the type that is about implicit N, Vec (no S!) is about the N=0 case. The quote is also about the inline storage implicit N, it isn’t clear to me how you link this to the 0 case (which does not have inline storage)?

Two separate discussions/threads/patches may help keep the
communication more clear.

This is why I don’t like having this behavior on SmallVector: this is a different semantics / intent that the programmers have and this is something that I like being able to read differently.
SVec does not accept a N: if I read SVec there can’t be a possible accidental omission of N here. It has to be a choice between SmallVector<SomeType, 4> and SVec<SomeType> but not SmallVector<SomeType>.

To discuss this issue of whether accepting a default size versus
having an explicit size is a different semantic - as above, I think
it’s worth considering/comparing this to other templates and their
default template arguments. Things like std::vector’s customizable
allocators (you. don’t have to use a different name for the container
when you specify a custom allocation scheme for std::vector - but we
don’t wonder every time we see std::vector whether the user meant
to customize the allocator - we accept the default is usualy intended
(as the default buffer size would be usually intended) unless it’s
specified - if during review we think a custom allocator (or
non-default buffer size) might be merited, we could ask about it - we
probably would even if the user had written SVec the same way we
ask about other data structures today “did you mean SmallVector<T,
3>”?), unique_ptr’s custom deleters, etc.

I don’t necessarily connect with the custom allocator / custom delete analogy right now, they seem too different to me.
I’d look instead at SmallDenseMap and DenseMap maybe, or std::map and std::unordered_map which have different class names and aren’t just differentiated with a trait passed as template argument.

Finally the simple llvm::Vector case to replace SmallVector<T, 0> is because it is frequently preferable to std::vector but still isn’t readable or immediately intuitive and so is rarely used in practice (see https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h for the documented points on N=0).

I’m not sure llvm::Vector/Vec would necessarily make it more often
used in practice. Maybe a bunch of cleanup and more specific wording
in the style guide that would ban std::vector might help more there.

Though I agree that it may be suitable to have a clear name for
SmallVector<T, 0> since the “Small” is slightly misleading (though not
entirely - it’s important for the “SmallVectorImpl” benefits of
SmallVector - so renaming it to Vector and still using it via
SmallVectorImpl& might also be confusing).

Got any sense of the total value here? Major savings to be had?

(if SmallVector can do what your proposed SVec does, that leaves
the Vec - could you expound on the benefits of SmallVector<T, 0>
over std::vector? I guess the SmallVectorImpl generic algorithm
opportunities? Though that’s rarely needed compared to ArrayRef.)
If SmallVector would suffice then maybe Vec could be
ZeroSmallVector? Not sure.

ZeroSmallVector does not really address your “more vector names to pick from” concerns,

Somewhat - if SmallVector can be used instead of SVec then we
have one fewer name and less convention to base the Vec on (since
it won’t have the SVec sibling), so picking a name closer to the
existing SmallVector might be more viable/helpful.

and it is longer than SmallVector<T, 0>: shouldn’t we aim for the “default case” to be the easiest to reach / most intuitive to pick? llvm::Vec looks like “just a vector”.

Somewhat - but Vec is an abbreviation that isn’t really used otherwise
(if we consider SVec as well, I’d still push back on the introduction
of the abbreviation for both cases)

One aspect of the naming is to avoid one not being a prefix of the other (IDE completion, etc.) or them being too close to differentiate.

and llvm::Vector would be only a
case separation away form a standard name (when considering the
unqualified name - which is likely to come up a fair bit, as we’ll see
“Vector” used unqualified a lot).

Note: this is why we converged with llvm::Vec and not llvm::Vector in the revision

Yeah - certainly a tricky set of tradeoffs (autocomplete, similarity
with standard names, etc). Perhaps a bit of a discussion of what other
libraries do around this (no doubt there are a bunch of C++ libraries
that have “here’s the default vector you shuold use instead of
std::vector for these reasons” situations).

I had looked into this when the revision was sent, but I couldn’t find a library with a “vector with inlined storage” and a default N value.
Both Abseil and Folly have a SmallVector equivalent, Abseil does not have a default: https://github.com/abseil/abseil-cpp/blob/master/absl/container/inlined_vector.h#L69-L71
Folly has a default hard-coded to 1: https://github.com/facebook/folly/blob/master/folly/small_vector.h#L430
Folly has also another class, FBVector, which is really an alternative to std::vector with a different memory management strategy (growth factor, etc.). I don’t think Abseil has anything else.

Do you have other ideas of related work to look for?

I wouldn’t entirely object to SmallVector getting a smart default
buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
don’t feel /super/ great about it, but yeah, if we’re going to push
the Programmer’s Manual encouragement further in terms of
reducing/removing use of std::vector, then maybe llvm::Vector isn’t
the worst way to do that.

(it might be that this would benefit from being two separate
discussions, though - one on how to provide smart defaults for
SmallVector and one on if/how to push std::vector deprecation in
favor of SmallVector<T, 0> (semantically speaking - no matter how it’s
spelled) further along)

For historical purpose: the review was actually originally only adding a default for SmallVector and nothing else
We only converged to the current proposal after a few rounds of reviews trying to tradeoff amongst folks in the review.

Perhaps you could summarize some of those discussions/decisions in
more detail here? As the RFC stands, these are my comments - I know
it’s always a tradeoff of which audiences are brought in at what
stages (though often folks send ADT/Support changes my way during
review - I should probably just set up a Herald rule for these
things).

Sounds like maybe I am in agreement with the original version of the
review, then.

Likely: the review evolved because I opposed to changing the status quo in this direction I guess.

Best,

_sean_silva · November 17, 2020, 2:14am

I will say I’m not a huge fan of adding even more names for things in
this fairly core/common use case (now we’ll have even more vector
names to pick from) - can we use default template arguments so we can
write SmallVector instead of SmallVector<T, N> and would that
address some of the use cases proposed here?

I won’t claim it is perfect, but the added names are a compromise over rounds of reviews with the folks in the revision. In particular I’m quite concerned that a default value for N on the SmallVector does not carry the intent the same way, and is too easy to miss in review (or while reading code).

I’m not quite following here - what sort of problems do you anticipate
by readers missing the default value for N?

Reading code SmallVector<SomeType> does not express that it is intentional to leave of the value for N. It can easily be just forgotten, and it could easily be implicitly 0, and as a reviewer (or code reader later) I don’t know if this is was intentional or not. This is why I am quite opposed to “loosen” the current requirement on SmallVector: a different name for a different intent is better fitting to me.

I don’t know why it being implicitly zero is noteworthy - anymore than
it being implicitly 1 or 3, etc.

As for whether it’s intentional: I think if we’re moving in this
direction that’s proposed, the idea is that by default we don’t want
to be explicit about the N - so in general we won’t be. And sometimes
someone will want to be explicit and add an N, but otherwise the
reasonable default is not to have an N. I think the intent is clear
enough - consider other default template parameters like customized
deleters on a unique_ptr: I don’t wonder if someone intended to have a
customized deleter on every unique_ptr, one assumes that wasn’t
required/intended unless it’s added. It seems like the idea is for
non-explicit N to be a safe/common default, and explicit N to be
noteworthy/require some justification or scrutiny.

I get your point, but I don’t share the conclusion here: I don’t believe that universally not specifying the N on non-trivial types is a good default. It only makes sense to me for small types, hence it is error prone.

To me the drawbacks are outweighing the benefits too much.
Also, SmallVector<LargeType> would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code.

Why would it be necessary to figure that out? If we generally feel the
right default inline storage happens to be zero in that case and that
SmallVector will pick a good default (including possibly 0) that seems
like a good thing to me. (why would we call out zero specially, if we
want to avoid calling out other values explicitly)

I think this is an API that is “easy to misuse”, I don’t see any advantage to it while it has drawbacks: why would we do that?

What kind of misuse do you have in mind?

It does not convey the intent, it is too easy to misuse: i.e. you expect a small storage but you don’t have one. This is enough to me to outweigh the benefits of the direction and prefer the status quo.

To me it seems like it would
be consistent with the idea that we have built rules about what good
default inline sizes are - and there’s no need for us to think about
it on every use, we just let the default do what it’s meant to do.

An alternative was to reserve the default to only “small object” so that N isn’t zero, but there isn’t a platform independent way of doing that and keep the code portable I believe. So SVec<T> is really saying: “I am willing to pay for some limite inline storage if possible but I don’t have a N in mind”.

That quoted statement still sounds like it could encapsulate the
possibility of 0 too, though. “limited inline storage if possible”
could be “and the answer to that constraint is zero in this case - but
if you happen to make the T smaller, it could become non-zero
organically/without the need to touch this code” (as the N could vary
organically between non-zero values as struct sizes change too)

Yes the quoted statement says exactly that!

Sorry, it seems we’re jumbling up the two issues: Whether the type
name should be different when asking for implicit default inline
storage length and, separately, whether explicit zero length storage
should use a different name. The discussion above, I thought, was
about the latter, not the former - but your response seeems to be
about the former rather than the latter.

You’re answering my quote about SVec, which is the type that is about implicit N, Vec (no S!) is about the N=0 case. The quote is also about the inline storage implicit N, it isn’t clear to me how you link this to the 0 case (which does not have inline storage)?

Two separate discussions/threads/patches may help keep the
communication more clear.

This is why I don’t like having this behavior on SmallVector: this is a different semantics / intent that the programmers have and this is something that I like being able to read differently.
SVec does not accept a N: if I read SVec there can’t be a possible accidental omission of N here. It has to be a choice between SmallVector<SomeType, 4> and SVec<SomeType> but not SmallVector<SomeType>.

To discuss this issue of whether accepting a default size versus
having an explicit size is a different semantic - as above, I think
it’s worth considering/comparing this to other templates and their
default template arguments. Things like std::vector’s customizable
allocators (you. don’t have to use a different name for the container
when you specify a custom allocation scheme for std::vector - but we
don’t wonder every time we see std::vector whether the user meant
to customize the allocator - we accept the default is usualy intended
(as the default buffer size would be usually intended) unless it’s
specified - if during review we think a custom allocator (or
non-default buffer size) might be merited, we could ask about it - we
probably would even if the user had written SVec the same way we
ask about other data structures today “did you mean SmallVector<T,
3>”?), unique_ptr’s custom deleters, etc.

I don’t necessarily connect with the custom allocator / custom delete analogy right now, they seem too different to me.
I’d look instead at SmallDenseMap and DenseMap maybe, or std::map and std::unordered_map which have different class names and aren’t just differentiated with a trait passed as template argument.

Finally the simple llvm::Vector case to replace SmallVector<T, 0> is because it is frequently preferable to std::vector but still isn’t readable or immediately intuitive and so is rarely used in practice (see https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h for the documented points on N=0).

I’m not sure llvm::Vector/Vec would necessarily make it more often
used in practice. Maybe a bunch of cleanup and more specific wording
in the style guide that would ban std::vector might help more there.

Though I agree that it may be suitable to have a clear name for
SmallVector<T, 0> since the “Small” is slightly misleading (though not
entirely - it’s important for the “SmallVectorImpl” benefits of
SmallVector - so renaming it to Vector and still using it via
SmallVectorImpl& might also be confusing).

Got any sense of the total value here? Major savings to be had?

(if SmallVector can do what your proposed SVec does, that leaves
the Vec - could you expound on the benefits of SmallVector<T, 0>
over std::vector? I guess the SmallVectorImpl generic algorithm
opportunities? Though that’s rarely needed compared to ArrayRef.)
If SmallVector would suffice then maybe Vec could be
ZeroSmallVector? Not sure.

ZeroSmallVector does not really address your “more vector names to pick from” concerns,

Somewhat - if SmallVector can be used instead of SVec then we
have one fewer name and less convention to base the Vec on (since
it won’t have the SVec sibling), so picking a name closer to the
existing SmallVector might be more viable/helpful.

and it is longer than SmallVector<T, 0>: shouldn’t we aim for the “default case” to be the easiest to reach / most intuitive to pick? llvm::Vec looks like “just a vector”.

Somewhat - but Vec is an abbreviation that isn’t really used otherwise
(if we consider SVec as well, I’d still push back on the introduction
of the abbreviation for both cases)

One aspect of the naming is to avoid one not being a prefix of the other (IDE completion, etc.) or them being too close to differentiate.

and llvm::Vector would be only a
case separation away form a standard name (when considering the
unqualified name - which is likely to come up a fair bit, as we’ll see
“Vector” used unqualified a lot).

Note: this is why we converged with llvm::Vec and not llvm::Vector in the revision

Yeah - certainly a tricky set of tradeoffs (autocomplete, similarity
with standard names, etc). Perhaps a bit of a discussion of what other
libraries do around this (no doubt there are a bunch of C++ libraries
that have “here’s the default vector you shuold use instead of
std::vector for these reasons” situations).

I had looked into this when the revision was sent, but I couldn’t find a library with a “vector with inlined storage” and a default N value.
Both Abseil and Folly have a SmallVector equivalent, Abseil does not have a default: https://github.com/abseil/abseil-cpp/blob/master/absl/container/inlined_vector.h#L69-L71
Folly has a default hard-coded to 1: https://github.com/facebook/folly/blob/master/folly/small_vector.h#L430
Folly has also another class, FBVector, which is really an alternative to std::vector with a different memory management strategy (growth factor, etc.). I don’t think Abseil has anything else.

Do you have other ideas of related work to look for?

Actually, the reason I originally pursued this was that I used a SmallDenseMap and happily found that it had a default number of inline elements: https://github.com/llvm/llvm-project/blob/69cd776e1ee79e72ccbdad30749eac04579715ee/llvm/include/llvm/ADT/DenseMap.h#L877
I thought “why not SmallVector, since I use that a lot more often”? But based on historical misuse of SmallVector, using a hardcoded number like “4” (like SmallDenseMap does) as the default doesn’t make much sense, so I pursued a bounded-sizeof approach.

I wouldn’t entirely object to SmallVector getting a smart default
buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
don’t feel /super/ great about it, but yeah, if we’re going to push
the Programmer’s Manual encouragement further in terms of
reducing/removing use of std::vector, then maybe llvm::Vector isn’t
the worst way to do that.

(it might be that this would benefit from being two separate
discussions, though - one on how to provide smart defaults for
SmallVector and one on if/how to push std::vector deprecation in
favor of SmallVector<T, 0> (semantically speaking - no matter how it’s
spelled) further along)

For historical purpose: the review was actually originally only adding a default for SmallVector and nothing else
We only converged to the current proposal after a few rounds of reviews trying to tradeoff amongst folks in the review.

Perhaps you could summarize some of those discussions/decisions in
more detail here? As the RFC stands, these are my comments - I know
it’s always a tradeoff of which audiences are brought in at what
stages (though often folks send ADT/Support changes my way during
review - I should probably just set up a Herald rule for these
things).

Sounds like maybe I am in agreement with the original version of the
review, then.

Likely: the review evolved because I opposed to changing the status quo in this direction I guess.

It sounds like there is pretty good consensus here to just let SmallVector<T> “just work” and choose a default N. The one exception seems to be that Mehdi has a strong objection having the default N possibly be 0 (to ensure the bounded-sizeof property). So let’s shift the discussion with a laser focus to specifically that point.

Responding to the arguments I’ve seen presented so far:

I’m not convinced that SmallVector with defaulted N==0 is really error prone. Mehdi, I like your definition of “I am willing to pay for some limite inline storage if possible but I don’t have a N in mind” and I think we can apply it to SmallVector; I don’t think adding SVec is needed for that.
I don’t understand the concern about “forgetting the N”. In the “new world” of SmallVector<T>, any SmallVector<T, N> in new code would be treated as “I profiled this code and this explicit N is the right choice”. That is, adding an explicit N is going to be rare and deliberate (possibly with reviewers insisting on a comment!) – this is not something that one can “forget”.

For point 2., I think we will have some transition period where reviewers will organically start to propagate the new advice (and patch authors enjoying not having to write N), but I think the direction is effectively what I stated.

– Sean Silva

mehdi_amini · November 17, 2020, 2:44am

I will say I’m not a huge fan of adding even more names for things in
this fairly core/common use case (now we’ll have even more vector
names to pick from) - can we use default template arguments so we can
write SmallVector instead of SmallVector<T, N> and would that
address some of the use cases proposed here?

I won’t claim it is perfect, but the added names are a compromise over rounds of reviews with the folks in the revision. In particular I’m quite concerned that a default value for N on the SmallVector does not carry the intent the same way, and is too easy to miss in review (or while reading code).

I’m not quite following here - what sort of problems do you anticipate
by readers missing the default value for N?

Reading code SmallVector<SomeType> does not express that it is intentional to leave of the value for N. It can easily be just forgotten, and it could easily be implicitly 0, and as a reviewer (or code reader later) I don’t know if this is was intentional or not. This is why I am quite opposed to “loosen” the current requirement on SmallVector: a different name for a different intent is better fitting to me.

I don’t know why it being implicitly zero is noteworthy - anymore than
it being implicitly 1 or 3, etc.

As for whether it’s intentional: I think if we’re moving in this
direction that’s proposed, the idea is that by default we don’t want
to be explicit about the N - so in general we won’t be. And sometimes
someone will want to be explicit and add an N, but otherwise the
reasonable default is not to have an N. I think the intent is clear
enough - consider other default template parameters like customized
deleters on a unique_ptr: I don’t wonder if someone intended to have a
customized deleter on every unique_ptr, one assumes that wasn’t
required/intended unless it’s added. It seems like the idea is for
non-explicit N to be a safe/common default, and explicit N to be
noteworthy/require some justification or scrutiny.

I get your point, but I don’t share the conclusion here: I don’t believe that universally not specifying the N on non-trivial types is a good default. It only makes sense to me for small types, hence it is error prone.

To me the drawbacks are outweighing the benefits too much.
Also, SmallVector<LargeType> would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code.

Why would it be necessary to figure that out? If we generally feel the
right default inline storage happens to be zero in that case and that
SmallVector will pick a good default (including possibly 0) that seems
like a good thing to me. (why would we call out zero specially, if we
want to avoid calling out other values explicitly)

I think this is an API that is “easy to misuse”, I don’t see any advantage to it while it has drawbacks: why would we do that?

What kind of misuse do you have in mind?

It does not convey the intent, it is too easy to misuse: i.e. you expect a small storage but you don’t have one. This is enough to me to outweigh the benefits of the direction and prefer the status quo.

To me it seems like it would
be consistent with the idea that we have built rules about what good
default inline sizes are - and there’s no need for us to think about
it on every use, we just let the default do what it’s meant to do.

An alternative was to reserve the default to only “small object” so that N isn’t zero, but there isn’t a platform independent way of doing that and keep the code portable I believe. So SVec<T> is really saying: “I am willing to pay for some limite inline storage if possible but I don’t have a N in mind”.

That quoted statement still sounds like it could encapsulate the
possibility of 0 too, though. “limited inline storage if possible”
could be “and the answer to that constraint is zero in this case - but
if you happen to make the T smaller, it could become non-zero
organically/without the need to touch this code” (as the N could vary
organically between non-zero values as struct sizes change too)

Yes the quoted statement says exactly that!

Sorry, it seems we’re jumbling up the two issues: Whether the type
name should be different when asking for implicit default inline
storage length and, separately, whether explicit zero length storage
should use a different name. The discussion above, I thought, was
about the latter, not the former - but your response seeems to be
about the former rather than the latter.

You’re answering my quote about SVec, which is the type that is about implicit N, Vec (no S!) is about the N=0 case. The quote is also about the inline storage implicit N, it isn’t clear to me how you link this to the 0 case (which does not have inline storage)?

Two separate discussions/threads/patches may help keep the
communication more clear.

This is why I don’t like having this behavior on SmallVector: this is a different semantics / intent that the programmers have and this is something that I like being able to read differently.
SVec does not accept a N: if I read SVec there can’t be a possible accidental omission of N here. It has to be a choice between SmallVector<SomeType, 4> and SVec<SomeType> but not SmallVector<SomeType>.

To discuss this issue of whether accepting a default size versus
having an explicit size is a different semantic - as above, I think
it’s worth considering/comparing this to other templates and their
default template arguments. Things like std::vector’s customizable
allocators (you. don’t have to use a different name for the container
when you specify a custom allocation scheme for std::vector - but we
don’t wonder every time we see std::vector whether the user meant
to customize the allocator - we accept the default is usualy intended
(as the default buffer size would be usually intended) unless it’s
specified - if during review we think a custom allocator (or
non-default buffer size) might be merited, we could ask about it - we
probably would even if the user had written SVec the same way we
ask about other data structures today “did you mean SmallVector<T,
3>”?), unique_ptr’s custom deleters, etc.

I don’t necessarily connect with the custom allocator / custom delete analogy right now, they seem too different to me.
I’d look instead at SmallDenseMap and DenseMap maybe, or std::map and std::unordered_map which have different class names and aren’t just differentiated with a trait passed as template argument.

Finally the simple llvm::Vector case to replace SmallVector<T, 0> is because it is frequently preferable to std::vector but still isn’t readable or immediately intuitive and so is rarely used in practice (see https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h for the documented points on N=0).

I’m not sure llvm::Vector/Vec would necessarily make it more often
used in practice. Maybe a bunch of cleanup and more specific wording
in the style guide that would ban std::vector might help more there.

Though I agree that it may be suitable to have a clear name for
SmallVector<T, 0> since the “Small” is slightly misleading (though not
entirely - it’s important for the “SmallVectorImpl” benefits of
SmallVector - so renaming it to Vector and still using it via
SmallVectorImpl& might also be confusing).

Got any sense of the total value here? Major savings to be had?

(if SmallVector can do what your proposed SVec does, that leaves
the Vec - could you expound on the benefits of SmallVector<T, 0>
over std::vector? I guess the SmallVectorImpl generic algorithm
opportunities? Though that’s rarely needed compared to ArrayRef.)
If SmallVector would suffice then maybe Vec could be
ZeroSmallVector? Not sure.

ZeroSmallVector does not really address your “more vector names to pick from” concerns,

Somewhat - if SmallVector can be used instead of SVec then we
have one fewer name and less convention to base the Vec on (since
it won’t have the SVec sibling), so picking a name closer to the
existing SmallVector might be more viable/helpful.

and it is longer than SmallVector<T, 0>: shouldn’t we aim for the “default case” to be the easiest to reach / most intuitive to pick? llvm::Vec looks like “just a vector”.

Somewhat - but Vec is an abbreviation that isn’t really used otherwise
(if we consider SVec as well, I’d still push back on the introduction
of the abbreviation for both cases)

One aspect of the naming is to avoid one not being a prefix of the other (IDE completion, etc.) or them being too close to differentiate.

and llvm::Vector would be only a
case separation away form a standard name (when considering the
unqualified name - which is likely to come up a fair bit, as we’ll see
“Vector” used unqualified a lot).

Note: this is why we converged with llvm::Vec and not llvm::Vector in the revision

Yeah - certainly a tricky set of tradeoffs (autocomplete, similarity
with standard names, etc). Perhaps a bit of a discussion of what other
libraries do around this (no doubt there are a bunch of C++ libraries
that have “here’s the default vector you shuold use instead of
std::vector for these reasons” situations).

I had looked into this when the revision was sent, but I couldn’t find a library with a “vector with inlined storage” and a default N value.
Both Abseil and Folly have a SmallVector equivalent, Abseil does not have a default: https://github.com/abseil/abseil-cpp/blob/master/absl/container/inlined_vector.h#L69-L71
Folly has a default hard-coded to 1: https://github.com/facebook/folly/blob/master/folly/small_vector.h#L430
Folly has also another class, FBVector, which is really an alternative to std::vector with a different memory management strategy (growth factor, etc.). I don’t think Abseil has anything else.

Do you have other ideas of related work to look for?

Actually, the reason I originally pursued this was that I used a SmallDenseMap and happily found that it had a default number of inline elements: https://github.com/llvm/llvm-project/blob/69cd776e1ee79e72ccbdad30749eac04579715ee/llvm/include/llvm/ADT/DenseMap.h#L877
I thought “why not SmallVector, since I use that a lot more often”? But based on historical misuse of SmallVector, using a hardcoded number like “4” (like SmallDenseMap does) as the default doesn’t make much sense, so I pursued a bounded-sizeof approach.

I wouldn’t entirely object to SmallVector getting a smart default
buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
don’t feel /super/ great about it, but yeah, if we’re going to push
the Programmer’s Manual encouragement further in terms of
reducing/removing use of std::vector, then maybe llvm::Vector isn’t
the worst way to do that.

(it might be that this would benefit from being two separate
discussions, though - one on how to provide smart defaults for
SmallVector and one on if/how to push std::vector deprecation in
favor of SmallVector<T, 0> (semantically speaking - no matter how it’s
spelled) further along)

For historical purpose: the review was actually originally only adding a default for SmallVector and nothing else
We only converged to the current proposal after a few rounds of reviews trying to tradeoff amongst folks in the review.

Perhaps you could summarize some of those discussions/decisions in
more detail here? As the RFC stands, these are my comments - I know
it’s always a tradeoff of which audiences are brought in at what
stages (though often folks send ADT/Support changes my way during
review - I should probably just set up a Herald rule for these
things).

Sounds like maybe I am in agreement with the original version of the
review, then.

Likely: the review evolved because I opposed to changing the status quo in this direction I guess.

It sounds like there is pretty good consensus here to just let SmallVector<T> “just work” and choose a default N. The one exception seems to be that Mehdi has a strong objection having the default N possibly be 0 (to ensure the bounded-sizeof property). So let’s shift the discussion with a laser focus to specifically that point.

Responding to the arguments I’ve seen presented so far:

I’m not convinced that SmallVector with defaulted N==0 is really error prone. Mehdi, I like your definition of “I am willing to pay for some limite inline storage if possible but I don’t have a N in mind” and I think we can apply it to SmallVector; I don’t think adding SVec is needed for that.

I understand your point, I still strongly disagree for the reason previously discussed: in particular I don’t understand why applying it to SmallVector would be better than what we converged on with SVec.

mehdi_amini · November 17, 2020, 2:52am

I will say I’m not a huge fan of adding even more names for things in
this fairly core/common use case (now we’ll have even more vector
names to pick from) - can we use default template arguments so we can
write SmallVector instead of SmallVector<T, N> and would that
address some of the use cases proposed here?

I won’t claim it is perfect, but the added names are a compromise over rounds of reviews with the folks in the revision. In particular I’m quite concerned that a default value for N on the SmallVector does not carry the intent the same way, and is too easy to miss in review (or while reading code).

I’m not quite following here - what sort of problems do you anticipate
by readers missing the default value for N?

Reading code SmallVector<SomeType> does not express that it is intentional to leave of the value for N. It can easily be just forgotten, and it could easily be implicitly 0, and as a reviewer (or code reader later) I don’t know if this is was intentional or not. This is why I am quite opposed to “loosen” the current requirement on SmallVector: a different name for a different intent is better fitting to me.

I don’t know why it being implicitly zero is noteworthy - anymore than
it being implicitly 1 or 3, etc.

As for whether it’s intentional: I think if we’re moving in this
direction that’s proposed, the idea is that by default we don’t want
to be explicit about the N - so in general we won’t be. And sometimes
someone will want to be explicit and add an N, but otherwise the
reasonable default is not to have an N. I think the intent is clear
enough - consider other default template parameters like customized
deleters on a unique_ptr: I don’t wonder if someone intended to have a
customized deleter on every unique_ptr, one assumes that wasn’t
required/intended unless it’s added. It seems like the idea is for
non-explicit N to be a safe/common default, and explicit N to be
noteworthy/require some justification or scrutiny.

I get your point, but I don’t share the conclusion here: I don’t believe that universally not specifying the N on non-trivial types is a good default. It only makes sense to me for small types, hence it is error prone.

To me the drawbacks are outweighing the benefits too much.
Also, SmallVector<LargeType> would end-up with N==0 implicitly, without an easy way to figure it out that there is no actual inline storage while reading the code.

Why would it be necessary to figure that out? If we generally feel the
right default inline storage happens to be zero in that case and that
SmallVector will pick a good default (including possibly 0) that seems
like a good thing to me. (why would we call out zero specially, if we
want to avoid calling out other values explicitly)

I think this is an API that is “easy to misuse”, I don’t see any advantage to it while it has drawbacks: why would we do that?

What kind of misuse do you have in mind?

It does not convey the intent, it is too easy to misuse: i.e. you expect a small storage but you don’t have one. This is enough to me to outweigh the benefits of the direction and prefer the status quo.

To me it seems like it would
be consistent with the idea that we have built rules about what good
default inline sizes are - and there’s no need for us to think about
it on every use, we just let the default do what it’s meant to do.

An alternative was to reserve the default to only “small object” so that N isn’t zero, but there isn’t a platform independent way of doing that and keep the code portable I believe. So SVec<T> is really saying: “I am willing to pay for some limite inline storage if possible but I don’t have a N in mind”.

That quoted statement still sounds like it could encapsulate the
possibility of 0 too, though. “limited inline storage if possible”
could be “and the answer to that constraint is zero in this case - but
if you happen to make the T smaller, it could become non-zero
organically/without the need to touch this code” (as the N could vary
organically between non-zero values as struct sizes change too)

Yes the quoted statement says exactly that!

Sorry, it seems we’re jumbling up the two issues: Whether the type
name should be different when asking for implicit default inline
storage length and, separately, whether explicit zero length storage
should use a different name. The discussion above, I thought, was
about the latter, not the former - but your response seeems to be
about the former rather than the latter.

You’re answering my quote about SVec, which is the type that is about implicit N, Vec (no S!) is about the N=0 case. The quote is also about the inline storage implicit N, it isn’t clear to me how you link this to the 0 case (which does not have inline storage)?

Two separate discussions/threads/patches may help keep the
communication more clear.

This is why I don’t like having this behavior on SmallVector: this is a different semantics / intent that the programmers have and this is something that I like being able to read differently.
SVec does not accept a N: if I read SVec there can’t be a possible accidental omission of N here. It has to be a choice between SmallVector<SomeType, 4> and SVec<SomeType> but not SmallVector<SomeType>.

To discuss this issue of whether accepting a default size versus
having an explicit size is a different semantic - as above, I think
it’s worth considering/comparing this to other templates and their
default template arguments. Things like std::vector’s customizable
allocators (you. don’t have to use a different name for the container
when you specify a custom allocation scheme for std::vector - but we
don’t wonder every time we see std::vector whether the user meant
to customize the allocator - we accept the default is usualy intended
(as the default buffer size would be usually intended) unless it’s
specified - if during review we think a custom allocator (or
non-default buffer size) might be merited, we could ask about it - we
probably would even if the user had written SVec the same way we
ask about other data structures today “did you mean SmallVector<T,
3>”?), unique_ptr’s custom deleters, etc.

I don’t necessarily connect with the custom allocator / custom delete analogy right now, they seem too different to me.
I’d look instead at SmallDenseMap and DenseMap maybe, or std::map and std::unordered_map which have different class names and aren’t just differentiated with a trait passed as template argument.

Finally the simple llvm::Vector case to replace SmallVector<T, 0> is because it is frequently preferable to std::vector but still isn’t readable or immediately intuitive and so is rarely used in practice (see https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h for the documented points on N=0).

I’m not sure llvm::Vector/Vec would necessarily make it more often
used in practice. Maybe a bunch of cleanup and more specific wording
in the style guide that would ban std::vector might help more there.

Though I agree that it may be suitable to have a clear name for
SmallVector<T, 0> since the “Small” is slightly misleading (though not
entirely - it’s important for the “SmallVectorImpl” benefits of
SmallVector - so renaming it to Vector and still using it via
SmallVectorImpl& might also be confusing).

Got any sense of the total value here? Major savings to be had?

(if SmallVector can do what your proposed SVec does, that leaves
the Vec - could you expound on the benefits of SmallVector<T, 0>
over std::vector? I guess the SmallVectorImpl generic algorithm
opportunities? Though that’s rarely needed compared to ArrayRef.)
If SmallVector would suffice then maybe Vec could be
ZeroSmallVector? Not sure.

ZeroSmallVector does not really address your “more vector names to pick from” concerns,

Somewhat - if SmallVector can be used instead of SVec then we
have one fewer name and less convention to base the Vec on (since
it won’t have the SVec sibling), so picking a name closer to the
existing SmallVector might be more viable/helpful.

and it is longer than SmallVector<T, 0>: shouldn’t we aim for the “default case” to be the easiest to reach / most intuitive to pick? llvm::Vec looks like “just a vector”.

Somewhat - but Vec is an abbreviation that isn’t really used otherwise
(if we consider SVec as well, I’d still push back on the introduction
of the abbreviation for both cases)

One aspect of the naming is to avoid one not being a prefix of the other (IDE completion, etc.) or them being too close to differentiate.

and llvm::Vector would be only a
case separation away form a standard name (when considering the
unqualified name - which is likely to come up a fair bit, as we’ll see
“Vector” used unqualified a lot).

Note: this is why we converged with llvm::Vec and not llvm::Vector in the revision

Yeah - certainly a tricky set of tradeoffs (autocomplete, similarity
with standard names, etc). Perhaps a bit of a discussion of what other
libraries do around this (no doubt there are a bunch of C++ libraries
that have “here’s the default vector you shuold use instead of
std::vector for these reasons” situations).

I had looked into this when the revision was sent, but I couldn’t find a library with a “vector with inlined storage” and a default N value.
Both Abseil and Folly have a SmallVector equivalent, Abseil does not have a default: https://github.com/abseil/abseil-cpp/blob/master/absl/container/inlined_vector.h#L69-L71
Folly has a default hard-coded to 1: https://github.com/facebook/folly/blob/master/folly/small_vector.h#L430
Folly has also another class, FBVector, which is really an alternative to std::vector with a different memory management strategy (growth factor, etc.). I don’t think Abseil has anything else.

Do you have other ideas of related work to look for?

Actually, the reason I originally pursued this was that I used a SmallDenseMap and happily found that it had a default number of inline elements: https://github.com/llvm/llvm-project/blob/69cd776e1ee79e72ccbdad30749eac04579715ee/llvm/include/llvm/ADT/DenseMap.h#L877
I thought “why not SmallVector, since I use that a lot more often”? But based on historical misuse of SmallVector, using a hardcoded number like “4” (like SmallDenseMap does) as the default doesn’t make much sense, so I pursued a bounded-sizeof approach.

I wouldn’t entirely object to SmallVector getting a smart default
buffer size, and llvm::Vector being an alias for SmallVector<T, 0> - I
don’t feel /super/ great about it, but yeah, if we’re going to push
the Programmer’s Manual encouragement further in terms of
reducing/removing use of std::vector, then maybe llvm::Vector isn’t
the worst way to do that.

(it might be that this would benefit from being two separate
discussions, though - one on how to provide smart defaults for
SmallVector and one on if/how to push std::vector deprecation in
favor of SmallVector<T, 0> (semantically speaking - no matter how it’s
spelled) further along)

For historical purpose: the review was actually originally only adding a default for SmallVector and nothing else
We only converged to the current proposal after a few rounds of reviews trying to tradeoff amongst folks in the review.

Perhaps you could summarize some of those discussions/decisions in
more detail here? As the RFC stands, these are my comments - I know
it’s always a tradeoff of which audiences are brought in at what
stages (though often folks send ADT/Support changes my way during
review - I should probably just set up a Herald rule for these
things).

Sounds like maybe I am in agreement with the original version of the
review, then.

Likely: the review evolved because I opposed to changing the status quo in this direction I guess.

It sounds like there is pretty good consensus here to just let SmallVector<T> “just work” and choose a default N. The one exception seems to be that Mehdi has a strong objection having the default N possibly be 0 (to ensure the bounded-sizeof property). So let’s shift the discussion with a laser focus to specifically that point.

Responding to the arguments I’ve seen presented so far:

I’m not convinced that SmallVector with defaulted N==0 is really error prone. Mehdi, I like your definition of “I am willing to pay for some limite inline storage if possible but I don’t have a N in mind” and I think we can apply it to SmallVector; I don’t think adding SVec is needed for that.

I understand your point, I still strongly disagree for the reason previously discussed: in particular I don’t understand why applying it to SmallVector would be better than what we converged on with SVec.

I’d add that we have SmallVector the way it is and it has been serving us pretty well so far: I’m not unhappy with the status quo for example. Changing the status quo deserves a stronger case than what I see here, in particular in light of the downside I see and also the alternative (SVec).

I don’t understand the concern about “forgetting the N”. In the “new world” of SmallVector<T>, any SmallVector<T, N> in new code would be treated as “I profiled this code and this explicit N is the right choice”. That is, adding an explicit N is going to be rare and deliberate (possibly with reviewers insisting on a comment!) – this is not something that one can “forget”.

That may be another aspect of our disconnect: you seem to propose a different approach to SmallVector where N is exceptional. I am seeing your addition as a convenience for a default case, that does not preclude from using local knowledge to be more specific. You may be right that on the long run we’d end up with a majority of unspecified N (I think I proposed in the revision llvm::Vector for this instead by the way), I just see SmallVector as something that is more of a conscientious choice at the moment.

clattner · November 17, 2020, 3:26pm

We’ve pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it’s ready for wider discussion: https://reviews.llvm.org/D90884

SVec is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this “on the stack, where a “small” number of elements are expected”.

Hey Sean,

I agree with other comments that his approach unnecessarily fragments the API surface area for a core class. You’re doing two things here: 1) adding a new name for an existing thing, and 2) adding a default.

My major concern and objection is about #1, since the codebase will get to be a mishmash of the two. I’d rather keep the world consistent here and have an opinionated “one way to do it”.

Thoughts/suggestions:

Adding the default seems very reasonable to me, and I think that 64 bytes is a good default. I think you should change the behavior so that SmallVector defaults to a single inline element instead of zero though. Perhaps generate a static_assert when it is crazy large.
The name SmallVector has always been confusing, InlinedVector is more principled, but is even more verbose for such a common type. If you think it is worth shrinkifying the name SmallVector, then I think the best thing to do is rename SmallVector (perhaps to IVector?) everywhere in the codebase. I don’t think that introducing a redundant name is a good thing (except to smooth the transition).
If people are concerned about the default being bad, then you could choose to make “SmallVector<T, -1>” be the thing that autosizes. I tend to agree with your viewpoint that the N chosen is arbitrary in almost all cases anyway though, so I wouldn’t go this way.

Vec is a convenience alias for SmallVector<T, 0>. It lets us get the (little-known?) benefits of SmallVector even when it has no inline elements (see https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h). The recommendation in the patch is to use this when the SmallVector is on the heap.

These benefits are tiny, I really don’t think it is worth introducing a new name for this. I think this is better handled with a change to CodingStandards (saying “don’t use std::vector”) and the programmers manual.

-Chris

_sean_silva · November 17, 2020, 9:40pm

We’ve pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it’s ready for wider discussion: https://reviews.llvm.org/D90884

SVec is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this “on the stack, where a “small” number of elements are expected”.

Hey Sean,

I agree with other comments that his approach unnecessarily fragments the API surface area for a core class. You’re doing two things here: 1) adding a new name for an existing thing, and 2) adding a default.

My major concern and objection is about #1, since the codebase will get to be a mishmash of the two. I’d rather keep the world consistent here and have an opinionated “one way to do it”.

Thanks Chris. That was my original proposal when I first drafted the patch, and I’m happy to just add the default.

Thoughts/suggestions:

Adding the default seems very reasonable to me, and I think that 64 bytes is a good default. I think you should change the behavior so that SmallVector defaults to a single inline element instead of zero though. Perhaps generate a static_assert when it is crazy large.

That would work for me.

Mehdi, would SmallVector, defaulting to a minimum single inline element with static_assert(sizeof(T) < 512) address your concerns?

– Sean Silva

dblaikie · November 17, 2020, 9:42pm

We've pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it's ready for wider discussion: ⚙ D90884 [SmallVector] Add `SVec<T>` / `Vec<T>` convenience wrappers.

SVec<T> is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this "on the stack, where a "small" number of elements are expected".

Hey Sean,

I agree with other comments that his approach unnecessarily fragments the API surface area for a core class. You’re doing two things here: 1) adding a new name for an existing thing, and 2) adding a default.

My major concern and objection is about #1, since the codebase will get to be a mishmash of the two. I’d rather keep the world consistent here and have an opinionated “one way to do it”.

Thoughts/suggestions:
- Adding the default seems very reasonable to me, and I think that 64 bytes is a good default. I think you should change the behavior so that SmallVector<LargeThing> defaults to a single inline element instead of zero though. Perhaps generate a static_assert when it is crazy large.

Out of curiosity: Why a single rather than zero?

mehdi_amini · November 17, 2020, 10:14pm

We’ve pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it’s ready for wider discussion: https://reviews.llvm.org/D90884

SVec is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this “on the stack, where a “small” number of elements are expected”.

Hey Sean,

I agree with other comments that his approach unnecessarily fragments the API surface area for a core class. You’re doing two things here: 1) adding a new name for an existing thing, and 2) adding a default.

My major concern and objection is about #1, since the codebase will get to be a mishmash of the two. I’d rather keep the world consistent here and have an opinionated “one way to do it”.

Thanks Chris. That was my original proposal when I first drafted the patch, and I’m happy to just add the default.

Thoughts/suggestions:

Adding the default seems very reasonable to me, and I think that 64 bytes is a good default. I think you should change the behavior so that SmallVector defaults to a single inline element instead of zero though. Perhaps generate a static_assert when it is crazy large.

That would work for me.

Mehdi, would SmallVector, defaulting to a minimum single inline element with static_assert(sizeof(T) < 512) address your concerns?

Looks fine! Thanks

dexonsmith · November 17, 2020, 10:15pm

We’ve pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it’s ready for wider discussion: https://reviews.llvm.org/D90884

SVec is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this “on the stack, where a “small” number of elements are expected”.

Hey Sean,

I agree with other comments that his approach unnecessarily fragments the API surface area for a core class. You’re doing two things here: 1) adding a new name for an existing thing, and 2) adding a default.

My major concern and objection is about #1, since the codebase will get to be a mishmash of the two. I’d rather keep the world consistent here and have an opinionated “one way to do it”.

Thanks Chris. That was my original proposal when I first drafted the patch, and I’m happy to just add the default.

Thoughts/suggestions:

Adding the default seems very reasonable to me, and I think that 64 bytes is a good default. I think you should change the behavior so that SmallVector defaults to a single inline element instead of zero though. Perhaps generate a static_assert when it is crazy large.

That would work for me.

Mehdi, would SmallVector, defaulting to a minimum single inline element with static_assert(sizeof(T) < 512) address your concerns?

I’d be a bit concerned about this for two reasons:

Adding a large inlined element by-default may reinforce one of the problems with SmallVector, that people accidentally put big vectors in places that shouldn’t have them. If we’re going to make accidentally large inlined vectors more ergonomic, I think we ought to make 0-element versions more ergonomic too.
The static_assert is going to fail differently on different platforms, since T’s size will tend to be platform-dependent for large T. I’m not sure it’ll be predictable enough.

– Sean Silva

The name SmallVector has always been confusing, InlinedVector is more principled, but is even more verbose for such a common type. If you think it is worth shrinkifying the name SmallVector, then I think the best thing to do is rename SmallVector (perhaps to IVector?) everywhere in the codebase. I don’t think that introducing a redundant name is a good thing (except to smooth the transition).

I like the idea of a minor rename, although a bit different:

SmallVectorBase => VectorBase
SmallVectorImpl => VectorImpl (we could keep both around for a transition period)
(etc.)
SmallVector => Vector, but give Vector a default small size of 0
Add SmallVector as a wrapper around Vector that has a nice default for the small size:

template <class T>
using SmallVector = Vector<T, DefaultInlinedElementSize<T>>;

_sean_silva · November 18, 2020, 1:29am

We’ve pretty happy now with a patch that adds two wrappers around SmallVector that make it 1) more convenient to use and 2) will tend to mitigate misuse of SmallVector. We think it’s ready for wider discussion: https://reviews.llvm.org/D90884

SVec is a convenience alias for SmallVector<T, N> with N chosen automatically to keep its size under 64 Bytes (that heuristic is easy to change though). The recommendation in the patch is to use this “on the stack, where a “small” number of elements are expected”.

Hey Sean,

I agree with other comments that his approach unnecessarily fragments the API surface area for a core class. You’re doing two things here: 1) adding a new name for an existing thing, and 2) adding a default.

My major concern and objection is about #1, since the codebase will get to be a mishmash of the two. I’d rather keep the world consistent here and have an opinionated “one way to do it”.

Thanks Chris. That was my original proposal when I first drafted the patch, and I’m happy to just add the default.

Thoughts/suggestions:

Adding the default seems very reasonable to me, and I think that 64 bytes is a good default. I think you should change the behavior so that SmallVector defaults to a single inline element instead of zero though. Perhaps generate a static_assert when it is crazy large.

That would work for me.

Mehdi, would SmallVector, defaulting to a minimum single inline element with static_assert(sizeof(T) < 512) address your concerns?

I’d be a bit concerned about this for two reasons:

Adding a large inlined element by-default may reinforce one of the problems with SmallVector, that people accidentally put big vectors in places that shouldn’t have them. If we’re going to make accidentally large inlined vectors more ergonomic, I think we ought to make 0-element versions more ergonomic too.

I would prefer 0, but I’m ok with 1 because SmallVector<SmallVector> avoids the “exponential” effect of SmallVector<SmallVector<T, 4>, 2> == 8 inlined T’s (+ headers and stuff).

As an experiment, I added in a static_assert(sizeof(T) < 512) into SmallVector. Almost all the errors were due to nested SmallVector<SmallVector<T, 4 or 8>, 4 or 8> and such. The only case I could see in the error log that was not due to nested SmallVector’s was this legitimately enormous struct in llvm-objcopy: https://github.com/llvm/llvm-project/blob/67e0f791c93a23d0a523f3f05082c020f7c9109f/llvm/tools/llvm-objcopy/CopyConfig.h#L149 (which, btw, is used only in a SmallVector<T, 1>)

So I feel pretty confident that 1 inline element from SmallVector is probably less or equal to what users would choose. Thus, I still think 1 inline element minimum is net positive if Chris won’t budge on lowering that to 0.

Chris, thoughts?

The static_assert is going to fail differently on different platforms, since T’s size will tend to be platform-dependent for large T. I’m not sure it’ll be predictable enough.

Agreed.

– Sean Silva

The name SmallVector has always been confusing, InlinedVector is more principled, but is even more verbose for such a common type. If you think it is worth shrinkifying the name SmallVector, then I think the best thing to do is rename SmallVector (perhaps to IVector?) everywhere in the codebase. I don’t think that introducing a redundant name is a good thing (except to smooth the transition).

I like the idea of a minor rename, although a bit different:

SmallVectorBase => VectorBase

SmallVectorImpl => VectorImpl (we could keep both around for a transition period)

(etc.)

SmallVector => Vector, but give Vector a default small size of 0

Add SmallVector as a wrapper around Vector that has a nice default for the small size:
template <class T>
using SmallVector = Vector<T, DefaultInlinedElementSize<T>>;

I’m generally +1 on making SmallVector<T, 0> be easier to use, but it seems that the discussion of default N is already complex enough, so for now let’s focus only on the default N situation.

– Sean Silva

_sean_silva · November 25, 2020, 11:47pm

Ping.

It sounds like we left off with consensus around just making SmallVector have a default N.

We were converging to a default N policy of:

at least 1 inlined element
sizeof(SmallVector) <= 64 when it doesn’t contradict the “at least 1 inlined element”.

Any objections to moving forward with that?

– Sean Silva

dblaikie · November 25, 2020, 11:51pm

Ping.

It sounds like we left off with consensus around just making SmallVector<T> have a default N.

We were converging to a default N policy of:
- at least 1 inlined element

I'm still not sure why "at least 1" is an important goal. I'd be
curious to hear from Chris/others why that's especially valuable.

Michael_Kruse2 · November 26, 2020, 6:55am

If N is 0, then it's not a small-size optimized vector, in contrast to
what the name implies.

Actually, if the intention is to have no inlined elements, I think one
should use std:vector. Generally, projects should not reinvent STL
containers and prefer the standard library one unless there is a good
enough reason not to use them.

Michael

Topic		Replies	Views
RFC: Should SmallVectors be smaller? LLVM Dev List Archives	16	288	June 23, 2018
PSA: SmallVector<T> Just Works LLVM Dev List Archives	2	184	November 4, 2021
Adding a size-based vector to libc++’s unstable ABI C++ libcxx	8	1930	June 10, 2025
Removing std::vector from APIs (was Re: Mutating the elements of a ConstantArray) LLVM Dev List Archives	7	165	May 16, 2009
RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers. LLVM Dev List Archives	0	97	December 2, 2020

RFC: [SmallVector] Adding SVec<T> and Vec<T> convenience wrappers.

Related topics