Plans for CXXConstructorDecl?

Hi all, I’ve been following recent changes to the AST bitfield definitions in clang/include/clang/AST/DeclBase.h. In particular, the NumCtorInitializers field in CXXConstructorDeclBitfields has shrunk over several releases (from 21 bits down to 16). This reduces the maximum number of initializer‑list entries a constructor can have, and indirectly reduces the universe of C++ programs that Clang can compile. Although Human code seldom hits this limit, machine generated code easily can. For example, Verilator hits that limit when generating large designs. Is there a long‑term plan to keep reducing this bitfield, move it into a separate class member, or is the recent shrinkage a temporary expedient? Is somebody working on this currently, or is there a recommended way to address it? Any guidance would be appreciated.

Thanks in advance,

Michael

Currently that field is just the ‘easy’ place to steal bits every time we need it. There are no plans currently in place to reduce it, but every time we need a new bit for something, that is the convenient place to take a bit from.

At one point, we can discuss/should discuss increasing the size of the DecBitfields types, or at least, figuring out what the size of each DecBitFields currently is, and which we can increase to be the same as the ‘maximum’ (as they are in a union IIRC?) and get bits back for things like this.

The standard suggested ‘implimit’ for this value is 6144, so I could definitely see ourselves shrinking it down to that at one point (or more likely, whatever bit value keeps us above that) unless there is user-pressure to increase sizes.

1 Like

Right now non-debug builds currently silently generate incorrect code if they have too long of constructor list. At a minimum, clang should throw an error to the user and stop the build. Continuing to decrease this value will start to hit more users. Given that the versions with a bitwidth of 16 are not widely used in mainstream distributions, this could very well break more systems that use machine generated code.

Oh wow, that seems very contrary to the general direction of technology, where programs get bigger and bigger, and yet Clang is evolving to handle exponentially smaller inputs, effectively tossing out backward compatibility. For reference, Verilator, the leading open source SystemVerilog simulator, can easily generate C++ classes that have 2^18 initializer-list entries. Bringing the limit down to 8192 would require significant rewriting of the application. Is there a reason why NumCtorInitializers needs to be part of CXXConstructorDeclBitfields and can’t just be pulled out? I apologize in advance for my lack of familiarity with the code base!

Agreed we should more gracefully diag the imp-limit. Patches welcome :slight_smile:

1 Like

Clang has a VERY high memory pressure. The bitfields are an attempt at a size optimization that we’ve used pretty extensively to try to minimize the size of AST nodes, which get created a LOT.

We COULD pull ‘everything’ out of those bitfields, but until there is some necessity/understanding that it won’t cause extraneous wasted space due to padding/pressure, we try to keep them there for size-optimization, unless there is a REALLY good case/reason why it shouldn’t be.

Basically: Pulling it out would end up causing every single constructor declaration in the program to increase our memory pressure by 64 bits (thanks to alignment) due to EVERY constructor needing this info.

A patch + RFC to do so would be a way forward here. More likely/perhaps optimally would be to put the value in trailing-storage and leave a bit in the bitfield for ‘has initializers’. This comes at a runtime cost of course, but one that is perhaps acceptable.

I dug into this a bit, and I don’t think this is possible. We create the constructor significantly before we know anything about the initializers, so this WOULD have to be a full-on-member.

Perhaps a somewhat easier patch at least, but a tougher RFC.

We already have a CtorInitializers, which is a pointer to the array of initializers. We can add the number of initializers as prefix to that array.

(Probably we could save some space by moving the pointer into trailing storage.)

Author of Verilator here, some points:

  1. There is a missing error when NumCtorInitializers overflows. This seems accepted as a problem.
  2. The NumCtorInitializers limit was decreased, so causes previously good programs to break in weird ways (currently before add error), or error out (once add error)
  3. While perhaps the suggested imp limit for NumCtorInitializers is lower, in reality the other compilers that are typically used are fine to at least 33K (the point I tested - likely much larger), namely GCC 12, GCC 15, MSVC 19.21, MSVC 19.43.
  4. Although this arises generated code, any workaround in code generators typically takes a few years to propagate to users.
  5. If we do change the generator, it’s not obvious how to workaround this issue in code without significant refactoring that is not backward compatible with the previous classes. If there is a way to “split” the constructor some how (without changing the class members themselves) to work around this, please advise. This is especially interesting as would let people get a new generator to work with the currently released clang limitation.

Just a note: the generated code only calls the constructor once, it won’t matter to performance if a large NumCtorInitializers disables optimization. Although that seems not at issue here.

I appreciate the desire to keep the AST information small, as this certainly makes a difference in memory and runtime. Might it be possible to just have an overflow structure? e.g. have only say 8 bits for NumCtorInitializers, and once that field is set to 255 then it means to look at another side structure for the real value? The cost then for most designs < 255 initializers is whatever reads that field will have a predicted taken branch to get the value (the other untaken branch is to access the side structure). It’s possible such a mechanism could be generalized for other “tight” fields too - which would mean there would never be a need to reduce any such Ast-imposed limits in future versions. (Verilator does this for constants in its AST FWIW.)

Thanks for answering/considering.

How about just a LLVM config flag, so everybody can have their cake and eat it?

diff --git a/clang/include/clang/AST/DeclCXX.h b/clang/include/clang/AST/DeclCXX.h
index 8802664..571cc3e 100644
--- a/clang/include/clang/AST/DeclCXX.h
+++ b/clang/include/clang/AST/DeclCXX.h
@@ -2734,19 +2734,32 @@ public:
return init_const_reverse_iterator(init_begin());
}

+#ifdef LLVM_MANY_CTOR_INITIALIZERS
+ unsigned NumCtorInitializers;
+#endif
+
/// Determine the number of arguments used to initialize the member
/// or base.
unsigned getNumCtorInitializers() const {
- return CXXConstructorDeclBits.NumCtorInitializers;
+#ifdef LLVM_MANY_CTOR_INITIALIZERS
+ return NumCtorInitializers;
+#else
+ return CXXConstructorDeclBits.NumCtorInitializers;
+#endif
}
void setNumCtorInitializers(unsigned numCtorInitializers) {
+#ifdef LLVM_MANY_CTOR_INITIALIZERS
+ NumCtorInitializers = numCtorInitializers;
+#else
CXXConstructorDeclBits.NumCtorInitializers = numCtorInitializers;
// This assert added because NumCtorInitializers is stored
// in CXXConstructorDeclBits as a bitfield and its width has
// been shrunk from 32 bits to fit into CXXConstructorDeclBitfields.
assert(CXXConstructorDeclBits.NumCtorInitializers ==
numCtorInitializers && "NumCtorInitializers overflow!");
+#endif
}

(Note that I’m not speaking for the project, just as a maintainer, and our lead Maintainer in the FE is out on leave for the rest of the month). I think that there IS an issue that we could diagnose and probably SHOULD unless we choose to do something else about it.

We’re currently at ~65k, right? I’m not saying there are PLANS to reduce that number, and thanks to this discussion, I’d be sure to mention on review/push back on any attempts to reduce this number further.

None with delegating constructors that I’m clever enough to come up with. The only way I can think of would be to stop using initializers at a certain number, and use body-init for the rest.

Nope, that is not a concern.

We discussed something like this above (though with the bitfield set at ‘1’ :slight_smile: ), part of the problem is that thanks to memory fragmentation we VERY much attempt to unify our allocations via ‘trailing storage’ (over-allocate, then store the extra things in the storage at the end). However, this requires knowing the count before we allocate the type, which we don’t know in this case (we create the CXXConstructorDecl object as soon as we see the closing paren, and THEN set the number of initializers when we’re done parsing those). @efriedma-quic had an alternative way (changing the count/array to be more of a pascal-string-esque collection) that we’d perhaps consider.

THAT is something I’d be very much against in the upstream. You’re welcome to maintain/distribute a fork, but having a configuration like that is not something we’re typically interested in. They cause all sorts of ABI issues for our plugins/etc that make diagnosing on these bug trackers VERY difficult, so we avoid these at all costs.

CtorInitializers is a little weird in that it is a LazyOffsetPtr for some reason, stored in external storage? So it gets a little weirder than just a pointer, but yeah, point made.

As far as moving THAT off into trailing storage: That doesn’t really save us anything, right? The point of TrailingStorage is that we can omit allocating it “sometimes”, which we couldn’t do for this array/pointer, since it is an ‘always exists’. We don’t seem to know at the time of CXXConstructorDecl creation whether we are going to have initializers or not.

So I think just moving this field into the type is about all we could do. Perhaps somethign to discuss next month when Aaron returns.

Bumping this for comment by Aaron.

M

We definitely shouldn’t reduce this below 16 bits.

I am heartened in a way that Clang appears to scale acceptably to classes with 64K+ members.

It’s separated because the initializers list is part of the function definition. We have to be able to create a ctor declaration without attaching initializers to it for essentially the same reasons we have to be able to create a function declaration without attaching a body.

Maintaining that bias is probably the right call, but we could definitely just put a length prefix on the allocation. Or maybe there’s a more holistic opportunity around how we represent the as-written initializers vs. the semantic list.

@AaronBallman

I have a few thoughts on the topic.

  1. I think the standard implementation limit of 6144 is too small for our implementation; I don’t foresee us going that low. I think @rjmccall is right that we probably should not lower below 16 bits, though that creates tension when we need to steal one more bit in the future.
  2. We definitely should produce an error when we have more initializers than the bit-field can hold. This is a more general problem: we should have a diagnostic any time any such bit-field would be overflowed.
  3. I’d be opposed to a config flag for changing the value, that’s more appropriate for a fork rather than a config flag IMO.
  4. I think the existing limit makes a pretty reasonable tradeoff for most user, even though it’s not ideal for edge use cases like in this thread. The current behavior isn’t a bug, we intentionally lowered the limits. That said, I think we’re not opposed to supporting the edge use cases (at least from my reading of the thread), but I don’t think we’re planning on any changes here proactively either. So if you’d like to see an improvement in this area, I think you’d have to push the patch through yourself. Our biggest concern will be performance for the common use cases, but there were ideas raised here which may make a reasonable performance tradeoff. If you get to the point of submitting a patch, we have a compile time tracker that can be used to estimate whether the performance change will be acceptable or not.

Thanks for your reply Aaron.

Of the different approaches that have been proposed above, what is the consensus on the best way to go? Would it make sense to try the simple approach of just pulling the field out, and measuring the performance impact? Or is the feeling that the space requirements are too great?

M

I find reasoning about performance impacts is challenging, so I’d start with the simple approach and see how impacts compile times, then go with a more complicated approach if the performance is worse. The suggestion from @efriedma-quic would probably be the next thing I’d try if the simple approach didn’t work.

Sounds like the right move.

To John’s point about declarations and definitions, we probably have way more constructor declarations than definitions, so only storing the number of initializers when initializers are actually present seems like a win.