RFC: Add Clang attribute to ensure that fields are initialized explicitly

higher-performance · August 8, 2024, 7:53pm

This is a proposal to add a new Clang-specific attribute to ensure that field initializations are performed explicitly.

For example, if we have

struct B {
  [[clang::explicit]] int f1;
};

then a warning would trigger if we write B b{}; for example:

field 'f1' is left uninitialized, but was marked as requiring initialization

This prevents callers from accidentally forgetting to initialize fields, particularly when new fields are added to the type.
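
To make the failure mode concrete, here is a minimal sketch in standard C++ with no attribute involved. The names RetryOptions, retries, backoff_ms and make_options are hypothetical and exist only for illustration; the point is that the old call site keeps compiling after a field is added.

```cpp
// Hypothetical parameter struct; `backoff_ms` was added after the call
// site below had already been written.
struct RetryOptions {
  int retries;
  int backoff_ms;  // new field: existing callers never mention it
};

// Existing call site: it still compiles after the new field is added.
// The omitted trailing member is value-initialized, i.e. set to 0.
constexpr RetryOptions make_options() { return RetryOptions{3}; }

static_assert(make_options().retries == 3, "explicitly provided value");
static_assert(make_options().backoff_ms == 0, "silently zeroed value");
```

With the proposed attribute on backoff_ms, the intent is that the RetryOptions{3} expression would produce the warning instead of quietly passing a zero.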

There is a prototyped implementation of this proposal in:

github.com/llvm/llvm-project

Add Clang attribute to ensure that fields are initialized explicitly

llvm:main ← higher-performance:required-fields

opened 07:10PM - 05 Aug 24 UTC

higher-performance

+287 -5

This is a new Clang-specific attribute to ensure that field initializations are performed explicitly. For example, if we have

```
struct B {
  [[clang::explicit]] int f1;
};
```

then the diagnostic would trigger if we do `B b{};`:

```
field 'f1' is left uninitialized, but was marked as requiring initialization
```

This prevents callers from accidentally forgetting to initialize fields, particularly when new fields are added to the class.

### Naming:

We are open to alternative names; we would just like their meanings to be clear. For example, `must_init`, `requires_init`, etc. are some alternative suggestions that would be fine. However, we would like to avoid a name such as `required` or `must_specify`, as their meanings might be unclear or confusing (e.g., due to confusion with `requires`).

### Note:

I'm running into an issue with duplicated diagnostics (see lit tests) that I'm not sure how to properly resolve, but I suspect it revolves around `VerifyOnly`. If you know the proper fix please let me know.

FAQ

Naming

The choice of explicit was based on the fact that the word explicit in the language has always meant “must be spelled out in the source code”, and the same meaning is used here – it is difficult to imagine a different meaning for explicit in such a context.

Nevertheless, explicit is just a suggestion, and this proposal is open to other names. I’d suggest must_init, requires_init, as some alternatives to consider that also seem fairly self-explanatory.

Regardless of the name, we should avoid causing potential confusion for readers who are unfamiliar with the attribute. For example, required may cause confusion with requires, and must_specify on a field may leave users wondering “where must I specify what?”

Opt-Out Mechanism

The intention is that this is a warning, and thus disabling the warning at the usage site would be sufficient to opt out of all initializations within the expression.

There is no intention of allowing opting out of initializing a single field in a class that has multiple fields requiring explicit initialization. The rationale is that valid usages of this are likely to be very rare, and it would also potentially greatly complicate the proposal.
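
A usage-site opt-out could look roughly like the sketch below. It assumes the warning group keeps the name printed by the prototype's diagnostics (-Wuninitialized-explicit), which may change. REQUIRES_EXPLICIT_INIT is a hypothetical macro, left empty here so the snippet builds on compilers without the patch, and make_zeroed is likewise an invented name.

```cpp
// Hypothetical macro: with the prototype patch applied it would expand to
// [[clang::explicit]]; it is empty here so the sketch builds anywhere.
#define REQUIRES_EXPLICIT_INIT

struct B {
  REQUIRES_EXPLICIT_INIT int f1;
  REQUIRES_EXPLICIT_INIT int f2;
};

#if defined(__clang__)
#pragma clang diagnostic push
// Keeps a Clang without the prototype quiet about the unknown group below.
#pragma clang diagnostic ignored "-Wunknown-warning-option"
// Assumed group name, taken from the prototype's diagnostic output.
#pragma clang diagnostic ignored "-Wuninitialized-explicit"
#endif
// The opt-out covers every marked field in this expression, not just one.
constexpr B make_zeroed() { return B{}; }
#if defined(__clang__)
#pragma clang diagnostic pop
#endif

static_assert(make_zeroed().f1 == 0 && make_zeroed().f2 == 0,
              "both fields are value-initialized");
```

The push/pop pair keeps the suppression local, so other initializations of B in the same file would still be diagnosed.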

What About Non-Aggregates?

I find this to be a difficult question. Some plausible options include:

Allowing this attribute inside all classes, or
Only allowing this attribute inside classes that are aggregates, or
Only allowing this attribute inside classes that would be aggregates in C++20 (to avoid inconsistent behavior in pre-C++20 mode)

For the sake of ease of implementation & understandability, I have implemented the first option so far.
I am not entirely sure as to what the correct behavior for non-aggregate classes would be, but I suspect this diagnostic should avoid triggering on member-initialization lists in constructors, because member-initialization lists are already coupled tightly to the rest of the class, and generally authored together. Forcing the class’s own constructor to initialize its own field (especially in a manner where field-initializers are insufficient) is a much less common use case.

However, other designs are plausible, and to allow more flexibility in the future evolution of the design, perhaps we should enforce that this attribute is only applied on aggregates.

Input on the best path forward here is welcome.

Variables That Aren’t Fields

This attribute is only meant for fields. I don’t expect it to be very useful for other kinds of variables, as “this must be initialized” would be redundant for them – simply initializing them would enforce the same.

Class-Level Attributes

It would make sense to allow this attribute on an entire class. However, my inclination is to defer the implementation of that extension rather than blocking this proposal on that extension, for a few reasons:

A mild desire to see this used in practice prior to extending the design.
Avoiding the natural subsequent desire to include an “opt-out” mechanism (e.g., [[clang::explicit(false)]]), which would further complicate the proposal & implementation
Additional work on my part that I’d have to find bandwidth for

I don’t expect the current design of this proposal would conflict with such extensions in the future.

Opt-In vs. Opt-Out

One can imagine a compiler flag flipping the requirements, using an opt-out attribute instead of opt-in.

I expect such a feature would be very difficult for most users to utilize in practice, as uninitialized structs are very common, and it is quite intractable for users to modify third-party & system headers to annotate everything fully. Common code as simple as

struct stat s;
int result = stat("path", &s);

would now break due to the lack of field initializations.

However, if the need for this nevertheless arises, I believe it can be achieved coherently with the current design, in a manner similar to opt-out attributes for a class-level attribute.

Clang consensus called in this message.

jyknight · August 9, 2024, 3:10am

I think something like this is a good idea.

“left uninitialized” is the wrong wording, since in B b{}, all fields are zero-initialized. Nothing is uninitialized.

Similarly, what’s the behavior of

struct B {
  [[clang::explicit]] int f1 = 4;
};
B b{};

I’d assume this is valid and still triggers a warning about f1 not having been explicitly initialized, since the intent is to require the user to provide a value explicitly, even if the default for the field isn’t a UB-inducing uninitialized value.
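
The two situations can be checked in standard C++ without the attribute. This is a small sketch; NoDefault, WithDefault and the two helper functions are hypothetical names used only to show what empty braces produce in each case.

```cpp
struct NoDefault {
  int f1;
};

struct WithDefault {
  int f1 = 4;  // default member initializer
};

// Empty braces never leave f1 indeterminate: without a default member
// initializer the field is zeroed, and with one the default is used.
constexpr int empty_braces_without_default() { return NoDefault{}.f1; }
constexpr int empty_braces_with_default() { return WithDefault{}.f1; }

static_assert(empty_braces_without_default() == 0, "zero, not garbage");
static_assert(empty_braces_with_default() == 4, "default member initializer");
```

In both cases the field has a well-defined value, which is why "not explicitly initialized" describes the situation better than "left uninitialized".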

higher-performance · August 9, 2024, 3:50am

Yes, the error message wording could be better (“not explicitly initialized”).

With my current design & implementation, this is indeed a warning:

error: field 'f1' is left uninitialized, but was marked as requiring initialization [-Werror,-Wuninitialized-explicit]
    4 | B b{};

For aggregates, this seems fairly useless when the warning is enabled (unless I’m missing something?), so perhaps it should be disallowed entirely? However, it seems useful inside constructors’ member initialization lists, if we avoid the warning there as proposed.

ilya · August 9, 2024, 10:10am

I really like the idea:

the implementation of this is very simple,
the annotation can be ignored or removed without changing semantics of existing code, only degrading the diagnostics quality;
it helps find problems with the common pattern of “parameter objects” without making all parameters optional or adding constructor overloads (I think these are common enough to warrant a language extension).

This clearly looks like a good return on investment, given how relatively simple this change is. A few thoughts from me.

Opting-out single fields

There is no intention of allowing opting out of initializing a single field in a class that has multiple fields requiring explicit initialization. The rationale is that valid usages of this are likely to be very rare, and it would also potentially greatly complicate the proposal.

This actually looks important if we add a class attribute, so we can distinguish the required and optional parameters in cases like:

struct [[clang::explicit]] FooParams {
  int req;      // required
  int opt = 10; // optional
  // usually many more parameters ...
};

void foo(FooParams p);
// as a replacement for:
void foo(int req, int opt = 10);

Opting out all fields that are initialized in-place, but aren’t marked explicit themselves, seems like a good option:

struct [[clang::explicit]] FooParams {
  std::vector<int> xs;      // required
  std::vector<int> ys = {}; // optional
  [[clang::explicit]] int* out_ptr = nullptr; // required
  // ... but still initialized to avoid UB when used
  // without attributes.
};

Non-aggregates

I feel you’re right that because we changed the definition of aggregates between C++17 and C++20, warning when this attribute is on non-aggregate types by default is going to be a bit of an unnecessary migration pain. That being said, it seems like it probably will capture some coding errors when people intended to make the class aggregate, but missed something (e.g. added the =default constructor in C++20), which seems useful.

I am also torn here, but I don’t have a strong preference either way. Probably just doing the simplest thing is the path I would pick, i.e. not showing any warnings.
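
The C++17 versus C++20 difference mentioned above can be observed with std::is_aggregate, which needs C++17 or later. This is only an illustrative sketch; the three struct names are hypothetical.

```cpp
#include <type_traits>

struct Plain {
  int x;
};

struct DefaultedCtor {
  DefaultedCtor() = default;  // user-declared, but not user-provided
  int x;
};

struct UserProvidedCtor {
  UserProvidedCtor() : x(0) {}
  int x;
};

// These two answers are the same in every language mode.
static_assert(std::is_aggregate<Plain>::value, "no constructors at all");
static_assert(!std::is_aggregate<UserProvidedCtor>::value,
              "a user-provided constructor rules out aggregate status");

// This one depends on the language mode: true when compiled as C++17,
// false when compiled as C++20, where any user-declared constructor
// disqualifies the class.
constexpr bool defaulted_ctor_is_aggregate =
    std::is_aggregate<DefaultedCtor>::value;
```

A class like DefaultedCtor is the case where a warning on non-aggregates could catch an accidental loss of aggregate status after moving to C++20.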

Class-level attributes

I think it would be a nice-to-have, but I also agree with:

I don’t expect the current design of this proposal would conflict with such extensions in the future.

Landing the extension for fields is strictly better than not having it, and extending it to classes seems like future work someone else may want to pick up. We have a small dedicated team of people working on Clang now and this looks like an interesting project for us to tackle, especially if my personal belief that this extension is useful is reinforced by the discussions on this RFC.

Opt-In vs. Opt-Out

I expect such a feature would be very difficult for most users to utilize in practice, as uninitialized structs are very common, and it is quite intractable for users to modify third-party & system headers to annotate everything fully. Common code as simple as

I agree that this is probably a no-go in practice. Patterns like:

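int compute_y();

struct Point { int x; int y; };
void foo() {
  Point p;  // declared uninitialized, then filled in field by field
  p.x = 1;
  p.y = 2;

  Point q{1};  // y is omitted here and assigned later
  // ... some code ...
  q.y = compute_y();
}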

are way too common to ever allow using this broadly without rewriting all the code. I can imagine this working with some per-file opt-in (similar to diagnostic push / pop), but that definitely seems like an avenue for future work, not for the initial patch.

Should we also allow putting this attribute on bases?

Aggregates naturally support base classes, although that’s much less common than fields.

#include <vector>

struct Agg : [[clang::explicit]] std::vector<int> {
    [[clang::explicit]] int a;
};
Agg x{{1,2,3}, 2};
Agg y{{1,2,3}, .a = 2};  //  This is supported as an extension too (not standard C++)

We should probably consider adding support for base classes and marking the base classes as explicit if we add an attribute on class. However, this could be also scheduled as future work.
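
For readers less familiar with aggregate bases, the sketch below shows that an omitted base is filled in silently, just like an omitted field. It needs C++17 or later, and Base, Derived, both_given and nothing_given are hypothetical names; a plain struct stands in for std::vector so the checks can run at compile time.

```cpp
struct Base {
  int id;
};

// Since C++17 an aggregate may have public base classes; the base is
// initialized by the first element of the braced list.
struct Derived : Base {
  int a;
};

constexpr Derived both_given() { return Derived{{7}, 2}; }
// Nothing is spelled out here: the base and the field are both zeroed.
constexpr Derived nothing_given() { return Derived{}; }

static_assert(both_given().id == 7 && both_given().a == 2, "explicit values");
static_assert(nothing_given().id == 0 && nothing_given().a == 0,
              "base and field are value-initialized");
```

An attribute on the base specifier would presumably target the nothing_given case, where the base subobject is never mentioned at the call site.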

Endill · August 9, 2024, 10:53am

CC @AaronBallman

higher-performance · August 9, 2024, 5:36pm

Broadly agree with those points from a design standpoint. Implementation-wise, I’d definitely prefer first landing a simple (e.g., just per-field) forward-compatible v1, and then broadening the uses in a later v2.

A minor clarification on this bit:

this actually looks important if we add a class attribute, so we can distinguish the required and optional parameters in cases like

That seems like a good idea. What I was thinking was that I didn’t want to add a novel opt-out mechanism, but if the opt-out mechanism for a class-level attribute is just the existing in-class member initialization, that seems fine to me.

cor3ntin · August 13, 2024, 9:43am

Thanks for this proposal.
Uninitialized fields are much less of a problem than uninitialized variables in general, and the design does not seem to generalize beyond aggregates, as you are not proposing to change constructors.

And…I don’t find the motivation for a feature limited to the initialization of aggregates that compelling, it’s quite narrow.

Other questions include:

Can the attribute be applied to an object with a default constructor?
How does the attribute handle partially initialized subobjects (aggregates and arrays)?
Is it worth adding that attribute despite the work on erroneous behavior and the indeterminate attribute? (also, are you aware of [RFC] Apply clang::uninitialized attribute on record types ?)
If an API calls for a field to always be initialized, is adding a constructor not a better design than relying on a vendor specific attribute?
I have the same question as @jyknight for default member initializers.

higher-performance · August 13, 2024, 5:05pm

Can the attribute be applied to an object with a default constructor?

This would make the class a non-aggregate, thus falling under the “What About Non-Aggregates?” question – input on the best path forward is definitely welcome here!

How does the attribute handle partially initialized subobjects (aggregates and arrays)?

Currently it’s something like this:

struct S {
  int f = 0;
};

struct B {
  [[clang::explicit]] S s1;
  [[clang::explicit]] S s2;
};

struct D {
  B b[2];
};

int main() {
  D d = {{{{.f = 1}}}};
  (void)d;
}

input.cc:15:11: error: field 's2' is left uninitialized, but was marked as requiring initialization [-Werror,-Wuninitialized-explicit]
   15 |   D d = {{{{.f = 1}}}};
      |           ^
input.cc:15:21: error: field 's1' is left uninitialized, but was marked as requiring initialization [-Werror,-Wuninitialized-explicit]
   15 |   D d = {{{{.f = 1}}}};
      |                     ^
input.cc:15:21: error: field 's2' is left uninitialized, but was marked as requiring initialization [-Werror,-Wuninitialized-explicit]

There’s some room for improvement in the error messages, and we’ll try to polish it a bit, but that’s the general behavior.

Is it worth adding that attribute despite the work on erroneous behavior and the indeterminate attribute?

The motivation here is entirely different than the motivations for initializing local variables (memory safety/UB/etc.). The motivation here is preventing a certain class of logical bugs that arise when structs are used as “parameter-objects”. Consider when the consumer of the struct (which is often not just in a different file, but from an entirely different author, project, or codebase) adds a new mandatory/expected field that the producer has no way to become aware of, and thus does not provide. The consumer needs a way to force the producer to provide a value for the field.

What I would compare this against is parameter lists, not uninitialized variables. This is basically the same thing as when you extend a function’s parameter list: when a new parameter is added to a function, the compiler simply cannot call it anymore without providing the new parameter. We have that capability for parameter lists in the language, but not for structs that serve a similar purpose, and this is meant to address that.
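
For comparison, a library-only approximation of "a new field breaks old callers" is sketched below. This is not the proposed attribute and is not taken from the thread: it swaps in a wrapper type whose default constructor is deleted, so omitting the field is a hard error rather than a warning. Required, ConnectParams, port, timeout_ms and total are all hypothetical names, and C++14 or later is assumed.

```cpp
#include <type_traits>

// Hypothetical wrapper: a field of this type cannot be default-constructed,
// so aggregate initialization that omits it does not compile.
template <typename T>
struct Required {
  T value;
  Required() = delete;
  // Intentionally implicit, so callers can write a plain value in the list.
  constexpr Required(T v) : value(v) {}
};

struct ConnectParams {
  Required<int> port;     // callers must supply this
  int timeout_ms = 1000;  // optional, has a default
};

constexpr int total(const ConnectParams& p) {
  return p.port.value + p.timeout_ms;
}

static_assert(total(ConnectParams{8080}) == 9080, "default timeout is used");
static_assert(total(ConnectParams{8080, 5}) == 8085, "both values given");
// ConnectParams{} would be rejected: Required<int> has no default constructor.
static_assert(!std::is_default_constructible<ConnectParams>::value,
              "the required field blocks default construction");
```

The wrapper changes the field's type and how it is accessed, which is the kind of intrusiveness an attribute avoids: the attribute is meant to leave the struct's layout and usage untouched and only add a diagnostic.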

This is why uninitialized local variables are a different beast entirely. The production & usage of the variable occur at the same location in the code, and thus if you want to make sure the variable is initialized – you initialize it. The safety issues are quite unrelated to the problem surrounding fields that is being addressed here.

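If an API calls for a field to always be initialized, is adding a constructor not a better design than relying on a vendor specific attribute?

No, because this is intended for aggregates as discussed above, and it is also meant to work with designated initializers.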

I have the same question as @jyknight for default member initializers.

Did my reply to his question address your concern? If not, please let me know what you’d like me to elaborate on.

higher-performance · August 13, 2024, 5:39pm

@AaronBallman how do you feel about the design/proposal? Does this seem fine to move forward with?

kparzysz · August 13, 2024, 6:21pm

Are

struct B {
  [[clang::explicit]] int f1;
};

and

struct B {
  int f1;
};

considered the same type?

higher-performance · August 13, 2024, 6:24pm

I’m not sure I understand what exactly you mean by the same type. Are you asking if a mismatch in the presence of the attribute could cause an ODR violation or affect name mangling? If so, no, there is no distinction as far as the generated program goes. Otherwise, I’m not sure I understand the question exactly.

kparzysz · August 13, 2024, 6:32pm

Yes, name mangling, overload resolution, etc.

Edit: It seems from your reply that they are the same.

higher-performance · August 13, 2024, 6:39pm

Yeah, it’s not intended to affect the program in any way; it’s strictly a compile-time diagnosis, like [[nodiscard]].
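
The [[nodiscard]] comparison can be checked directly: standard attributes of that kind add diagnostics without changing types or layout. The sketch below assumes C++17 for the attribute spellings, and checked, unchecked, Annotated and Bare are hypothetical names.

```cpp
#include <type_traits>

[[nodiscard]] int checked();
int unchecked();

// The attribute is not part of the function type.
static_assert(std::is_same<decltype(checked), decltype(unchecked)>::value,
              "both declarations have type int()");

struct Annotated {
  [[maybe_unused]] int f1;
};

struct Bare {
  int f1;
};

// A diagnostic-only attribute on a field leaves size and alignment alone.
static_assert(sizeof(Annotated) == sizeof(Bare), "same size");
static_assert(alignof(Annotated) == alignof(Bare), "same alignment");
```

The proposed attribute is described as behaving the same way, so mangling, overload resolution and the generated code would be expected to be identical with or without it.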

higher-performance · August 19, 2024, 3:18pm

Do folks have any opinions on the naming? Is explicit a decent name? Or would must_init, or requires_init, or something else be clearer?

Sirraide · August 21, 2024, 4:36am

I was about to say that I don’t see the use case for this feature because constructors exist, but then I remembered that C++ isn’t the only programming language we support. ;Þ

So yeah, I could see how this might be useful in C in cases where zero-initialisation of a field isn’t meaningful for whatever reason (or because you can also just straight-up forget to do that).

I don’t think it’d be as useful in C++ because you can just use constructors (or static factory functions) to ensure a field is always initialised to an explicit, user-provided value.

higher-performance · August 21, 2024, 7:32pm

This is very much intended for C++. The use case, as mentioned above, is parameter-objects. Constructors are inadequate for lots of reasons, including:

Constructors do not provide as much control or granularity (initializing optional fields becomes cumbersome or error-prone)
Constructors do not allow as much brevity (they require a lot of redundant boilerplate)
Constructors do not allow designated initializers

ilya · August 23, 2024, 4:54pm

I think it’s tricky; unfortunately, all of the suggested names are too easy to misinterpret. explicit means something entirely different in C++ when used on constructors and conversion operators.

Among the ones suggested, requires_init is slightly better than must_init, but after reading through this discussion, I believe both might create the impression that this attribute is aimed at making the code safer (i.e., that it guarantees no uninitialized use of the field is possible) instead of enforcing certain code-style patterns. I believe it’s best to avoid this confusion.

I don’t have great alternative suggestions, though; what comes to mind is [[clang::warn_if_no_init]] or [[clang::warn_if_missing_init]].
I feel it’s harder to read these as enforcing some strict contract.

Most of the comments in the thread seem to be assessing this attribute from the perspective of the potential safety benefits it provides, even though this is not what it was aiming to do.

What do people feel about having this for code-style-related purposes described by @higher-performance?
I find those compelling, given how simple the implementation of this idea is.
At Google we recommend using parameter objects, and people are actually doing more complicated (and non-standard) things to get warnings on braced initialization of those structs. This attribute would be very useful to us.

@AaronBallman, @cor3ntin what are your thoughts from the code health angle?

rjmccall · August 23, 2024, 9:37pm

must_init or requires_init both seem like fine names, with the expected rule being that the field must be explicitly initialized in any constructor body or aggregate initializer. I suppose that includes attempts to use default aggregate initialization.

higher-performance · August 28, 2024, 8:53pm

Other possible names that may potentially be clearer:

explicit_init
requires_explicit_init

AaronBallman · August 30, 2024, 1:22pm

First off, thank you for the RFC!

I’d like to make sure I understand the proposal after the (good!) discussion we’ve had around its semantics. Given:

struct S {
  [[clang::<name>]] int x;
  int y;
  int z = 12;
  [[clang::<name>]] int q = 100;
};

void foo(S s);

do we expect the following behaviors:

foo(S{1, 2, 3, 4}); // No diagnostics

foo(S{.x = 100, .q = 100}); // No diagnostics

foo(S{.x = 100 }); // Diagnostic about q not being explicitly specified?

S s{.x = 100, .q = 100}; // No diagnostics
S t{.q = 100}; // Diagnostics about x not being explicitly specified

S *ptr = new S; // Diagnostics about x and q not being explicitly specified
S *ptr2 = new S{.x = 100, .q = 100}; // No diagnostics

(Basically, when the attribute is on a field, that field must have an explicitly specified value when constructing the object; implicitly specified values, such as an in-class initializer, are insufficient?)

If so, I think we may want to carve out some guard rails for misuse, such as:

struct S {
  int count;
  [[clang::<name>]] struct { // Should it be possible to write on fields with sizeless/effectively empty types?
  } inner;
  [[clang::<name>]] int : 0; // Not really sensible to try to initialize a zero-width bit-field
  [[clang::<name>]] int fam[]; // Definitely *do not want* an explicit initializer here!
};

struct T {
  [[clang::<name>]] int x;
  T(int val = 12) : x(val) {}
};

T t; // I have no idea what this should do; are default arguments sufficiently explicit?

(Once we open the door to non-aggregates, I think there are quite a few design questions because constructors can do some pretty complicated stuff, such as delegating constructors, overloaded constructors, etc. I kind of wonder if we should start with aggregates and then expand later.)

And I suppose there’s a question of what this kind of code should do:

struct S {
  [[clang::<name>]] int x;
};

void foo(S s = {});
template <S s>
void quux();

void bar() {
  foo(); // Gets diagnostics, hopefully?
  quux<{}>(); // Gets diagnostics, maybe?

  size_t heh = sizeof(S{}); // Should we diagnose in unevaluated contexts?
}

In terms of the name of the attribute, I tend to like requires_explicit_init over the other options. Do we want a warning when the attribute is written on a field with an in-class initializer, given that such a scenario is pretty confusing?
