-ftrivial-auto-var-init=pattern vs =uninitialized and union initialization

Hi everyone,

I am trying to enable -ftrivial-auto-var-init=pattern on various code and noticed inconvenient inconsistency with code like [1].

According standard, as I understand, only the first member of the union must be initialized and the tail of the second member can stay uninitialized. If we use -ftrivial-auto-var-init=pattern it fills the tail using our pattern. However by default GCC, clang (without -ftrivial-auto-var-init), msvc all initialize entire union with 0s. Not sure if it’s just coincidence or guaranteed feature.

So -ftrivial-auto-var-init=pattern breaks such code. Especially bad if you don’t know that U is a union and ={} looks good there.

Should we consider giving up here and using zeroes for union tails even with -ftrivial-auto-var-init=pattern?

  1. Example:

union U {
char small[2];
char large[100];
};
void f(void*);
void test() {

union U u = {};
f(&u);
}

Hi Vitaly,

Indeed: there are cases where clang happened to honor the standard without intending to (through the magic of memcpy’ing globals), and auto-init=pattern broke this. I got a report about it a while ago and agree it’s an issue, I haven’t had time to fix it. It’s related to what C calls “designated initializers” to initialize structs. The relevant standardese from C17 are in 6.7.9 Initialization (the grammar has “designator” being either “ [ constant-expression ] ” or “ . identifier ”).
The relevant rules are:

“””
Except where explicitly stated otherwise, for the purposes of this subclause unnamed members of objects of structure and union type do not participate in initialization. Unnamed members of structure objects have indeterminate value even after initialization.

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static or thread storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
— if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;

The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject; all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.
“””

I had fixed something in https://reviews.llvm.org/D61280 but it’s not sufficient.

I don’t think fixing this is giving up: it’s doing what the standard mandates.

Are you interested in fixing it?

Thanks.
Somehow I read the same text but interpreted it differently. Now it makes sense, it must be zeroed and trivial-auto-var-init needs improvement.
I’ll will create a patch.

For code line this:
void used(void *);
union unmatched { char c[2]; char i[7]; };
void test1() {
union unmatched U = {1};
used(&U);
}

tryEmitAbstractForInitializer already emits

@__const.test1.U = private unnamed_addr constant { [2 x i8], [5 x i8] } { [2 x i8] c"\01\00", [5 x i8] undef }, align 1

Then we replace all undefs with pattern.
So I believe I need to fix tryEmitAbstractForInitializer to emit zeroes. However it breaks a bunch of tests.

Does this make sense?

For code line this:
void used(void *);
union unmatched { char c[2]; char i[7]; };
void test1() {
union unmatched U = {1};
used(&U);
}

tryEmitAbstractForInitializer already emits

@__const.test1.U = private unnamed_addr constant { [2 x i8], [5 x i8] } { [2 x i8] c"\01\00", [5 x i8] undef }, align 1

Then we replace all undefs with pattern.
So I believe I need to fix tryEmitAbstractForInitializer to emit zeroes. However it breaks a bunch of tests.

Does this make sense?

Yes. We’ll presumably need to track a bit on APValue’s struct and union representations to model whether padding is zeroed or undefined. We already track on CXXConstructExpr whether the padding is zeroed, but don’t preserve properly everywhere else.

This is http://llvm.org/PR11742 by the way. =)

Hi Vitaly,

Indeed: there are cases where clang happened to honor the standard without intending to (through the magic of memcpy’ing globals), and auto-init=pattern broke this. I got a report about it a while ago and agree it’s an issue, I haven’t had time to fix it. It’s related to what C calls “designated initializers” to initialize structs. The relevant standardese from C17 are in 6.7.9 Initialization (the grammar has “designator” being either “ [ constant-expression ] ” or “ . identifier ”).
The relevant rules are:

“””
Except where explicitly stated otherwise, for the purposes of this subclause unnamed members of objects of structure and union type do not participate in initialization. Unnamed members of structure objects have indeterminate value even after initialization.

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static or thread storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
— if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;

The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject; all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.
“””

Unions are not aggregates in the C standard, so the last sentence does not apply.

https://reviews.llvm.org/D68115

Hi Vitaly,

Indeed: there are cases where clang happened to honor the standard without intending to (through the magic of memcpy’ing globals), and auto-init=pattern broke this. I got a report about it a while ago and agree it’s an issue, I haven’t had time to fix it. It’s related to what C calls “designated initializers” to initialize structs. The relevant standardese from C17 are in 6.7.9 Initialization (the grammar has “designator” being either “ [ constant-expression ] ” or “ . identifier ”).
The relevant rules are:

“””
Except where explicitly stated otherwise, for the purposes of this subclause unnamed members of objects of structure and union type do not participate in initialization. Unnamed members of structure objects have indeterminate value even after initialization.

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static or thread storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
— if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;

The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject; all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.
“””

Unions are not aggregates in the C standard, so the last sentence does not apply.

I assume it’s not going to hurt if we zero init there? I see a lot of code which assume that large fields will be initialized, and not sure how to use non-zero behaviour.

https://reviews.llvm.org/D68115

Hi Vitaly,

Indeed: there are cases where clang happened to honor the standard without intending to (through the magic of memcpy’ing globals), and auto-init=pattern broke this. I got a report about it a while ago and agree it’s an issue, I haven’t had time to fix it. It’s related to what C calls “designated initializers” to initialize structs. The relevant standardese from C17 are in 6.7.9 Initialization (the grammar has “designator” being either “ [ constant-expression ] ” or “ . identifier ”).
The relevant rules are:

“””
Except where explicitly stated otherwise, for the purposes of this subclause unnamed members of objects of structure and union type do not participate in initialization. Unnamed members of structure objects have indeterminate value even after initialization.

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static or thread storage duration is not initialized explicitly, then:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
— if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;

The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject; all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.
“””

Unions are not aggregates in the C standard, so the last sentence does not apply.

I assume it’s not going to hurt if we zero init there? I see a lot of code which assume that large fields will be initialized, and not sure how to use non-zero behaviour.

There are still platforms where such code breaks naturally anyway because the zero value for a particular type is not the same as having zeroes for all the bytes of its object representation. Under the status quo, -ftrivial-auto-var-init has some ability to act as a tool for portability. Although, as noted in this thread, the status quo is neither here nor there with actually using the pattern (or using zeroes or something else).

So this means that my original diagnosis was correct?
What are we going to do about -ftrivial-auto-var-init

  1. Keep as-is and forse folks to replace “U u = {}” with “U u; u.large = {}” or memset
  2. Update https://reviews.llvm.org/D68115 to engage only with =pattern