question about initializing multiple members of unions

Matthew_Curtis · September 11, 2013, 1:10pm

I'm investigating an assert in clang compiling the following code:

   typedef union {
     struct {
       int zero;
       int one;
       int two;
       int three;
     } a;
     int b[4];
   } my_agg_t;

   my_agg_t agg_instance =
   {
     .b[0] = 0,
     .a.one = 1,
     .b[2] = 2,
     .a.three = 3,
   };

I'm a little uncertain as to what this *should* do.

GCC (4.4.3) produces this assembly:

   agg_instance:
     .zero 12
     .long 3

I had thought that maybe it should produce

   agg_instance:
     .long 0
     .long 1
     .long 2
     .long 3

Which is the effect that most implementations would have if you
were to execute the analogous assignments:

   agg_instance.b[0] = 0;
   agg_instance.a.one = 1;
   agg_instance.b[2] = 2;
   agg_instance.a.three = 3;

Experimenting with various orderings of the designated initializers
leads me to believe that whenever the member of the union being
initialized changes, GCC discards any previous member initializations.
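The rule hypothesized above can be written down as a small model. This is a minimal sketch in plain C, not anything taken from GCC's source. The names `enum member`, `struct designator`, `apply_discarding` and `first_order` are made up for illustration, and `slot` assumes that the four struct fields overlay `b[0]` through `b[3]` with no padding.

```c
#include <string.h>

/* Hypothetical model, not GCC source: which union member a designator names. */
enum member { MEMBER_A, MEMBER_B };

struct designator {
    enum member member; /* .a.<field> or .b[<index>] */
    int slot;           /* 0..3: position of the int within the union (assumed layout) */
    int value;
};

/* Apply designators in list order. Whenever the designated union member
 * differs from the previous one, everything recorded so far is discarded
 * (the storage is zeroed again) before the new value is stored. */
static void apply_discarding(const struct designator *d, int n, int out[4])
{
    memset(out, 0, 4 * sizeof out[0]);
    for (int i = 0; i < n; i++) {
        if (i > 0 && d[i].member != d[i - 1].member)
            memset(out, 0, 4 * sizeof out[0]);
        out[d[i].slot] = d[i].value;
    }
}

/* The initializer from the top of this post: .b[0], .a.one, .b[2], .a.three */
static const struct designator first_order[4] = {
    { MEMBER_B, 0, 0 },
    { MEMBER_A, 1, 1 },
    { MEMBER_B, 2, 2 },
    { MEMBER_A, 3, 3 },
};
```

Calling `apply_discarding(first_order, 4, out)` leaves `out` as {0, 0, 0, 3}, which matches the `.zero 12` / `.long 3` assembly shown earlier. Whether GCC really works this way internally is only a guess from its output.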

For clang, this ordering

   my_agg_t agg_instance =
   {
     .a.one = 1,
     .b[0] = 0,
     .a.three = 3,
     .b[2] = 2,
   };

does not assert and produces

   agg_instance:
     .long 0
     .long 1
     .long 2
     .long 3

Though after stepping through some of the code I think this may be
accidental.

Reading sections 6.7.2.1 and 6.7.8 of the spec, it's not clear to me
exactly what the behavior should be, though I could see how GCC's
behavior would be considered compliant.

Thoughts?
Matthew Curtis

Eli_Friedman1 · September 11, 2013, 7:30pm

I would say we should either use gcc's interpretation or reject it.

-Eli

Matthew_Curtis · September 12, 2013, 9:04pm

Sounds reasonable. Unless someone has a dissenting opinion, I’ll look at fixing this by making clang consistent with gcc.

Thanks,
Matthew Curtis

preames · September 12, 2013, 10:35pm

I would argue somewhat strongly against adopting gcc’s interpretation. I’m not going to get into the specification debate - I’ll leave that to others who are more knowledgeable - but the gcc behavior is highly unintuitive.

The closest code analogy I could think of would be this:

   my_agg_t agg;
   (agg.b[0] = 0, agg.a.one = 1, agg.b[2] = 2, agg.a.three = 3);
   printf("%d, %d, %d, %d\n", agg.a.zero, agg.a.one, agg.a.two, agg.a.three);

I tried this code with both clang and gcc at various levels of optimization and got the same result: “0, 1, 2, 3”. I expect getting any other results would be considered unintuitive at the least by most programmers.

Given that these code fragments are obvious candidates for manual “refactoring” moving from c++03 to c++11, this unintuitive difference seems particularly problematic.

Yours,
Philip Reames

Gao_Yunzhong1 · September 13, 2013, 7:23pm

Hi Matthew,
Hmm I am inclined towards treating a.zero and b[1] as two different sub-objects.
Which svn revision of clang did you use? When I used r190021 on your test case, I got an assertion:

$ clang -S -o - init.c
lib/Sema/SemaInit.cpp:2449: clang::InitializedEntity::InitializedEntity(clang::ASTContext&, unsigned int, const clang::InitializedEntity&): Assertion `CT && "Unexpected type"' failed.

I have some local patches for related initialization problems, but I cannot verify whether they fix the particular problem you have here.

- Gao.

Matthew_Curtis · September 13, 2013, 8:47pm

> Hi Matthew,
> Hmm I am inclined towards treating a.zero and b[1] as two different sub-objects.

Do you think that GCC's behavior is incorrect?

> Which svn revision of clang did you use? When I used r190021 on your test case, I got an assertion:

The assertion is what I'm trying to fix.

Matthew_Curtis · September 16, 2013, 2:11pm

Richard, Doug,

Do either of you have a recommendation here?

Eli recommends GCC's behavior, while Philip Reames and Gao would both prefer the alternate behavior below (same as a series of assignments).

Thanks,
Matthew Curtis

zygoloid · September 16, 2013, 6:05pm

> Richard, Doug,
>
> Do either of you have a recommendation here?
>
> Eli recommends GCC's behavior, while Philip Reames and Gao would both
> prefer the alternate behavior below (same as a series of assignments).

Can you explain how treating the designated initializers as a sequence of
assignments could give the object the value {0, 1, 2, 3}?

Per C11 6.2.6.1/7, "When a value is stored in a member of an object of
union type, the bytes of the object representation that do not correspond
to that member but do correspond to other members take unspecified values."
This doesn't really say what happens if you store a value in a *subobject*
of a member of an object of union type, but the natural extension would
seem to be that the bytes of the object representation that do not
correspond to that subobject take unspecified values.

Footnote 95 allows reading the contents of an inactive member of a union
(by reinterpreting the value as the other type), but that only applies to
reads through an inactive union member, not to stores.

So under Eli's approach, we get {0, 0, 0, 3}, and under the other approach,
it can be argued that we get {unspecified, unspecified, unspecified, 3}.

I'm reasonably convinced that this is a hole in the C standard -- it
doesn't specify what to do in this case -- and Eli's approach seems
completely reasonable to me. Plus it has the benefit of being compatible
with at least two other major compilers (g++ and EDG), and doesn't require
us to invent a new representation for initializer lists.

Thanks,

Gao_Yunzhong1 · September 16, 2013, 7:20pm

Hi Richard,

According to c11 6.7.9p19, “The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject; all sub-objects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.”

In Matthew’s example,

   my_agg_t agg_instance =
   {
     .b[0] = 0,
     .a.one = 1,
     .b[2] = 2,
     .a.three = 3,
   };

If you think that “.a.three = 3” overrides “.b[2] = 2”, “.a.one = 1” and “.b[0] = 0”, then .b[0] and .b[2] should take the value zero as if they were initialized as static objects, giving the end result {0, 0, 0, 3}.

If you think that “.a.three = 3” does not override “.b[2] = 2” and “.b[0] = 0”, then .b[0] and .b[2] will be initialized to 0 and 2 respectively, giving the end result {0, 1, 2, 3}.

I do not think {unspecified, unspecified, unspecified, 3} is a viable interpretation.
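For comparison, the part of 6.7.9p19 that nobody in this thread disputes is the case where every designator names the same union member. The following is a minimal sketch of that case only; the helper name `make_single_member` is made up for illustration, and the union is the one from Matthew's example.

```c
typedef union {
    struct {
        int zero;
        int one;
        int two;
        int three;
    } a;
    int b[4];
} my_agg_t;

/* Only subobjects of member `a` are designated, so no switch between
 * union members occurs. Fields of `a` that are not named (zero, two)
 * are implicitly initialized to 0, as for static storage duration. */
static my_agg_t make_single_member(void)
{
    my_agg_t v = { .a.one = 1, .a.three = 3 };
    return v;
}
```

Here the fields of `a` come out as 0, 1, 0, 3. The open question in this thread only arises once designators for `a` and `b` are mixed in one list.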

Gao.

zygoloid · September 16, 2013, 8:37pm

> Hi Richard,
>
> According to c11 6.7.9p19, “The initialization shall occur in initializer
> list order, each initializer provided for a particular subobject overriding
> any previously listed initializer for the same subobject; all sub-objects
> that are not initialized explicitly shall be initialized implicitly the same
> as objects that have static storage duration.”
>
> In Matthew’s example,
>
>   my_agg_t agg_instance =
>   {
>     .b[0] = 0,
>     .a.one = 1,
>     .b[2] = 2,
>     .a.three = 3,
>   };
>
> If you think that “.a.three = 3” overrides “.b[2] = 2”, “.a.one = 1” and
> “.b[0] = 0”, then .b[0] and .b[2] should take the value zero as if they were
> initialized as static objects, giving the end result {0, 0, 0, 3}.
>
> If you think that “.a.three = 3” does not override “.b[2] = 2” and
> “.b[0] = 0”, then .b[0] and .b[2] will be initialized to 0 and 2
> respectively, giving the end result {0, 1, 2, 3}.

How can it give that result? agg_instance.a.zero and agg_instance.a.two
were never initialized.

Gao_Yunzhong1 · September 16, 2013, 9:21pm

> How can it give that result? agg_instance.a.zero and agg_instance.a.two were never initialized.

I guess I see your point. I was just not comfortable with having unspecified values in the end result of an
initializer list; my interpretation of 6.7.9p19 is that they all should be initialized to something. But unions are
special since you also have to apply 6.2.6.1p7.

Well, if a.zero and a.two were not initialized, and if I apply 6.7.9p19, they should be initialized to zero, right?
And a.one will be initialized to 1. The end result will be {0, 1, 0, 3} ?

zygoloid · September 16, 2013, 10:28pm

Yes, with that interpretation, that seems like a possible conclusion, even
though it's a bit weird. That's not the same as the interpretation that
Matthew Curtis was suggesting ("same as a series of assignments") though,
which was the one which led to {unspecified, unspecified, unspecified, 3}.

I still think Eli's approach is the right one here.

Matthew_Curtis · September 17, 2013, 1:36pm

> Richard, Doug,
>
> Do either of you have an recommendation here?
>
> Eli recommends GCC's behavior, while Philip Reames and Gao would both prefer the alternate behavior below (same as a series of assignments).
>
> Can you explain how treating the designated initializers as a sequence of assignments could give the object the value {0, 1, 2, 3}?

Given the following program:

    #include <stdio.h>

    typedef union {
      struct {
        int zero;
        int one;
        int two;
        int three;
      } a;
      int b[4];
    } my_agg_t;

    // set value using initializer
    my_agg_t X = {
      .b[0] = 0,
      .a.one = 1,
      .b[2] = 2,
      .a.three = 3,
    };

    int main(int argc, char *argv[])
    {
      // set value using series of assignments
      my_agg_t Y;
      Y.b[0] = 0;
      Y.a.one = 1;
      Y.b[2] = 2;
      Y.a.three = 3;

      printf("X:%d,%d,%d,%d\n", X.a.zero, X.a.one, X.a.two, X.a.three);
      printf("Y:%d,%d,%d,%d\n", Y.a.zero, Y.a.one, Y.a.two, Y.a.three);
    }

GCC produces:

    X:0,0,0,3
    Y:0,1,2,3

I believe Gao and Philip Reames are arguing that it seems very unintuitive that, while the source code for setting X and setting Y looks very similar, the two produce very different results.
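One way to sidestep the ambiguity altogether, whichever interpretation a compiler picks, is to name only a single union member in the initializer. This is a minimal sketch under that assumption; the variable names `via_b` and `via_a` are made up for illustration.

```c
typedef union {
    struct {
        int zero;
        int one;
        int two;
        int three;
    } a;
    int b[4];
} my_agg_t;

/* Every element is given through member `b` only. */
static const my_agg_t via_b = { .b = { 0, 1, 2, 3 } };

/* Alternatively, every field is given through member `a` only. */
static const my_agg_t via_a = {
    .a = { .zero = 0, .one = 1, .two = 2, .three = 3 },
};
```

Because neither initializer switches between `a` and `b`, the question of whether earlier designators are discarded never comes up, and each object holds 0, 1, 2, 3 when read back through the member that was initialized.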

> I'm reasonably convinced that this is a hole in the C standard -- it doesn't specify what to do in this case -- and Eli's approach seems completely reasonable to me. Plus it has the benefit of being compatible with at least two other major compilers (g++ and EDG), and doesn't require us to invent a new representation for initializer lists.

> I still think Eli's approach is the right one here.

I agree.

Does this require further discussion or can I submit a patch that implements this behavior?

Thanks,
Matthew

-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

preames · September 17, 2013, 3:03pm

> I believe Gao and Philip Reames are arguing that it seems very unintuitive that while the source code for setting X and setting Y look very similar they produce very different results.

You summarized my argument well. Thanks.

> I'm reasonably convinced that this is a hole in the C standard -- it doesn't specify what to do in this case -- and Eli's approach seems completely reasonable to me. Plus it has the benefit of being compatible with at least two other major compilers (g++ and EDG), and doesn't require us to invent a new representation for initializer lists.

> I still think Eli's approach is the right one here.

> I agree.
>
> Does this require further discussion or can I submit a patch that implements this behavior?

No further discussion needed. I am still concerned about the possible confusion, but agree the compatibility argument is a strong one as well. I will defer to your choice.

Philip

Gao_Yunzhong1 · September 17, 2013, 8:49pm

Hi Matthew,
I am fine with Richard's and Eli's approach.
- Gao.
