InitListExpr::getSyntacticForm and C++

I've just verified that in:

class C {
public:
  C(int a);
};

C v[3] = {1,2,3.0};

the InitListExpr returned by InitListExpr::getSyntacticForm lacks all
the fundamental information about implicit casts and CXXConstructExprs.

Is this deliberate (and if so, why?) or is it a bug to fix?

Does the syntactic form normally have any of this stuff? I thought it was just the initializer list as it was written in the source, without further analysis or comment.

John.

As you can see from the typescript below, implicit casts are where they
are expected when the initialized type is a builtin type, but the AST has
no CXXConstructExpr or implicit cast when the initialized type is a class.

$ cat t.c

int v[3] = {1,2,3.0};
$ clang -cc1 -ast-dump t.c
typedef char *__builtin_va_list;
int v[3] = (InitListExpr 0xaed1f90 <t.c:2:12, col:20> 'int [3]'
  (IntegerLiteral 0xaed1ef0 <col:13> 'int' 1)
  (IntegerLiteral 0xaed1f10 <col:15> 'int' 2)
  (ImplicitCastExpr 0xaed1fd0 <col:17> 'int' <FloatingToIntegral>
    (FloatingLiteral 0xaed1f30 <col:17> 'double' 3.000000e+00)))
;

I think that's more an accident than anything else.

John.

I'm not sure I read you right: are you saying that you believe the
accident is the lack of CXXConstructExpr/ImplicitCastExpr for
class-typed initializers, or the presence of implicit casts for
initializers of builtin type?

The only reason that the syntactic InitListExpr has implicit casts in it is
that we try to reuse the syntactic ILE as the semantic ILE. You should
not count on the syntactic ILE ever having such casts.

John.

This leads to problems implementing important services on top of clang...

The basic design of the clang AST is to have a data structure that,
together with complete syntactic information, also stores rich
information about what is happening "under the hood" (implicit casts,
shadow decls, CXXConstructExprs, implicit this, implicit member exprs,
etc.).

As far as I can see, this is always true (with the single, notable and
understandable exception of implicit destructor invocations), and the
recent, wonderful addition of lvalue-to-rvalue casts is a further step
along this very same path.
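To make the "extra information layered on top of the syntax" idea concrete, here is a minimal sketch in standard C++ that does not use clang at all. The names `Node`, `written`, `wrapImplicit`, `ignoreImplicit` and `implicitDepth` are hypothetical, and the model assumes each implicit node wraps exactly one operand, which is a simplification (a real CXXConstructExpr holds a list of arguments). Peeling off the implicit layers recovers what was written; the layers themselves record what happens under the hood.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, heavily simplified node model (not clang's real classes):
// a node is either written in the source or implicit, and an implicit node
// wraps exactly one operand.
struct Node {
    std::string kind;             // e.g. "FloatingLiteral", "ImplicitCastExpr"
    bool implicit;                // true for nodes added by semantic analysis
    std::shared_ptr<Node> child;  // wrapped operand (null for leaves)
};

using NodePtr = std::shared_ptr<Node>;

// A node that corresponds to something the programmer wrote.
NodePtr written(const std::string& kind) {
    return std::make_shared<Node>(Node{kind, false, nullptr});
}

// An implicit node layered on top of an existing operand.
NodePtr wrapImplicit(const std::string& kind, NodePtr operand) {
    assert(operand != nullptr);
    return std::make_shared<Node>(Node{kind, true, std::move(operand)});
}

// Peel off implicit layers to get back to the node as written
// (similar in spirit to clang's Expr::IgnoreImplicit, which handles more cases).
const Node* ignoreImplicit(const Node* n) {
    while (n != nullptr && n->implicit) n = n->child.get();
    return n;
}

// How many implicit layers sit on top of the written node.
int implicitDepth(const Node* n) {
    int depth = 0;
    while (n != nullptr && n->implicit) {
        ++depth;
        n = n->child.get();
    }
    return depth;
}
```

For `3.0` initializing a `C`, the stack in this model is, roughly, a constructor call over a floating-to-integral conversion over the literal: `ignoreImplicit` returns the literal and `implicitDepth` reports 2, so neither the syntax nor the semantics is lost.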

The effect of not following this design pattern for the syntactic
InitListExpr is that it becomes impossible to implement things like
counting the constructor invocations syntactically present in the source
(needed for Halstead metrics, the student project that prompted my email).

Take this code as an example:

class C {
public:
  C();
  C(int a);
};

int a = 3;
C v[10] = {1, 2, [2 ... 5] = 3.0};

If we look at the semantic form, we will find 4 constructor invocations
for 3.0 instead of 1; on the other hand, if we look at the syntactic
form, we will not find the constructor invocation at all.

This is only one example of what becomes impossible to do, but I hope
the general point is clear: I think that having an InitListExpr syntactic
form that does not follow the general design pattern is a bad thing
(think of an InitListExpr of type int[4] whose elements are uncast
floating-point literals, etc.).
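The counting mismatch can be reproduced with a toy model that is independent of clang. In this sketch `WrittenInit`, `expand` and `semanticUses` are hypothetical names, and positional initializers are assumed to be given their index explicitly: each written initializer covers an index range, and `expand` produces one slot per array element, the way a semantic form has to.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one initializer as written: it covers the array
// indices [first, last] (first == last for a plain or [i] initializer,
// first < last for a GNU range designator such as [2 ... 5]).
struct WrittenInit {
    std::size_t first;
    std::size_t last;
    std::string text;  // source spelling, e.g. "3.0"
};

// Expand the written initializers into one slot per array element.
// Slots that no initializer covers stay null (those elements would be
// default-constructed). A later initializer overrides an earlier one.
std::vector<const WrittenInit*> expand(const std::vector<WrittenInit>& inits,
                                       std::size_t arraySize) {
    std::vector<const WrittenInit*> slots(arraySize, nullptr);
    for (const WrittenInit& w : inits) {
        assert(w.first <= w.last && w.last < arraySize);
        for (std::size_t i = w.first; i <= w.last; ++i) slots[i] = &w;
    }
    return slots;
}

// Semantic view: how many element initializations come from initializer w.
std::size_t semanticUses(const std::vector<const WrittenInit*>& slots,
                         const WrittenInit* w) {
    std::size_t n = 0;
    for (const WrittenInit* s : slots)
        if (s == w) ++n;
    return n;
}
```

For `C v[10] = {1, 2, [2 ... 5] = 3.0};` the source contains a single `3.0`, but it fills four slots (indices 2 through 5), so a count taken over the expanded form reports four constructions where the programmer wrote one; elements 6 to 9 have no written initializer and would be default-constructed via `C()`.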

Absolutely. This generally works because the "under the hood" behavior
can be treated as extra information layered on top of the syntax tree.
Unfortunately, this is not necessarily true of initializer lists, where the semantic
form can be structured and ordered quite differently from the syntactic form.
Therefore we fall back to having two different views of the information, one of
which isn't always faithful to the syntax and one of which isn't always faithful
to the semantics.
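The reordering described above is easy to see with C99 designators, e.g. `int a[3] = {[2] = 10, [0] = 20};`: the source lists element 2 first, but a semantic view has to be indexed by element. The following sketch (the `semanticOrder` function is hypothetical, not part of clang) maps each element to the position of its written initializer, with -1 for elements that no initializer names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical input: the array index named by each designator, in the order
// the designators appear in the source, e.g. {[2] = 10, [0] = 20} -> {2, 0}.
// Returns, for each array element in index order, the position of its
// initializer in the written list, or -1 if no initializer names it
// (such elements are implicitly initialized, to zero for int).
std::vector<int> semanticOrder(const std::vector<std::size_t>& designators,
                               std::size_t arraySize) {
    std::vector<int> order(arraySize, -1);
    for (std::size_t pos = 0; pos < designators.size(); ++pos) {
        assert(designators[pos] < arraySize);
        // A later initializer for the same element overrides an earlier one.
        order[designators[pos]] = static_cast<int>(pos);
    }
    return order;
}
```

`semanticOrder({2, 0}, 3)` gives `{1, -1, 0}`: element 0 comes from the second written initializer, element 1 has none, and element 2 comes from the first. Walking the two views in order therefore visits the initializers in different sequences, which is why one tree cannot simply serve as both.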

I'm sorry if you've been skating by so far on the assumption that the syntactic
form will always contain all the relevant semantic information. If you have an
actual proposal for how to integrate these views, I think that would be interesting.

John.

I'd think that simply adding the needed implicit nodes around the
initializer expressions while still in the syntactic form, and then
reusing them in the semantic form, would be the perfect solution.

Not only does this solve the problem above, it is also far more
time- and memory-efficient than the current implementation.

If you run

clang -cc1 -ast-dump t.cc

on the following source

class C {
public:
  C();
  C(int a);
};

C v[10] = {1, 2, [2 ... 5] = 3.0};

you'll see that identical generated nodes are replicated many times,
without any sharing.
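The difference between replicating and sharing can be modelled in a few lines. This is a sketch under stated assumptions: `ImplicitNode`, `semanticByCloning`, `semanticBySharing` and `distinctNodes` are hypothetical names, and `std::shared_ptr` merely stands in for "several parents point at one node". Clang allocates AST nodes from the `ASTContext`, so a real implementation would not need reference counting.

```cpp
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an implicit node such as the conversion applied
// to 3.0; only a label is kept, enough to count distinct allocations.
struct ImplicitNode {
    std::string label;
};
using NodeRef = std::shared_ptr<ImplicitNode>;

// Modelled current behaviour: every element covered by the range gets its
// own freshly built copy of the implicit node.
std::vector<NodeRef> semanticByCloning(std::size_t count, const std::string& label) {
    std::vector<NodeRef> elems;
    for (std::size_t i = 0; i < count; ++i)
        elems.push_back(std::make_shared<ImplicitNode>(ImplicitNode{label}));
    return elems;
}

// Modelled proposal: build the implicit node once (while still in the
// syntactic form) and let every covered element point at that same node.
std::vector<NodeRef> semanticBySharing(std::size_t count, const std::string& label) {
    NodeRef built = std::make_shared<ImplicitNode>(ImplicitNode{label});
    return std::vector<NodeRef>(count, built);
}

// Number of distinct node objects reachable from the semantic elements.
std::size_t distinctNodes(const std::vector<NodeRef>& elems) {
    std::set<const ImplicitNode*> seen;
    for (const NodeRef& e : elems) seen.insert(e.get());
    return seen.size();
}
```

With the range `[2 ... 5]` covering four elements, the cloning version allocates four identical nodes and the sharing version allocates one; the syntactic form can then point at that same single node, which keeps the constructor count at one.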

Too trivial to be true? I'm missing something, aren't I?

Eliminating the replicated nodes would be wonderful; the replication is a holdover from the days when we had a strict ownership model of expressions with no notion of sharing.

  - Doug