C++ String literals used as initializers are re-assigned incorrect array element type

Using clang trunk:

$ cat t.cpp
char ca[] = "text";

$ clang --version
clang version 3.5.0 (trunk 204467)
...

$ clang -c -Xclang -ast-dump t.cpp
...
`-VarDecl 0x7b0cb90 <t.cpp:1:1, col:13> ca 'char [5]'
  `-StringLiteral 0x7b0cc68 <col:13> 'char [5]' lvalue "text"

Note that the type of the string literal is 'char [5]'. I would expect 'const char [5]'.
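For reference, the expectation above can be checked at the language level, independent of the AST dump. The following is a minimal sketch, assuming a C++11 compiler; the helper name element_is_const is hypothetical and only there for illustration. It shows that an ordinary string literal expression has a const-qualified element type, while the array it initializes does not:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// In C++ an ordinary string literal is an lvalue of type 'const char[N]',
// so decltype yields a reference to a const-qualified array.
static_assert(std::is_same<decltype("text"), const char (&)[5]>::value,
              "string literal is an lvalue of type const char[5]");

// Hypothetical helper (not part of clang): reports whether the element
// type of the array passed in is const-qualified.
template <typename T, std::size_t N>
constexpr bool element_is_const(T (&)[N]) {
    return std::is_const<T>::value;
}

// The declaration from the report: the array object itself is non-const.
char ca[] = "text";
```

Here element_is_const("text") is true and element_is_const(ca) is false, which is the distinction the dumped StringLiteral type no longer reflects.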

I debugged a bit. The StringLiteral instance is initially created with a const-qualified element type, but the const qualification is later removed by CheckStringInit() in lib/Sema/SemaInit.cpp. I suspect CheckStringInit() is intended only to adjust the size of the string literal, not to change qualifiers on the element type.

Actually, while debugging this, I realized the issue goes beyond qualifiers. The element type of the string literal itself is changed:

$ cat t.cpp
typedef char my_char;
my_char ca[] = "text";

$ clang -c -Xclang -ast-dump t.cpp
...
`-VarDecl 0x4e61730 <line:2:1, col:16> ca 'my_char [5]'
  `-StringLiteral 0x4e61808 <col:16> 'my_char [5]' lvalue "text"

Note that the StringLiteral now has an array element type of 'my_char'.

Tom.

This is intentional, though surprising. A string literal expression that's
used to initialize an array is quite a different beast from a string
literal expression that's used to create a string literal object. In the
initialization case, we're effectively initializing each element of the
array from the corresponding character in the string literal, so it makes
some degree of sense to model the string literal as having the same type as
the initialized entity. (If we didn't do that, then to maintain the AST
invariants we'd need to invent a new kind of implicit cast that switches
out the type.)
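To illustrate the initialization case described above, here is a minimal sketch, assuming a C++11 compiler; the function name initialized_array_is_writable is hypothetical. It shows that the initialized array has the declared element type and is a separate, writable object whose elements were copied from the literal's characters:

```cpp
#include <cassert>
#include <type_traits>

typedef char my_char;

// Hypothetical helper: initializes an array from a string literal, writes
// to it, and checks that the remaining elements still match the literal.
bool initialized_array_is_writable() {
    my_char local[] = "text";

    // The array's type comes from the declaration, not from the literal.
    static_assert(std::is_same<decltype(local), my_char[5]>::value,
                  "array has the declared element type and deduced size");

    local[0] = 'T';  // fine: this writes to the array, not to the literal
    return local[0] == 'T' && local[1] == 'e' && local[4] == '\0';
}
```

Writing through local is well-formed precisely because the literal is only the source of the element values, not the object being modified.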

This is similar to how we handle the semantic form of an InitListExpr.
Consider:

typedef unsigned char my_char;
my_char arr[10] = { 't', 'e', 'x', 't', 0 };
my_char arr2[10] = "text";

This produces:

|-VarDecl 0x6c4f0b0 <line:2:1, col:43> arr 'my_char [10]'
| `-InitListExpr 0x6c4f1f0 <col:19, col:43> 'my_char [10]'
|   |-ImplicitCastExpr 0x6c4f230 <col:21> 'my_char':'unsigned char' <IntegralCast>
|   | `-CharacterLiteral 0x6c4f108 <col:21> 'char' 116
|   |-ImplicitCastExpr 0x6c4f250 <col:26> 'my_char':'unsigned char' <IntegralCast>
|   | `-CharacterLiteral 0x6c4f120 <col:26> 'char' 101
|   |-ImplicitCastExpr 0x6c4f278 <col:31> 'my_char':'unsigned char' <IntegralCast>
|   | `-CharacterLiteral 0x6c4f138 <col:31> 'char' 120
|   |-ImplicitCastExpr 0x6c4f2b0 <col:36> 'my_char':'unsigned char' <IntegralCast>
|   | `-CharacterLiteral 0x6c4f150 <col:36> 'char' 116
|   `-ImplicitCastExpr 0x6c4f2c8 <col:41> 'my_char':'unsigned char' <IntegralCast>
|     `-IntegerLiteral 0x6c4f168 <col:41> 'int' 0
`-VarDecl 0x6c96540 <line:3:1, col:20> arr2 'my_char [10]'
  `-StringLiteral 0x6c96618 <col:20> 'my_char [10]' lvalue "text"
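
As a quick sanity check on the comparison above, the two declarations produce identical array contents at the language level, including the zero-filled tail. This is a minimal sketch, assuming a C++11 compiler; the function name same_contents is hypothetical:

```cpp
#include <cassert>
#include <cstring>

typedef unsigned char my_char;

// The two declarations from the example above.
my_char arr[10]  = { 't', 'e', 'x', 't', 0 };
my_char arr2[10] = "text";

// Hypothetical helper: compares all ten elements of both arrays.
bool same_contents() {
    return std::memcmp(arr, arr2, sizeof(arr)) == 0;
}
```

In both forms the elements without an explicit initializer are zero-initialized, so the comparison covers the full ten bytes rather than only the first five.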