AST for expressions in array sizes


#define MAX 10

int main() // (int n)
{
  char arr[MAX * 3]; // arr[n + 3]
}

I expected the AST to be more detailed (e.g. to contain a BinaryOperator), but Clang seems to fold it:
-FunctionDecl <line:11:1, line:14:1> line:11:5 main 'int ()'
  `-CompoundStmt <line:12:1, line:14:1>
    `-DeclStmt <line:13:3, col:20>
      `-VarDecl <col:3, col:19> col:8 arr 'char [30]'

or, with n in the size expression, the raw output:

`-VarDecl <col:3, col:15> col:8 a 'char [n * 3]'

So with the current AST output we cannot detect overflow in array size expressions, as requested in, right?

Is it possible to disable that folding / enhance VarDecl for arrays?

I think this is because the expression is not really an expression at
that point, since part of it comes from the preprocessor. At parse time,
Clang, I think, sees "10 + 3" there, not "MACRO + 3". Without going back
to the preprocessor and retrieving its state or tokens, you
won't be able to recover this extra information. (Although I'm not sure
how the compiler's warning notes manage to do this...)

If you don't use a preprocessor macro but rather a real expression,
the AST contains the expression verbatim:

    |-DeclStmt 0x1b31c38 <line:8:3, col:18>
    | `-VarDecl 0x1b31bd8 <col:3, col:17> col:8 used tmp 'char [x + 3]'
    |-BinaryOperator 0x1b31d10 <line:10:3, col:12> 'char' lvalue '='
    | |-ArraySubscriptExpr 0x1b31cb0 <col:3, col:8> 'char' lvalue
    | | |-ImplicitCastExpr 0x1b31c98 <col:3> 'char *' <ArrayToPointerDecay>
    | | | `-DeclRefExpr 0x1b31c50 <col:3> 'char [x + 3]' lvalue Var 0x1b31bd8 'tmp' 'char [x + 3]'
    | | `-IntegerLiteral 0x1b31c78 <col:7> 'int' 0
    | `-ImplicitCastExpr 0x1b31cf8 <col:12> 'char' <IntegralCast>
    |   `-IntegerLiteral 0x1b31cd8 <col:12> 'int' 1

Or for a more whacky one:

    |-DeclStmt 0x11bdc28 <line:8:3, col:41>
    | `-VarDecl 0x11bdbc8 <col:3, col:40> col:8 used tmp 'char [x * 2 + 5 - 1 / 2 * x * x + 42]'

This is still not a "BinaryOperator" but perhaps somehow the type
could be fetched out from this and then the inner expression
generated. It could be that only the dumper function is "lazy" about

Basically macro info (e.g. whether MAX or 10) is not needed, but info about BinaryOperator would be quite useful.

Your example uses variable-length arrays (VariableArrayType), which are a separate subclass of ArrayType and a pretty rare feature. I guess Dávid is more curious about constant-size arrays (ConstantArrayType), which indeed do not store the size expression, and should not: arrays of the same numeric size must also have the same type (e.g., for the purpose of template instantiations; VLAs, on the other hand, are forbidden in C++, probably for that very reason).
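
To illustrate the type-identity point, here is a minimal C++ sketch. It is not from the thread: `extent_of`, the three alias names and `shared_extent` are made-up names for illustration. It shows that three different spellings of the size produce the very same array type, so the type alone has no single size expression it could store.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

#define MAX 10

// Hypothetical helper: deduces the element count N of a constant-size char array.
template <std::size_t N>
constexpr std::size_t extent_of(const char (&)[N]) { return N; }

// Three spellings of the size expression.
using FromMacro   = char[MAX * 3];
using FromLiteral = char[30];
using FromSum     = char[15 + 15];

// All three name one and the same type; this is checked at compile time.
static_assert(std::is_same<FromMacro, FromLiteral>::value,
              "char[MAX * 3] and char[30] are the same type");
static_assert(std::is_same<FromLiteral, FromSum>::value,
              "char[30] and char[15 + 15] are the same type");

// All three arrays select the same instantiation, extent_of<30>.
std::size_t shared_extent() {
    FromMacro a = {};
    FromLiteral b = {};
    FromSum c = {};
    assert(extent_of(a) == extent_of(b) && extent_of(b) == extent_of(c));
    return extent_of(a);
}
```

If the type carried the spelling of its size, the three aliases would have to be distinct types and the `static_assert`s above would fail.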

I also don't think this is the preprocessor's doing. I don't think the preprocessor collapses a[10 + 3] into a[13], because it definitely doesn't collapse 10 + 3 into 13.
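
One quick way to see which tokens the preprocessor hands to the parser is two-level stringification. This is a minimal sketch, assuming the MAX macro from the original question; `STR`, `XSTR` and the function names are illustrative and not from the thread.

```cpp
#include <cassert>
#include <cstring>

#define MAX 10

// Two-level stringification: XSTR macro-expands its argument first,
// then STR turns the resulting tokens into a string literal.
#define STR(x) #x
#define XSTR(x) STR(x)

// Tokens as written in the source (no macro expansion before stringifying).
const char *unexpanded_tokens() { return STR(MAX * 3); }

// Tokens after macro expansion, i.e. what the parser gets for the array size.
const char *expanded_tokens() { return XSTR(MAX * 3); }

// The preprocessor substitutes MAX but performs no arithmetic.
void check_no_folding() {
    assert(std::strcmp(expanded_tokens(), "10 * 3") == 0);
    assert(std::strcmp(expanded_tokens(), "30") != 0);
}
```

So the multiplication is still present as tokens when parsing starts, and any folding to 30 happens later, in the compiler proper.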

If anywhere, these constant array size expressions should live somewhere in VarDecls.

I chatted with Richard Smith about this and he pointed out that the extra info for MAX * 3 is stored in the TypeLoc (which can be retrieved from the VarDecl through its TypeSourceInfo), rather than in the Type itself.

For example, in gdb, once I’ve found the VarDecl (pointer stored in the GDB temporary expression $10), I could retrieve the expression:

p ((clang::ArrayTypeLoc)((VarDecl*)$10)->getTypeSourceInfo()->getTypeLoc()).getSizeExpr()->dump()
BinaryOperator 0xcc8f7c8 'int' '*'
|-IntegerLiteral 0xcc8f788 'int' 10
`-IntegerLiteral 0xcc8f7a8 'int' 3

You can find the macro details by looking at the source location stuff - I don’t know that piece in detail, but should work as well/in the same way here as in the rest of the AST.

Hope that helps!

Great! Thanks

