Reuse of expressions in synthesized functions.

Greetings,

I have encountered a bug in the static analyzer <http://llvm.org/bugs/show_bug.cgi?id=16745>, which I have traced back to a problem with a synthesized operator=(). In short, the analyzer currently assumes that each expression object occurs in a function only once, and this breaks down for synthesized functions, as Sema happily reuses various Exprs there.

So, now the question is how to proceed. One possible solution (one that Jordan suggests) would be to have Sema not reuse any Exprs during synthesis of various functions. This should be fairly easy, but it will touch a larger amount of code (as the synthesis algorithm is recursive, I would need to pass some sort of expression factory objects instead of plain Expr *s back and forth). It would also make the code look less pretty and cause (probably only slight) memory increase.

Because of these downsides, I am not sure if a change like this would be accepted into the tree. Therefore, I wanted to ask you what do you think about this change before I invest time into carrying it out. Would you accept a patch like this? Or should i try to fix the problem from the analyzer side (right now I think it’s doable, but it’s quite possible i’ll run into unexpected troubles)? Also, can you think of any other situations where sharing expression objects could cause troubles (template instantiations have been mentioned, but these should be fine as they don’t produce nodes in the same function)?

regards,
pavel

I think it’s fine to teach Sema to build unique expressions within its synthesized assignment operators. It’s a cleaner model for the AST anyway.

  • Doug

Is this *in general* a viable approach?

I'm thinking to repeated initializer in InitListExpr (as long as we need
to put in AST the expanded nodes).

>
>
>> Greetings,
>>
>> I have encountered a bug in the static analyzer
>> <http://llvm.org/bugs/show_bug.cgi?id=16745>, which I have traced back
>> to a problem with a synthesized operator=(). In short, the analyzer
>> currently assumes that each expression object occurs in a function
>> only once, and this breaks down for synthesized functions, as Sema
>> happily reuses various Exprs there.
>>
>> So, now the question is how to proceed. One possible solution (one
>> that Jordan suggests) would be to have Sema not reuse any Exprs during
>> synthesis of various functions. This should be fairly easy, but it
>> will touch a larger amount of code (as the synthesis algorithm is
>> recursive, I would need to pass some sort of expression factory
>> objects instead of plain Expr *s back and forth). It would also make
>> the code look less pretty and cause (probably only slight) memory
>> increase.
>>
>> Because of these downsides, I am not sure if a change like this would
>> be accepted into the tree. Therefore, I wanted to ask you what do you
>> think about this change before I invest time into carrying it out.
>> Would you accept a patch like this? Or should i try to fix the problem
>> from the analyzer side (right now I think it's doable, but it's quite
>> possible i'll run into unexpected troubles)? Also, can you think of
>> any other situations where sharing expression objects could cause
>> troubles (template instantiations have been mentioned, but these
>> should be fine as they don't produce nodes in the same function)?
>
> I think it’s fine to teach Sema to build unique expressions within its
> synthesized assignment operators. It’s a cleaner model for the AST
anyway.

Ok, I'll get on it then.

Is this *in general* a viable approach?

I'm thinking to repeated initializer in InitListExpr (as long as we need
to put in AST the expanded nodes).

Could you please elaborate on that? I'm not sure what you mean by repeated
initializers. Do you know of a situation where an AST node can be present
in the tree twice?

pl

    >
    >
    >> Greetings,
    >>
    >> I have encountered a bug in the static analyzer
    >> <http://llvm.org/bugs/show_bug.cgi?id=16745>, which I have traced
    back
    >> to a problem with a synthesized operator=(). In short, the analyzer
    >> currently assumes that each expression object occurs in a function
    >> only once, and this breaks down for synthesized functions, as Sema
    >> happily reuses various Exprs there.
    >>
    >> So, now the question is how to proceed. One possible solution (one
    >> that Jordan suggests) would be to have Sema not reuse any Exprs
    during
    >> synthesis of various functions. This should be fairly easy, but it
    >> will touch a larger amount of code (as the synthesis algorithm is
    >> recursive, I would need to pass some sort of expression factory
    >> objects instead of plain Expr *s back and forth). It would also make
    >> the code look less pretty and cause (probably only slight) memory
    >> increase.
    >>
    >> Because of these downsides, I am not sure if a change like this would
    >> be accepted into the tree. Therefore, I wanted to ask you what do you
    >> think about this change before I invest time into carrying it out.
    >> Would you accept a patch like this? Or should i try to fix the
    problem
    >> from the analyzer side (right now I think it's doable, but it's quite
    >> possible i'll run into unexpected troubles)? Also, can you think of
    >> any other situations where sharing expression objects could cause
    >> troubles (template instantiations have been mentioned, but these
    >> should be fine as they don't produce nodes in the same function)?
    >
    > I think it’s fine to teach Sema to build unique expressions within its
    > synthesized assignment operators. It’s a cleaner model for the AST
    anyway.

Ok, I'll get on it then.

    Is this *in general* a viable approach?

    I'm thinking to repeated initializer in InitListExpr (as long as we need
    to put in AST the expanded nodes).

Could you please elaborate on that? I'm not sure what you mean by
repeated initializers. Do you know of a situation where an AST node can
be present in the tree twice?

$ cat p.c
void f() {
  int a[10] = {[1 ... 8] = 2};
}
$ clang -cc1 -ast-dump p.c
TranslationUnitDecl 0x54ae6e0 <<invalid sloc>>

-TypedefDecl 0x54aebc0 <<invalid sloc>> __int128_t '__int128'
-TypedefDecl 0x54aec20 <<invalid sloc>> __uint128_t 'unsigned __int128'
-TypedefDecl 0x54aef70 <<invalid sloc>> __builtin_va_list

'__va_list_tag [1]'
`-FunctionDecl 0x54af010 <p.c:1:1, line:3:1> f 'void ()'
  `-CompoundStmt 0x54af330 <line:1:10, line:3:1>
    `-DeclStmt 0x54af318 <line:2:3, col:30>
      `-VarDecl 0x54af130 <col:3, col:29> a 'int [10]'
        `-InitListExpr 0x54af280 <col:15, col:29> 'int [10]'
          >-ImplicitValueInitExpr 0x54af308 <<invalid sloc>> 'int'
          >-IntegerLiteral 0x54af1c8 <col:28> 'int' 2
          >-IntegerLiteral 0x54af1c8 <col:28> 'int' 2
          >-IntegerLiteral 0x54af1c8 <col:28> 'int' 2
          >-IntegerLiteral 0x54af1c8 <col:28> 'int' 2
          >-IntegerLiteral 0x54af1c8 <col:28> 'int' 2
          >-IntegerLiteral 0x54af1c8 <col:28> 'int' 2
          >-IntegerLiteral 0x54af1c8 <col:28> 'int' 2
          `-IntegerLiteral 0x54af1c8 <col:28> 'int' 2

Soon or later I personally hope that such node replication will
disappear, but currently we have it and I don't know if there is the
intention to get rid of it.

>
>
>> Could you please elaborate on that? I'm not sure what you mean by
> repeated initializers. Do you know of a situation where an AST node can
> be present in the tree twice?
>
$ cat p.c
void f() {
  int a[10] = {[1 ... 8] = 2};
}
$ clang -cc1 -ast-dump p.c
TranslationUnitDecl 0x54ae6e0 <<invalid sloc>>
>-TypedefDecl 0x54aebc0 <<invalid sloc>> __int128_t '__int128'
>-TypedefDecl 0x54aec20 <<invalid sloc>> __uint128_t 'unsigned __int128'
>-TypedefDecl 0x54aef70 <<invalid sloc>> __builtin_va_list
'__va_list_tag [1]'
`-FunctionDecl 0x54af010 <p.c:1:1, line:3:1> f 'void ()'
  `-CompoundStmt 0x54af330 <line:1:10, line:3:1>
    `-DeclStmt 0x54af318 <line:2:3, col:30>
      `-VarDecl 0x54af130 <col:3, col:29> a 'int [10]'
        `-InitListExpr 0x54af280 <col:15, col:29> 'int [10]'
          >-ImplicitValueInitExpr 0x54af308 <<invalid sloc>> 'int'
          >-IntegerLiteral 0x54af1c8 <col:28> 'int' 2
          >-IntegerLiteral 0x54af1c8 <col:28> 'int' 2
          >-IntegerLiteral 0x54af1c8 <col:28> 'int' 2
          >-IntegerLiteral 0x54af1c8 <col:28> 'int' 2
          >-IntegerLiteral 0x54af1c8 <col:28> 'int' 2
          >-IntegerLiteral 0x54af1c8 <col:28> 'int' 2
          >-IntegerLiteral 0x54af1c8 <col:28> 'int' 2
          `-IntegerLiteral 0x54af1c8 <col:28> 'int' 2

Wow.. I didn't even know such syntax exists. This is a nice example, thanks.

Right now, the static analyzer seems to get this code right (which is quite
surprising, since even the compiler does not support this right now), but
as soon as I replace 2 by something more complicated (a function call), it
stops for some reason, which I didn't investigate yet. But in any case, the
analyzers approach to this is wrong, as it tries to evaluate the expression
multiple times, which, I guess, it shouldn't (gcc doesn't).

Since the compiler doesn't support this yet, I guess this is not such an
important issue yet, but it's definitely a thing to keep an eye on.

Soon or later I personally hope that such node replication will

disappear, but currently we have it and I don't know if there is the
intention to get rid of it.

I think I would also prefer a different AST representation of the above
code, because the current one gives the impression that the RHS expression
is to be evaluated multiple times.

cheers,
pavel

Yeah, we'll need a different AST representation to implement it properly...

That said, you might also want to watch out for something like the
following:

struct A { A(); };
void f() { A a[10]; }

-Eli

It’s a GNU extension that isn’t used all that often.

Right. This is a non-trivial refactoring of the AST to introduce either some kind of run-length encoding for repeated expressions or additional uses of OpaqueValueExprs.

  • Doug