r161350 breaks the intention of ParentMap::addStmt

Hi @clang,

the comment to ParentMap::addStmt states, that the parent relation of a Stmt can be _updated_ by a call to it (and I use it that way). r161350 breaks this functionality. Someone should have a look at this since I don't know how to deal with PseudoObjectExprs.

Best regards
Olaf

Interesting, and sorry for the comment-incompatible update. Let me explain my rationale for that change.

OpaqueValueExpr is intended to represent a value that is computed once but used in multiple places; the simplest example I've seen is for BinaryConditionalOperator (a GNU extension):

int *x = y ?: z;

In this expression, the condition of the operator is an ImplicitCastExpr (PointerToBool) around a DeclRefExpr (y), and the true-result is just the DeclRefExpr.* We don't want to compute the same value twice because of side effects, and nor do we want to traverse the source expression twice with a RecursiveASTVisitor, so we use OpaqueValueExpr to represent the evaluation of 'y'. In this particular case, it's not so clear whether the "canonical" parent of the DeclRefExpr is the ImplicitCastExpr or the BinaryConditionalOperator.

* There is more going on here (LValueToRValue conversion, for one thing), but I'm doing this from memory and trying to keep it simple.

A PseudoObjectExpr is used when we want one expression to essentially rewrite to another one. The only use right now is for properties and object subscripting in Objective-C. For example, 'self.foo = bar' is rewritten to be '[self setFoo:bar]'. In this case we have the notion of a "syntactic" form and one or more "semantic" statements that are actually emitted by CodeGen.

Since the analyzer is using this ParentMap for diagnostic emission, we definitely want to make the canonical parent of 'bar' be the assignment operation that the user actually wrote, rather than the message send that will be executed. But we'd still like to be able to say that the message send is a child of the larger PseudoObjectExpr. As long as the semantic form comes first, the "don't update the map again" idea would be correct.

ANYWAY, after all that, it seems like a correct solution, if slightly more expensive, would be to keep track of which statements have been seen on this particular traversal. Another possibility is that there could be an explicit test for PseudoObjectExpr to make sure the semantic form is traversed last instead of first. Do you (or does anybody else) have opinions?

Jordan

Hi Jordan,

Interesting, and sorry for the comment-incompatible update. Let me explain my rationale for that change.

OpaqueValueExpr is intended to represent a value that is computed once but used in multiple places; the simplest example I've seen is for BinaryConditionalOperator (a GNU extension):

int *x = y ?: z;

In this expression, the condition of the operator is an ImplicitCastExpr (PointerToBool) around a DeclRefExpr (y), and the true-result is just the DeclRefExpr.* We don't want to compute the same value twice because of side effects, and nor do we want to traverse the source expression twice with a RecursiveASTVisitor, so we use OpaqueValueExpr to represent the evaluation of 'y'. In this particular case, it's not so clear whether the "canonical" parent of the DeclRefExpr is the ImplicitCastExpr or the BinaryConditionalOperator.

* There is more going on here (LValueToRValue conversion, for one thing), but I'm doing this from memory and trying to keep it simple.

I'm concerned about instances in the AST with mutliple parents. I know that clang's AST represents more than just a literal AST. However I wouldn't go so far and allow multiple parents in the AST. That said, the IMHO correct solution is to remove getCond and getTrueExpr from AbstractConditionalOperator and handle the LHS of BinaryConditionalOperator differently. But according to the comments I don't think that my problem stems from OpaqueValueExpr.

A PseudoObjectExpr is used when we want one expression to essentially rewrite to another one.

Hmm, I did a textual search for PseudoObjectExpr, but haven't found it anywhere but in the one comment in ParentMap.cpp. Looks as if it is gone again. Maybe in that case the first part (causing the troubles) of BuildParentMap can be reverted anyway(?)

Best regards
Olaf

Hi Jordan,

Interesting, and sorry for the comment-incompatible update. Let me explain my rationale for that change.

OpaqueValueExpr is intended to represent a value that is computed once but used in multiple places; the simplest example I’ve seen is for BinaryConditionalOperator (a GNU extension):

int *x = y ?: z;

In this expression, the condition of the operator is an ImplicitCastExpr (PointerToBool) around a DeclRefExpr (y), and the true-result is just the DeclRefExpr.* We don’t want to compute the same value twice because of side effects, and nor do we want to traverse the source expression twice with a RecursiveASTVisitor, so we use OpaqueValueExpr to represent the evaluation of ‘y’. In this particular case, it’s not so clear whether the “canonical” parent of the DeclRefExpr is the ImplicitCastExpr or the BinaryConditionalOperator.

  • There is more going on here (LValueToRValue conversion, for one thing), but I’m doing this from memory and trying to keep it simple.

I’m concerned about instances in the AST with mutliple parents. I know that clang’s AST represents more than just a literal AST. However I wouldn’t go so far and allow multiple parents in the AST. That said, the IMHO correct solution is to remove getCond and getTrueExpr from AbstractConditionalOperator and handle the LHS of BinaryConditionalOperator differently. But according to the comments I don’t think that my problem stems from OpaqueValueExpr.

“Expressions with multiple parents” is exactly the problem that OpaqueValueExpr is trying to solve. When you traverse the AST, the “source expr” of an OpaqueValueExpr is not visited – you have to explicitly request that. OpaqueValueExpr is a small way to do AST canonicalization without breaking invariants or losing source information.

A PseudoObjectExpr is used when we want one expression to essentially rewrite to another one.

Hmm, I did a textual search for PseudoObjectExpr, but haven’t found it anywhere but in the one comment in ParentMap.cpp. Looks as if it is gone again. Maybe in that case the first part (causing the troubles) of BuildParentMap can be reverted anyway(?)

No, no. An Objective-C property reference gets rebuilt as a PseudoObjectExpr in Sema. The analyzer doesn’t have to do anything special to handle it because the CFG builder knows to just use the “semantic” representation, but when we go to report diagnostics, we’d like to know if a given Objective-C message was written as a message send or a property/subscript access. The specific case of ParentMap is to ensure that Xcode’s arrows draw correctly even when highlighting something inside a PseudoObjectExpr—if you revert the change, you’ll notice retain-release-path-notes.m starts failing in FileCheck because the plist serialization doesn’t match.

Perhaps rather than just looking through OpaqueValueExprs, we should just explicitly handle their use cases in ParentMap?

  • BinaryConditionalOperator

  • PseudoObjectExpr, currently used for ObjCPropertyRefExpr and ObjCSubscriptRefExpr

  • C++ ‘catch’ variables, as a placeholder (Sema::BuildExceptionDeclaration) ← I’d be okay with not worrying about this one, since it just contains a DeclRefExpr back to the variable to make it look initialized.

What do you think?
Jordan

A PseudoObjectExpr is used when we want one expression to essentially
rewrite to another one.

Hmm, I did a textual search for PseudoObjectExpr, but haven't found it
anywhere but in the one comment in ParentMap.cpp. Looks as if it is
gone again. Maybe in that case the first part (causing the troubles)
of BuildParentMap can be reverted anyway(?)

No, no.

Haha, I copy-pasted "PseudoObjectExprs" from the comment and searched for that. No wonder that I didn't found anything with the trailing 's'.

Perhaps rather than just looking through OpaqueValueExprs, we should
just explicitly handle their use cases in ParentMap?
- BinaryConditionalOperator
- PseudoObjectExpr, currently used for ObjCPropertyRefExpr and
ObjCSubscriptRefExpr
- C++ 'catch' variables, as a placeholder
(Sema::BuildExceptionDeclaration) <-- I'd be okay with not worrying
about this one, since it just contains a DeclRefExpr back to the
variable to make it look initialized.

What do you think?

Even if I don't like the current design of OpaqueValueExprs (assuming that I've understood correctly, that a source expr can have multiple OpaqueValueExpr parents) I'm completely OK with the handling of it just because it doesn't bother me.
For PseudoObjectExpr we could just look whether the parent is a PseudoObjectExpr (see the patch) and do a special handling in that case. However I don't understand while it is harmful if the semantic children have their PseudoObjectExpr as parent.

Best regards
Olaf

parentmap.patch (899 Bytes)

The first patch was wrong, when it comes to an update of a PseudoObjectExpr. I have to test S instead of Parent. Overall that simplified the code.

Best Olaf

parentmap.patch (1 KB)

Sorry, it isn't PseudoObjectExpr that's the problem; it's OpaqueValueExpr. (I just realized that now, meaning my comment is wrong.) The source expr of an OpaqueValueExpr isn't directly accessible as a child to avoid multiple traversal, but the rest of that commit taught ParentMap how to look through an OpaqueValueExpr. That probably isn't the right thing to do.

To illustrate the problem, here are the pieces inside 'self.property = value' (roughly):

0: $1.property = $2
1: 'self'
2: 'value'
3: [$1 setProperty:$2]

$1 and $2 represent OpaqueValueExprs that refer back to the real expressions in child slots 1 and 2. Child slot 0 is the "syntactic" form and is not used by CodeGen or the analysis engine, but it /is/ used for diagnostics. For the analyzer to print the best diagnostics, we want the parent expr of 'value' to be the assignment expression (0), and the parent expr of 'self' to be the ObjCPropertyRefExpr /inside/ the assignment ($1.property). Having the parent be the entire PseudoObjectExpr is possible but less than ideal.

I'm convincing myself more and more that this just means we should handle the /users/ of OpaqueValueExprs specially rather than trying to write a magic bullet solution that handles all OpaqueValueExprs. (For the BinaryConditionalOperator, should the parent of the LHS be the implicit cast or the entire operator?)

Sorry to keep drawing this out,
Jordan

Perhaps rather than just looking through OpaqueValueExprs, we should
just explicitly handle their use cases in ParentMap?
- BinaryConditionalOperator
- PseudoObjectExpr, currently used for ObjCPropertyRefExpr and
ObjCSubscriptRefExpr
- C++ 'catch' variables, as a placeholder
(Sema::BuildExceptionDeclaration) <-- I'd be okay with not worrying
about this one, since it just contains a DeclRefExpr back to the
variable to make it look initialized.

What do you think?

Even if I don't like the current design of OpaqueValueExprs (assuming that I've understood correctly, that a source expr can have multiple OpaqueValueExpr parents) I'm completely OK with the handling of it just because it doesn't bother me.
For PseudoObjectExpr we could just look whether the parent is a PseudoObjectExpr (see the patch) and do a special handling in that case. However I don't understand while it is harmful if the semantic children have their PseudoObjectExpr as parent.

Sorry, it isn't PseudoObjectExpr that's the problem; it's OpaqueValueExpr. (I just realized that now, meaning my comment is wrong.) The source expr of an OpaqueValueExpr isn't directly accessible as a child to avoid multiple traversal, but the rest of that commit taught ParentMap how to look through an OpaqueValueExpr. That probably isn't the right thing to do.

To illustrate the problem, here are the pieces inside 'self.property = value' (roughly):

0: $1.property = $2
1: 'self'
2: 'value'
3: [$1 setProperty:$2]

$1 and $2 represent OpaqueValueExprs that refer back to the real expressions in child slots 1 and 2. Child slot 0 is the "syntactic" form and is not used by CodeGen or the analysis engine, but it /is/ used for diagnostics. For the analyzer to print the best diagnostics, we want the parent expr of 'value' to be the assignment expression (0), and the parent expr of 'self' to be the ObjCPropertyRefExpr /inside/ the assignment ($1.property). Having the parent be the entire PseudoObjectExpr is possible but less than ideal.

Is there a reason why the syntactic form is split up like that? Is it
possible to have only two child expressions for PseudoObjectExpr,
syntactic and semantic, where the syntactic form doesn't have any
OpaqueValueExprs, and the OpaqueValueExprs in the semantic form point
within the syntactic form. If this was possible, then the ParentMap
wouldn't need any special handling for OpaqueValueExpr at all. (I'm
not familiar with codegen or the use of PseudoObjectExpr, so this
sugggestion may be way off.)

I'm not 100% sure, but I believe it's because this way the CodeGen for a PseudoObjectExpr just runs through the semantic exprs, and never has to look inside any OpaqueValueExprs. By the time an OpaqueValueExpr is reached in the CodeGen traversal, it is guaranteed to have already been evaluated. The analyzer does the same thing.

I believe John is the one who designed PseudoObjectExpr, so maybe he has something to add here, but I think I got the gist of it.

Philip's proposal would no doubt help some tools that cared only about
the syntactic form. Unfortunately, it would make serialization and
deserialization much more complicated, because it relies on expression
pointers matching exactly at arbitrary points in the expression hierarchy.
It would also complicate tools that wish to understand what parts of the
syntactic form are actually being evaluated.

I have heard a lot of complaints from people who find that OVEs
complicate their code. In pretty much every instance, their proposal
for making OVEs less invasive would actually have left their tool
doing the wrong thing — at best double-traversing expressions,
and often something more insidious.

OVEs model complex language features. Where the language is
less complex, we don't use them. Where we do use them, you
almost certainly need to think more carefully about the right thing to do.

John.

Well thats OK as long as this doesn't break code, if OVEs are not used. However in the particular case of parent map a rather basic piece of the AST is influenced by them. Thus I'd rather suggest to fix that.
The fundamental flaw of the design is the unclear parent-child relation introduced by OVEs. You see, an OVE has no (implict) child. Thus, shall the OVE be a parent of the source expression of the OVE? IMHO not.

Said that I propose a revert of r161350 as well as r150894 in ParentMap. There must be another fix of <rdar://problem/10797980> (which I admittedly don't know).

To recap the example given by Jordan: currently the PseudoObjectExpr is as follows:

0: $1_1.property = $2_1
1: 'self'
2: 'value'
3: [$1_2 setProperty:$2_2]

$X_X are the OVEs. There are 4 OVEs and each OVE points to either self or value. Couldn't we reorder it in the following way:

0: self.property = value
1: $1_1
2: $2_1
3: [$1_2 setProperty:$2_2]

? First this would solve the issues in parent map since the syntactic form now has no OVEs in it (and we do not need to treat OVEs at all in parent map). Second, whenever a PseudoObjectExpr must be treated specifically, also the OVEs in it can be handled specifically.

Best Olaf

This makes things harder for CodeGen (and for the analyzer evaluation). Remember, currently CodeGen is able to make these assumptions:

(1) When you see an OpaqueValueExpr, its source expr has already been evaluated.
(2) To evaluate a PseudoObjectExpr, just evaluate each of its semantic exprs in order.

Changing the OVEs in the POE to refer into the ObjCPropertyRefExpr violates this, and makes $1_1 and $2_1 kind of useless.

I see what you mean about the problem with ParentMap, but the idea is that asking "what is the parent of this expression with multiple parents?" is not a good question. For POEs, the syntactic expression is probably the best choice. For BinaryConditionalOperator, there's a choice between the implicit cast-to-bool in the "condition", and the BCO itself for the "result".

Do you agree here? In your use case, what would the right parents be?
Jordan

Hi Jordan,

first, thanks for r164947. Now lets hope that we get the ParentMap case resolved.

(1) When you see an OpaqueValueExpr, its source expr has already been evaluated.
(2) To evaluate a PseudoObjectExpr, just evaluate each of its semantic exprs in order.

OK, it looks as if we won't get to change this. So let us just concentrate on the parent map implementation.
I hope I've understood it now, but it looks to me that the problem stems from the last lines in addStmt, where OpaqueValueExpr::sourceExpr is treated like a child. Thus, the second test in the actual children loop should only fire, if the parent is already an OpaqueValueExpr.
What do you think of the attached simple patch? It resolves my problems (I have no OVEs in my code) and IMHO it also properly handles the update of OVEs (since these are updated unconditionally in the last three lines).
I have not tested the appropriate regressions but I have the strong feeling that they will work.

Best Olaf

parentmap.patch (497 Bytes)

Hi, Olaf. I've gone ahead and just dealt with PseudoObjectExpr and BinaryConditionalOperator explicitly in r165355. I was imagining the case where some day you or someone else is using ParentMap the way you're using it -- to keep the parents of a tree of statements up-to-date, as is specifically mentioned in the header comment -- and would need it to work correctly even for OpaqueValueExprs. I've removed all checks for an existing parent, and while things look quite a bit more complicated, I don't think it will actually change the performance characteristics in any significant way.

Essentially, OpaqueValueExprs are now "transparent" by default, but PseudoObjectExpr and BinaryConditionalOperator deliberately switch to an "opaque" mode when handling their semantic sub-expressions.

Let me know if you're still having problems, and sorry about messing this up in the first place.
Jordan