RFC: What does this matcher mean to you?

We're trying to determine the semantics of the proposed new 'equalsBoundNode()' matcher. Before I present the aforementioned matcher expression let me illustrate what this new matcher does. Given that you've bound a node earlier in a matcher expression with a name "X", equalsBoundNode("X") allows you to refer back to that bound node and test for identity. For example, here's a matcher finding VarDecl's where the type matches the initializer type:

varDecl(hasType(qualType().bind("type")),
               hasInitializer(ignoringParenImpCasts(hasType(qualType(equalsBoundNode("type"))))))));

Semantics get more interesting where forEach*() matchers are concerned. So this is where I ask: What would you expect the following matcher to do? How would you expect it to behave?

    functionDecl(
      forEachDescendant(
        ifStmt(
          forEachDescendant(
            varDecl(hasType(qualType().bind("declType")))
          )
        )
      ),
      has(compoundStmt(has(
        callExpr(
          callee(
            functionDecl(
              returns(qualType(equalsBoundNode("declType")))
            )
          )
        )
      )))
    );

I would expect it the equalsBoundNode(“declType”) to either:

  • raise an error (assertion failure or exception) or
  • check all nodes bound to “declType”, and return true if any of them is equal

Perhaps different variants could be introduced with both behaviors? For example, equalsBoundNode(string) would only work if there is exactly one node bound to the specified name, while equalsAnyBoundNode(string) would look for equality in any node bound to the specified name.

An assertion instead of just not matching anything?

I don’t feel that it would be intuitive. I can see the logic behind the behavior you’re describing, but I still feel that an assertion would be the way to go.
(But of course this is very subjective.)

For clarification: the problem is less “whether it matches”, but for which matches you get callbacks with what bound nodes.

The forEachDescendant(… .bind(“declType”)) part in the matcher on its own will result in a callback for each found declType child, with the node bound to “declType”.

The question is: given that there are multiple matches for forEachDescendant, what is the behavior of the has(… equalsBoundNode(“declType”)) part:
How many callbacks with which bound nodes do you expect?

Cheers,
/Manuel

To help stimulate discussion, let me present the two ways this matcher has been interpreted:

1) An "all or nothing" matcher. equalsBoundNode("X") returns true if there exists any previously bound node called "X" and that node satisfies an identity relationship with the node passed to equalsBoundNode. If no such node exists, the matcher just returns false and the match fails, much like any other matcher.
2) As a filter. Consider the matches found by forEach*() as the set of matches. Then the has() matcher containing equalsBoundNode() is used as a filter over this set. Whenever equalsBoundNode() locates a previously bound node with the correct name and satisfying the identity condition, it's really referring to a node in the match set. If the has() matcher fails, the node gets removed from the match set.

The difference can be seen by looking at what matches get returned from either interpretation. With #1, as long as a functionDecl has at least one call statement to a function returning a type that matches any of the varDecls found within ifStmt's, the function will match and the set of bound nodes will be *every* varDecl found within ifStmts.

With #2, the matcher will match (When? Same as #1? Only if the set of bound nodes is non-empty?) and the set of bound nodes will be only those varDecls whose type matches the return type of a function called somewhere in the function.

I have my thoughts on both but don't want to influence anybody before hearing opinions first.

I think this question is basically not about equalsBoundNodes() at all.

The questions that needs answering are:

  • What should stmt(forEachDescendant(varDecl().bind(“var”)), forEachDescendant(declRefExpr().bind(“decl”))) do?
  • What should stmt(forEachDescendant(varDecl().bind(“var”)), has(declRefExpr().bind(“decl”))) do?

I think the first should call the callback for each pair (VarDecl, DeclRefExpr) under each Stmt. And the second should call the callback once for each VarDecl under a Stmt that has a DeclRefExpr-child.

The matchers themselves match, if the callback was invoked at least once.

I think we need to answer these questions first and then the behavior of equalsBoundNodes() will follow naturally.

From: Daniel Jasper [mailto:djasper@google.com]
Sent: Tuesday, June 04, 2013 5:22 AM
To: Vane, Edwin
Cc: Manuel Klimek; Gábor Kozár; Clang Dev List (cfe-dev@cs.uiuc.edu)
Subject: Re: [cfe-dev] RFC: What does this matcher mean to you?

I think this question is basically not about equalsBoundNodes() at all.

The questions that needs answering are:
- What should stmt(forEachDescendant(varDecl().bind("var")),
forEachDescendant(declRefExpr().bind("decl"))) do?
- What should stmt(forEachDescendant(varDecl().bind("var")),
has(declRefExpr().bind("decl"))) do?

I think the first should call the callback for each pair (VarDecl, DeclRefExpr)
under each Stmt. And the second should call the callback once for each VarDecl
under a Stmt that has a DeclRefExpr-child.

I agree. This is in line with the current definition and behaviour of the involved matchers. I interpret the 'filtering' behaviour as something new that, if we want to implement it, needs a different matcher to expose. I think the equalsBoundNode() matcher is independent of the filtering behaviour.

> From: Daniel Jasper [mailto:djasper@google.com]
> Sent: Tuesday, June 04, 2013 5:22 AM
> To: Vane, Edwin
> Cc: Manuel Klimek; Gábor Kozár; Clang Dev List (cfe-dev@cs.uiuc.edu)
> Subject: Re: [cfe-dev] RFC: What does this matcher mean to you?
>
> I think this question is basically not about equalsBoundNodes() at all.
>
> The questions that needs answering are:
> - What should stmt(forEachDescendant(varDecl().bind("var")),
> forEachDescendant(declRefExpr().bind("decl"))) do?
> - What should stmt(forEachDescendant(varDecl().bind("var")),
> has(declRefExpr().bind("decl"))) do?
>
> I think the first should call the callback for each pair (VarDecl,
DeclRefExpr)
> under each Stmt. And the second should call the callback once for each
VarDecl
> under a Stmt that has a DeclRefExpr-child.

I agree. This is in line with the current definition and behaviour of the
involved matchers. I interpret the 'filtering' behaviour as something new
that, if we want to implement it, needs a different matcher to expose. I
think the equalsBoundNode() matcher is independent of the filtering
behaviour.

I think with the proposed change to how we match, the equalsBoundNode would
naturally fit in as filtering matcher...

What do you mean by proposed change to *how* we match? The equalsBoundNode() matcher was meant to be just another matcher. If we want to use it for filtering I still think we need a different matcher to expose that functionality. Otherwise we're changing the semantics of existing matchers.

So given we seem to agree on how Daniel's two matchers behave, how does this help us with equalsBoundNode()?

> From: Manuel Klimek [mailto:klimek@google.com]
> Sent: Wednesday, June 05, 2013 12:24 PM
> To: Vane, Edwin
> Cc: Daniel Jasper; Gábor Kozár; Clang Dev List (cfe-dev@cs.uiuc.edu)
> Subject: Re: [cfe-dev] RFC: What does this matcher mean to you?

>
>
>
>
> > From: Daniel Jasper [mailto:djasper@google.com]
> > Sent: Tuesday, June 04, 2013 5:22 AM
> > To: Vane, Edwin
> > Cc: Manuel Klimek; Gábor Kozár; Clang Dev List (
cfe-dev@cs.uiuc.edu)
> > Subject: Re: [cfe-dev] RFC: What does this matcher mean to you?
> >
>
> > I think this question is basically not about equalsBoundNodes()
at all.
> >
> > The questions that needs answering are:
> > - What should stmt(forEachDescendant(varDecl().bind("var")),
> > forEachDescendant(declRefExpr().bind("decl"))) do?
> > - What should stmt(forEachDescendant(varDecl().bind("var")),
> > has(declRefExpr().bind("decl"))) do?
> >
> > I think the first should call the callback for each pair
(VarDecl,
> DeclRefExpr)
> > under each Stmt. And the second should call the callback once for
> each VarDecl
> > under a Stmt that has a DeclRefExpr-child.
>
>
> I agree. This is in line with the current definition and behaviour
of the
> involved matchers. I interpret the 'filtering' behaviour as something
new that, if
> we want to implement it, needs a different matcher to expose. I think the
> equalsBoundNode() matcher is independent of the filtering behaviour.
>
>
>
> I think with the proposed change to how we match, the equalsBoundNode
> would naturally fit in as filtering matcher...

What do you mean by proposed change to *how* we match? The
equalsBoundNode() matcher was meant to be just another matcher. If we want
to use it for filtering I still think we need a different matcher to expose
that functionality. Otherwise we're changing the semantics of existing
matchers.

So given we seem to agree on how Daniel's two matchers behave, how does
this help us with equalsBoundNode()?

To me equalsBoundNode is not just another matcher - many details on how
bound nodes are handled get exposed with equalsBoundNode, because the
relevant information was not accessible at those points before.

For example, before equalsBoundNode, whether a matcher passed for failed
did not depend on the set of bound nodes at that point - that's a very big
change, for example, we need to take it into account when doing
memoization, otherwise we'll create inconsistent results.

Also, I don't think that the proposed non-filtering behavior of
equalsBoundNode is user friendly. I think that having it be a filtering
matcher makes more sense, and that other filtering matchers are a different
topic (as, in principle, all matchers are already filtering matchers on
predicates of nodes, apart from the forEach matchers)

I don’t understand the confusion. To me, it feels kind of natural now.

Consider a matcher similar to what we have looked at above:

functionDecl(forEachDescendant(callExpr(callee(functionDecl().bind(“f”)))), hasName(“SomeFunction”));

This is very similar to my second example case. The callback will be called once for each CallExpr in the function “SomeFunction” with the called function bound to “f”. Now, if we change this to:

functionDecl(forEachDescendant(callExpr(callee(functionDecl().bind(“f”)))), equalsBoundNode(“f”));

This equalsBoundNode matcher will behave exactly like the hasName()-matcher. Thus, the callback will be called once for each CallExpr where a function calls itself recursively.

Or am I missing something?

I don't understand the confusion. To me, it feels kind of natural now.

Consider a matcher similar to what we have looked at above:

functionDecl(forEachDescendant(callExpr(callee(functionDecl().bind("f")))),
hasName("SomeFunction"));

This is very similar to my second example case. The callback will be
called once for each CallExpr in the function "SomeFunction" with the
called function bound to "f". Now, if we change this to:

functionDecl(forEachDescendant(callExpr(callee(functionDecl().bind("f")))),
equalsBoundNode("f"));

This equalsBoundNode matcher will behave exactly like the
hasName()-matcher. Thus, the callback will be called once for each CallExpr
where a function calls itself recursively.

Or am I missing something?

Yes, there's a huge difference. (I agree with the proposed behavior tho)
(1) is fixed with SomeFunction - that is, either all results from
forEachDescendant apply, or none
(2) means that the result set created by forEachDescendant is filtered down
by equalsBoundNode

I don't understand the confusion. To me, it feels kind of natural now.

Consider a matcher similar to what we have looked at above:

functionDecl(forEachDescendant(callExpr(callee(functionDecl().bind("f")))),
hasName("SomeFunction"));

This is very similar to my second example case. The callback will be
called once for each CallExpr in the function "SomeFunction" with the
called function bound to "f". Now, if we change this to:

functionDecl(forEachDescendant(callExpr(callee(functionDecl().bind("f")))),
equalsBoundNode("f"));

This equalsBoundNode matcher will behave exactly like the
hasName()-matcher. Thus, the callback will be called once for each CallExpr
where a function calls itself recursively.

Or am I missing something?

Yes, there's a huge difference. (I agree with the proposed behavior tho)
(1) is fixed with SomeFunction - that is, either all results from
forEachDescendant apply, or none
(2) means that the result set created by forEachDescendant is filtered
down by equalsBoundNode

To me that seems to be an implementation difference, not a behavior
difference.

I don't understand the confusion. To me, it feels kind of natural now.

Consider a matcher similar to what we have looked at above:

functionDecl(forEachDescendant(callExpr(callee(functionDecl().bind("f")))),
hasName("SomeFunction"));

This is very similar to my second example case. The callback will be
called once for each CallExpr in the function "SomeFunction" with the
called function bound to "f". Now, if we change this to:

functionDecl(forEachDescendant(callExpr(callee(functionDecl().bind("f")))),
equalsBoundNode("f"));

This equalsBoundNode matcher will behave exactly like the
hasName()-matcher. Thus, the callback will be called once for each CallExpr
where a function calls itself recursively.

Or am I missing something?

Yes, there's a huge difference. (I agree with the proposed behavior tho)
(1) is fixed with SomeFunction - that is, either all results from
forEachDescendant apply, or none
(2) means that the result set created by forEachDescendant is filtered
down by equalsBoundNode

To me that seems to be an implementation difference, not a behavior
difference.

Well, it definitely restricts the set of possible implementations...

From: Manuel Klimek [mailto:klimek@google.com]
Sent: Thursday, June 06, 2013 10:58 AM
To: Daniel Jasper
Cc: Vane, Edwin; Gábor Kozár; Clang Dev List (cfe-dev@cs.uiuc.edu)
Subject: Re: [cfe-dev] RFC: What does this matcher mean to you?

  I don't understand the confusion. To me, it feels kind of natural now.

  Consider a matcher similar to what we have looked at above:

  functionDecl(forEachDescendant(callExpr(callee(functionDecl().bind("f")
))), hasName("SomeFunction"));

  This is very similar to my second example case. The callback will be
called once for each CallExpr in the function "SomeFunction" with the called
function bound to "f". Now, if we change this to:

  functionDecl(forEachDescendant(callExpr(callee(functionDecl().bind("f")
))), equalsBoundNode("f"));

  This equalsBoundNode matcher will behave exactly like the hasName()-
matcher. Thus, the callback will be called once for each CallExpr where a
function calls itself recursively.

  Or am I missing something?

Yes, there's a huge difference. (I agree with the proposed behavior tho)
(1) is fixed with SomeFunction - that is, either all results from
forEachDescendant apply, or none
(2) means that the result set created by forEachDescendant is filtered down by
equalsBoundNode

I think maybe this second matcher is starting to shine light on the filtering problem for me. In the second matcher, in the state it's currently implemented, the callback would be called N times for each of N CallExpr's as long as at least one of these CallExpr calls is a recursive call. Is that right? I can see why that's not helpful to the user. I guess you'd like the filtering thing to only have the callback called once and only for each recursiveCall right?