AST Matcher matches InitListExpr child twice

Consider the following C++11 (or later) code fragment, which uses an initialization list that contains a function call:

  int f();
  int i {f()};

The AST for that initialization list consists of an InitListExpr with a single CallExpr child:

  InitListExpr 0x2df2b80 'int'
  `-CallExpr 0x2df2b10 'int'
    `-ImplicitCastExpr 0x2df2af8 ...
      `-DeclRefExpr 0x2df2aa0 ...

If I use the Clang AST Matcher API to match on InitListExpr, then I get one match as expected. However, if I match on CallExpr, then the single CallExpr in the above AST is matched *twice*. Anything below that CallExpr node is also matched twice; for example, I'd get two matches for the DeclRefExpr if that's what my AST Matcher were looking for. This is definitely a problem for my code, and would seem to be a bug in general. Surely that's not the intended behavior, is it?

Attached below is a small test program, "visit.cc", that matches and dumps out InitListExpr and CallExpr nodes in the AST. Running this on my above example clearly shows that only one InitListExpr is matched, but a single CallExpr (at a single memory address) is matched *twice*.

Should I file this glitch as a Clang bug report rather than describing it on this mailing list? Is there a reasonable workaround until this bug can be properly fixed?

Thanks,
Ben

visit.cc (1.38 KB)

Consider the following C++11 (or later) code fragment, which uses an initialization list that contains a function call:

     int f();
     int i {f()};

The AST for that initialization list consists of an InitListExpr with a single CallExpr child:

     InitListExpr 0x2df2b80 'int'
     `-CallExpr 0x2df2b10 'int'
       `-ImplicitCastExpr 0x2df2af8 ...
         `-DeclRefExpr 0x2df2aa0 ...

If I use the Clang AST Matcher API to match on InitListExpr, then I get one match as expected. However, if I match on CallExpr, then the single CallExpr in the above AST is matched *twice*. Anything below that CallExpr node is also matched twice; for example, I'd get two matches for the DeclRefExpr if that's what my AST Matcher were looking for. This is definitely a problem for my code, and would seem to be a bug in general. Surely that's not the intended behavior, is it?

That looks to be by design, cf. DEF_TRAVERSE_STMT(InitListExpr,...) in include/clang/AST/RecursiveASTVisitor.h calling into TraverseSynOrSemInitListExpr twice (for "syntactic" and "semantic" InitListExpr).

In some of our LibreOffice RecursiveASTVisitor implementations we use

    bool TraverseInitListExpr(
        InitListExpr * expr
#if CLANG_VERSION >= 30800
        , DataRecursionQueue * queue = nullptr
#endif
        )
    {
        return WalkUpFromInitListExpr(expr)
            && TraverseSynOrSemInitListExpr(
                expr->isSemanticForm() ? expr : expr->getSemanticForm()
#if CLANG_VERSION >= 30800
                , queue
#endif
                );
    }

to traverse only once (where CLANG_VERSION is LibreOffice's way of supporting multiple versions of Clang).

That looks to be by design, cf. DEF_TRAVERSE_STMT(InitListExpr,...) in include/clang/AST/RecursiveASTVisitor.h calling into TraverseSynOrSemInitListExpr twice (for "syntactic" and "semantic" InitListExpr).

Nutty! But clearly by design, as you point out. There seems to be no way to detect this duplication short of maintaining my own "nodes I have already seen" set, and no way to prevent double visits short of defining my own RecursiveASTVisitor.

In some of our LibreOffice RecursiveASTVisitor implementations we use
[...] to traverse only once

Thank you for this example code, Stephan. Unfortunately I see no way to tell the AST Matcher system to use an alternate visitor of this sort. Am I missing some API that would let me do this? Or is the recursive visit outside of my control if I'm using MatchFinder to drive my tool?

-- Ben

That looks to be by design, cf. DEF_TRAVERSE_STMT(InitListExpr,…) in
include/clang/AST/RecursiveASTVisitor.h calling into
TraverseSynOrSemInitListExpr twice (for “syntactic” and “semantic”
InitListExpr).

Nutty! But clearly by design, as you point out. There seems to be no
way to detect this duplication short of maintaining my own “nodes I have
already seen” set, and no way to prevent double visits short of defining
my own RecursiveASTVisitor.

In some of our LibreOffice RecursiveASTVisitor implementations we use
[…] to traverse only once

Thank you for this example code, Stephan. Unfortunately I see no way to
tell the AST Matcher system to use an alternate visitor of this sort.
Am I missing some API that would let me do this? Or is the recursive
visit outside of my control if I’m using MatchFinder to drive my tool?

Generally, there are many reasons you might visit the same node twice - after all, the AST is not a tree, but a graph :slight_smile: When you write tools that need to run on multiple TUs, you’ll have to do some form of deduplication anyway.

Generally, use the fact that many nodes have pointer identity (you can deduplicate with a simple set on the pointers).

Generally, there are many reasons you might visit the same node twice - after all, the AST is not a tree, but a graph :slight_smile:

OK, fair enough. Thank you for correcting my assumptions about this.

Generally, use the fact that many nodes have pointer identity (you can deduplicate with a simple set on the pointers).

Understood. Thanks!