Match callback invocation order

Hi All,

I’m struggling to figure out the order in which MatchFinder invokes callbacks.

Here are my matches. Each match callback just dumps the file and line no.

ClangTool Tool(OptionsParser.getCompilations(),
OptionsParser.getSourcePathList());

MyMatchCallback Mcb;
MatchFinder Finder;

Finder.addMatcher(
functionDecl().bind(“fn”),
&Mcb);

Finder.addMatcher(
returnStmt().bind(“rtn”),
&Mcb);

Finder.addMatcher(
callExpr().bind(“call”),
&Mcb);

return Tool.run(newFrontendActionFactory(&Finder).get());

I run the tool on this src file:

1 int fn1(void) {return 1;}
2 int fn2(void) {return 1;}
3 int fn3(void) {return 1;}
4 int fn4(void) {return 1;}
5
6 int (main)(void) {
7 fn1();
8 return 1;
9
10 if (fn2())
11 return 1;
12
13 { /* arbitrary block */
14 if(fn3())
15 return 1;
16 }
17
18 fn4();
19
20 return 1;
21 }

This is the output I get: :<line_no>

fnDecl main /home/…/match_order_test.c:6
fn call fn1/home/…/match_order_test.c:7
Rtn /home/…/match_order_test.c:8
fn call fn4/home/…/match_order_test.c:18 << hey! line 10 should be next?
Rtn /home/…/match_order_test.c:20
fn call fn2/home/…/match_order_test.c:10
Rtn /home/…/match_order_test.c:11
fn call fn3/home/…/match_order_test.c:14
Rtn /home/…/match_order_test.c:15

I was expecting the matches to be in line-number order. In fact I’m really depending on that for the custom tool I want to build as it needs to analyze the order in which certain fns are called.

Looking at the ast-dump of the src file and adding in the order in which the CallExprs and ReturnStmt’s were matched MatchFinder looks to be doing a kind of breadth-first traversal.

TranslationUnitDecl 0x7fffc2a2b238 <>

-TypedefDecl 0x7fffc2a2bad0 <> implicit __int128_t ‘__int128’

-FunctionDecl 0x7fffc2a8a588 <line:6:1, line:21:1> line:6:6 main 'int (void)':'int (void)' -CompoundStmt 0x7fffc2a8a8f0 <col:18, line:21:1>
#1 |-CallExpr 0x7fffc2a8a6c0 <line:7:5, col:9> ‘int’
-ImplicitCastExpr 0x7fffc2a8a6a8 <col:5> 'int (*)(void)' <FunctionToPointerDecay> -DeclRefExpr 0x7fffc2a8a658 col:5 ‘int (void)’ Function 0x7fffc2a89ef0 ‘fn1’ ‘int (void)’
#2 |-ReturnStmt 0x7fffc2a8a700 <line:8:5, col:12>
`-IntegerLiteral 0x7fffc2a8a6e0 col:12 ‘int’ 1
-IfStmt 0x7fffc2a8a798 <line:10:5, line:11:16>
#5 | |-CallExpr 0x7fffc2a8a748 <line:10:9, col:13> ‘int’

-ImplicitCastExpr 0x7fffc2a8a730 <col:9> 'int (*)(void)' <FunctionToPointerDecay> -DeclRefExpr 0x7fffc2a8a710 col:9 ‘int (void)’ Function 0x7fffc2a8a0c0 ‘fn2’ ‘int (void)’
#6 | -ReturnStmt 0x7fffc2a8a788 <line:11:9, col:16> -IntegerLiteral 0x7fffc2a8a768 col:16 ‘int’ 1
-CompoundStmt 0x7fffc2a8a850 <line:13:5, line:16:5>
-IfStmt 0x7fffc2a8a838 <line:14:9, line:15:20> #7 | |-CallExpr 0x7fffc2a8a7e8 <line:14:12, col:16> 'int' -ImplicitCastExpr 0x7fffc2a8a7d0 col:12 ‘int ()(void)’
-DeclRefExpr 0x7fffc2a8a7b0 <col:12> 'int (void)' Function 0x7fffc2a8a248 'fn3' 'int (void)' #8 | -ReturnStmt 0x7fffc2a8a828 <line:15:13, col:20>
-IntegerLiteral 0x7fffc2a8a808 <col:20> 'int' 1 #3 |-CallExpr 0x7fffc2a8a8a0 <line:18:5, col:9> 'int' -ImplicitCastExpr 0x7fffc2a8a888 col:5 'int (
)(void)’
-DeclRefExpr 0x7fffc2a8a868 <col:5> 'int (void)' Function 0x7fffc2a8a3d0 'fn4' 'int (void)' #4 -ReturnStmt 0x7fffc2a8a8e0 <line:20:5, col:12>
`-IntegerLiteral 0x7fffc2a8a8c0 col:12 ‘int’ 1

In the documentation for MatchFinder it says “The order of matches is guaranteed to be equivalent to doing a pre-order traversal on the AST, and applying the matchers in the order in which they were added to the MatchFinder.” but I can’t reconcile what I see with what I found on the wikipedia for ‘pre-order traversal’.

Is this a bug?

How can I influence the traversal order so that my matchers fire in source-code order?

Thanks,
Billy.

I'm not sure line numbers are the thing to rely on if that is your goal. Consider for example a lambda inside a function (though maybe you are only working on C, not C++?). Your matcher for callExpr() would need to distinguish between calls inside and outside the lambda.

Regardless of whether C++ is relevant to you, as David pointed out, this appears to be an optimization, so an alternative implementation might make sense anyway.

I would recommend you match functions as you are with functionDecl() and then in your match callback, use something like (not tested)

ast_matchers::match(functionDecl(

forEachDescendant(callExpr(forFunction(functionDecl(equalsNode(FN)))))

), *FN, *Result.Context);

The forFunction() matcher is the important part which will ensure that for example callExpr() in nested lambdas are not part of your result set.

Then order the resulting nodes by their getBeginLoc() perhaps. (Macros may need to be accounted for).

Thanks,

Stephen.

Hi Stephen,

thanks for the input.

Fortunately it’s just C I’m interested in.

It’s not just the order of certain function calls that are relevant; the ordering of some conditional expressions and return stmts are also relevant. And some of them are done via macros in which case, if I recall, I see the source location being the location of the macro definition - not the location of the macro usage.

It still sounds to me like MatchFinder is not doing exactly what it says on the tin “The order of matches is guaranteed to be equivalent to doing a pre-order traversal on the AST, and applying the matchers in the order in which they were added to the MatchFinder.”

I’ve experimented with RecursiveASTVisitor and that does invoke the Visit functions in a natural/source-code order. So I will stick with that strategy for now.

Cheers,
Billy.

Hi Stephen,

thanks for the input.

Fortunately it's just C I'm interested in.

It's not just the order of certain function calls that are relevant; the ordering of some conditional expressions and return stmts are also relevant. And some of them are done via macros in which case, if I recall, I see the source location being the location of the macro definition - not the location of the macro usage.

Yes, macros add complexity.

It still sounds to me like MatchFinder is not doing exactly what it says on the tin "The order of matches is guaranteed to be equivalent to doing a pre-order traversal on the AST, and applying the matchers in the order in which they were added to the MatchFinder."

Yes, this is at least a documentation bug, if the implementation isn't changed.

Thanks,

Stephen.

Thanks to Stephen and David for your help. I’ve filed a bug report https://bugs.llvm.org/show_bug.cgi?id=46423

Hello, Billy and everybody!

Could you share please if possible how did you eventually resolve this issue?

I also faced with that problem.

As example, for this code:

void func(struct S1 S1Par1) {
    return;
}
    
int main() {
  struct S1 S1Loc;

  func(S1Loc);

  return 0;
}

I want to run several matchers

  1. Matcher for declarations (struct S1 S1Loc)
  2. Matcher for call expressions (func(S1Loc))

And I want to run them in order (1) → (2), so to process declarations first and call expressions after them. However, seems that clang treats call expressions as compound ones and postpones them.
Of course, I add them with addMatcher to the MatchFinder in this order.

Clang 13.0.1