Running ASTMatcher over main file only

Is there a way to run a matcher on only the main source file, as opposed to all of its includes? I know I can discriminate in my callback, but I didn’t know whether the “penalty” for traversing so much code was negligible or somehow avoidable.

Any advice is much appreciated.

Kirk Fertitta

Chief Technical Officer

Pacific MindWorks, Inc.

ph: 858-207-6198

fax: 858-521-1385

In my experience, the cost of traversing the parts of the AST that come from headers is negligible. So rejecting matches in the callback when they come from files you don’t want to change would be the best way to do it. If you’re interested in measuring the performance difference, you’d have to hack the internal RecursiveASTVisitor that the match finder uses and add a file test at various nodes in the tree to prune whole sub-trees. Unless your translation unit is absolutely massive and most of it comes from files you don’t care about, I can’t see there being much of a difference.
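To make the "reject matches in the callback" idea concrete, here is a minimal standalone sketch. It is a toy model, not Clang code: `ToyDecl`, `runMatcher` and `matchesInMainFile` are hypothetical names invented for illustration. In a real `MatchFinder::MatchCallback` the equivalent check would typically ask the `SourceManager` whether the matched node's location lies in the main file (for example via `SourceManager::isInMainFile`), and newer Clang releases also offer an `isExpansionInMainFile()` matcher that narrows the match itself.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for a matched declaration; NOT a Clang type.
struct ToyDecl {
  std::string name;
  std::string file;  // file the declaration was written in
};

// Hypothetical "match finder": invokes the callback for every node in the
// translation unit, headers included, much as a matcher sees the whole AST.
void runMatcher(const std::vector<ToyDecl>& translationUnit,
                const std::function<void(const ToyDecl&)>& callback) {
  for (const ToyDecl& decl : translationUnit) callback(decl);
}

// Collects the names of matches that come from mainFile. Everything is still
// traversed; the filtering happens inside the callback.
std::vector<std::string> matchesInMainFile(
    const std::vector<ToyDecl>& translationUnit, const std::string& mainFile) {
  assert(!mainFile.empty() && "main file name must be set");
  std::vector<std::string> kept;
  runMatcher(translationUnit, [&](const ToyDecl& decl) {
    if (decl.file != mainFile) return;  // reject matches from other files
    kept.push_back(decl.name);
  });
  return kept;
}
```

The callback is the only place that knows about files, so the matcher expression and the traversal stay untouched, which is what makes this the low-effort option.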

You don't have to traverse it all. You can implement your own
TraverseDecl that (roughly) checks which file a Decl is in, returning
true early if it's not in a file of interest and calling the default
RecursiveASTVisitor::TraverseDecl otherwise.
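The shape of that override can be sketched with a small standalone model. This is not Clang code: `ToyDecl`, `ToyVisitor` and `MainFileVisitor` are hypothetical names, and the CRTP base only loosely imitates how `RecursiveASTVisitor` dispatches to the derived class. The point it illustrates is that returning true early skips the whole subtree while letting the traversal continue with the next sibling. It assumes C++17 (for the recursive `std::vector<ToyDecl>` member).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy declaration node; a stand-in, not clang::Decl.
struct ToyDecl {
  std::string name;
  std::string file;               // empty means "not tied to a file"
  std::vector<ToyDecl> children;  // nested declarations
};

// Minimal CRTP visitor, loosely modelled on the RecursiveASTVisitor pattern.
template <typename Derived>
class ToyVisitor {
public:
  // Default traversal: visit the node, then recurse into its children.
  // Returning false aborts the whole traversal.
  bool TraverseDecl(const ToyDecl& decl) {
    if (!derived().VisitDecl(decl)) return false;
    for (const ToyDecl& child : decl.children)
      if (!derived().TraverseDecl(child)) return false;
    return true;
  }
  bool VisitDecl(const ToyDecl&) { return true; }

private:
  Derived& derived() { return static_cast<Derived&>(*this); }
};

class MainFileVisitor : public ToyVisitor<MainFileVisitor> {
public:
  explicit MainFileVisitor(std::string mainFile)
      : mainFile_(std::move(mainFile)) {
    assert(!mainFile_.empty() && "main file name must be set");
  }

  // Prune: a node from another file is skipped together with its subtree.
  // Returning true (not false) keeps the traversal going for its siblings.
  bool TraverseDecl(const ToyDecl& decl) {
    if (!decl.file.empty() && decl.file != mainFile_) return true;
    return ToyVisitor<MainFileVisitor>::TraverseDecl(decl);
  }

  bool VisitDecl(const ToyDecl& decl) {
    visited.push_back(decl.name);
    return true;
  }

  std::vector<std::string> visited;  // names of the nodes actually visited

private:
  std::string mainFile_;
};
```

Calling `TraverseDecl` on a root whose children come from both a header and the main file records only the root and the main-file nodes in `visited`; the header subtree is never entered.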

-- James

> Unless your translation unit is absolutely massive and most of it comes
> from files you don't care about I can't see there being much of a
> difference.

Unfortunately, "absolutely massive and most of it comes from files you
don't care about" is the current state of affairs for C++ :(

-- Sean Silva

Thanks very much, Edwin. This is the sort of insight I was looking for. I'm quite new to Clang. I would certainly prefer to keep it simple if I can, and there is some perf tolerance for my application, as what I’m initially doing is building a library to do some fairly basic (but domain-specific) roundtripping on a VC++ app. So there will be Windows headers included as well as the usual suspects (MS STL, etc.). But my roundtripping operations always take place in only a few files within the project. And the operations take place in response to user-initiated commands, which means that as long as the UI is “responsive” enough, I can tolerate some extra traversal, it seems. I certainly don’t feel 100% comfortable hacking the source just yet to modify the RecursiveASTVisitor.

From a functional standpoint, should I be concerned about the matcher coming across something in the Windows headers it doesn’t understand (which I understand will certainly happen) if I’m never concerned about matching in those headers? In other words, if I want to find a FunctionDecl in my main source file and Clang chokes on an included header, would it still likely find my main source file decl or is it an “all-or-nothing” deal? I suppose if a type decl was in a header it didn’t understand and that type was used in the signature of my target function, then seemingly that would be a big problem.

Thanks very much again for taking the time to answer Clang newbie questions.

Regards,

Kirk Fertitta

Yes, absolutely. And that fact is what prompted me to believe that everyone else doing non-trivial matchers would surely face the same question.

Kirk Fertitta

> You don't have to traverse it all. You can implement your own
> TraverseDecl that (roughly) checks which file a Decl is in, returning
> true early if it's not in a file of interest and calling the default
> RecursiveASTVisitor::TraverseDecl otherwise.

While possible, I'd advise against doing your own AST traversal - it is
quite an undertaking, and very easy to get wrong.

> Unless your translation unit is absolutely massive and most of it comes
> from files you don't care about I can't see there being much of a
> difference.
>
> Unfortunately, "absolutely massive and most of it comes from files you
> don't care about" is the current state of affairs for C++ :(

Interestingly enough the cost of running the matchers is usually very small
compared to the cost of creating the AST, thus filtering out unneeded
results seems like a good approach.

> From a functional standpoint, should I be concerned about the matcher
> coming across something in the Windows headers it doesn’t understand (which
> I understand will certainly happen) if I’m never concerned about matching
> in those headers? In other words, if I want to find a FunctionDecl in my
> main source file and Clang chokes on an included header, would it still
> likely find my main source file decl or is it an “all-or-nothing” deal? I
> suppose if a type decl was in a header it didn’t understand and that type
> was used in the signature of my target function, then seemingly that would
> be a big problem.

If clang "chokes" on a header, chances are that the rest of the AST will be
slightly wrong, and you might miss some things because the AST won't look
the way you'd expect.

As Manuel mentioned, I wouldn’t worry too much about ‘choking’. It shouldn’t happen, and if it does, that would be a bug in clang. If it failed to produce an AST, you’d never get to matching anyway.

By the way, you’ll find AST production VERY slow when using debug builds of LLVM and Clang. If you’re having performance issues, try a release build first before looking elsewhere for performance improvements.