AST matchers and thread safety

Hi,

Are the AST matchers thread safe? I have a bunch of matchers which I would like to run in separate threads for performance reasons. Is that expected to work, or am I overly optimistic?

/Jesper

Hi,

Since the AST should be immutable once created, and AST matchers are only a high-level wrapper around RecursiveASTVisitor, I'd expect no problems with multithreading. This definitely needs to be confirmed, though, by someone who actually knows what they are talking about. :)
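The argument above is that read-only traversal of an immutable structure needs no locking. Below is a toy C++ model of that argument only. `Node`, `Matcher`, `countMatches` and `runInParallel` are hypothetical stand-ins, not Clang's API. Each thread runs one predicate over the shared const tree and writes only its own result slot, so no data race exists. The argument carries over to Clang only if matching really leaves `ASTContext` and the `MatchFinder` unmodified, and that is exactly what still needs confirming.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for an AST node; NOT a Clang type.
struct Node {
  std::string kind;  // e.g. "CXXMethodDecl"
  std::vector<std::unique_ptr<Node>> children;
};

// A "matcher" in this model is just a predicate over a node.
using Matcher = std::function<bool(const Node&)>;

// Read-only recursive traversal: counts the nodes the matcher accepts.
// It only reads through const references, so it never writes to the tree.
static int countMatches(const Node& n, const Matcher& m) {
  int count = m(n) ? 1 : 0;
  for (const auto& child : n.children) count += countMatches(*child, m);
  return count;
}

// Run each matcher on its own thread over the same immutable tree.
// Thread i writes only results[i], so there is no shared mutable state.
std::vector<int> runInParallel(const Node& root,
                               const std::vector<Matcher>& matchers) {
  std::vector<int> results(matchers.size(), 0);
  std::vector<std::thread> threads;
  for (std::size_t i = 0; i < matchers.size(); ++i) {
    threads.emplace_back(
        [&root, &matchers, &results, i] {
          results[i] = countMatches(root, matchers[i]);
        });
  }
  for (auto& t : threads) t.join();
  return results;
}
```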

Just out of curiosity, what is your use case? Navigating the AST is fast, especially compared to the cost of building the AST. Are you running a huge number of matchers?

I have a legacy C++ code base from which I'm trying to extract useful information. For example, a single matcher that matches all method definitions of a particular class takes around 8 seconds on my machine, varying a little with the complexity of the matcher.

Is the AST kept around for the lifetime of the ClangTool object, or is it recreated each time the run() method is invoked? Some of my matchers depend on information gathered by other matchers (one matcher gathers method definitions so they can be consumed when matching the corresponding class declarations). For example:

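// MyCallback is a placeholder for your own subclass of
// MatchFinder::MatchCallback (MatchCallback itself is abstract);
// m1..m4 are matcher expressions and tool is a ClangTool*.
MyCallback cb1, cb2;
MatchFinder finder1;

finder1.addMatcher(m1, &cb1);
finder1.addMatcher(m2, &cb2);

// newFrontendActionFactory returns a std::unique_ptr in current Clang.
tool->run(newFrontendActionFactory(&finder1).get());

MyCallback cb3, cb4;
MatchFinder finder2;

finder2.addMatcher(m3, &cb3);
finder2.addMatcher(m4, &cb4);

tool->run(newFrontendActionFactory(&finder2).get());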

Will this build the AST twice or just once? Is there any guarantee concerning the order in which the matcher callbacks are invoked?

Are you using a debug build of LLVM/Clang, by the way? Parsing is WAY slower in debug mode, and if the translation unit you're matching on is very large you'll definitely see a difference between debug and release speeds.

If you have a large code base, I'd suggest parallelizing per translation unit. Why do you think further parallelizing in-process would help?
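One way to split the work per translation unit is sketched below. It assumes the list of source files comes from the compilation database (e.g. `compile_commands.json`). It also assumes each shard is handed to a separate process that runs the tool on just those files, for instance as the `SourcePaths` argument of its own `ClangTool`. `shardFiles` is a hypothetical helper, not part of LibTooling.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split the translation units round-robin into
// `workers` shards. Each shard goes to its own tool process, so no AST
// or MatchFinder is ever shared between threads.
std::vector<std::vector<std::string>> shardFiles(
    const std::vector<std::string>& files, std::size_t workers) {
  // Treat a worker count of 0 as 1 to avoid dividing by zero.
  std::vector<std::vector<std::string>> shards(workers == 0 ? 1 : workers);
  for (std::size_t i = 0; i < files.size(); ++i)
    shards[i % shards.size()].push_back(files[i]);
  return shards;
}
```

Round-robin is simply the easiest choice. Balancing shards by file size or by previous run times would be a reasonable refinement if some translation units are much heavier than others.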

If you're concerned about the intermediate file format needed to merge the per-translation-unit results: in most cases I've seen, the data is simple enough that writing and merging it costs far less than a correct multi-threaded implementation of anything would…
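As an example of how simple such an intermediate format can be, here is a sketch assuming a hypothetical line-based format with one tab-separated match record per line. Each worker process writes its own file, and a final step merges them. Sorting and de-duplicating matter because a header included by several translation units is parsed, and therefore matched, once per unit. `mergeResults` and the record layout are illustrative assumptions.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record format: one match per line, e.g.
//   "method\tFoo::bar\tfoo.cpp:42"
// Merge the outputs of several worker processes into one sorted,
// de-duplicated list (the same header match may come from many workers).
std::vector<std::string> mergeResults(const std::vector<std::istream*>& inputs) {
  std::vector<std::string> lines;
  for (std::istream* in : inputs) {
    std::string line;
    while (std::getline(*in, line))
      if (!line.empty()) lines.push_back(line);
  }
  std::sort(lines.begin(), lines.end());
  lines.erase(std::unique(lines.begin(), lines.end()), lines.end());
  return lines;
}
```

In practice the inputs would be `std::ifstream`s opened on each worker's output file. `std::istringstream` works the same way for testing.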

Cheers,
/Manuel

Yes, this is probably the most reasonable way to go. I've refactored my code now so that I can run all my matchers in a single call to ClangTool::run(), and that is (for the moment) fast enough.
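A single run can still support the "method definitions feed class declarations" dependency if the callbacks only record facts and the cross-referencing happens after `ClangTool::run()` returns. A class must be defined before its out-of-line member definitions, so those definitions are normally reported after the class itself. Deferring the join avoids depending on callback order. The sketch below assumes each `MatchFinder::MatchCallback` subclass forwards what it sees from its `run(const MatchFinder::MatchResult &)` override to one shared object. `Collector` and its methods are hypothetical, not part of Clang.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical collector shared by the callbacks of a single MatchFinder.
// Callbacks only record facts; join() cross-references them after the
// run, so the result does not depend on the order matches were reported.
class Collector {
public:
  void onClassDeclaration(const std::string& cls) { classes_.insert(cls); }
  void onMethodDefinition(const std::string& cls, const std::string& method) {
    methods_[cls].insert(method);
  }

  // Each recorded class with the (sorted) method definitions seen for it.
  std::map<std::string, std::vector<std::string>> join() const {
    std::map<std::string, std::vector<std::string>> out;
    for (const std::string& cls : classes_) {
      auto& list = out[cls];  // classes without definitions get an empty list
      auto it = methods_.find(cls);
      if (it != methods_.end())
        list.assign(it->second.begin(), it->second.end());
    }
    return out;
  }

private:
  std::set<std::string> classes_;
  std::map<std::string, std::set<std::string>> methods_;
};
```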

Thanks for all the comments,