Declarative ExplodedGraph matching

Hi,
I’m aware that there have been attempts to develop a way of detecting bugs in ClangSA from a declarative description of the error condition, expressed in terms of the structure of the ExplodedGraph. As far as I remember, the solution was akin to the ASTMatchers library: graph nodes were matched by node, narrowing, and traversal matchers (see AST Matcher Reference (llvm.org)).
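To make the node/narrowing/traversal split concrete, here is a minimal toy sketch of how such matchers could compose over a graph. Everything in it is an assumption for illustration: `ToyNode`, `node`, `calleeNamed`, `hasReachableNode`, and `openThenClose` are hypothetical names, and none of it is the API of ClangSA or of the earlier proof-of-concept.

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an ExplodedGraph node (hypothetical, not the ClangSA type).
struct ToyNode {
  std::string Kind;                    // program point kind, e.g. "PreCall"
  std::string Callee;                  // empty unless the node is a call
  std::vector<const ToyNode *> Succs;  // successor edges
};

using Matcher = std::function<bool(const ToyNode &)>;

// Node matcher: the node must have the given kind and satisfy every inner matcher.
Matcher node(std::string Kind, std::vector<Matcher> Inner = {}) {
  return [Kind = std::move(Kind), Inner = std::move(Inner)](const ToyNode &N) {
    if (N.Kind != Kind)
      return false;
    for (const Matcher &M : Inner)
      if (!M(N))
        return false;
    return true;
  };
}

// Narrowing matcher: restricts a property of the current node.
Matcher calleeNamed(std::string Name) {
  return [Name = std::move(Name)](const ToyNode &N) { return N.Callee == Name; };
}

// Traversal matcher: some node reachable through successor edges must match.
Matcher hasReachableNode(Matcher Inner) {
  return [Inner = std::move(Inner)](const ToyNode &N) {
    std::set<const ToyNode *> Seen;
    std::vector<const ToyNode *> Work(N.Succs.begin(), N.Succs.end());
    while (!Work.empty()) {
      const ToyNode *Cur = Work.back();
      Work.pop_back();
      if (!Seen.insert(Cur).second)
        continue;  // already visited
      if (Inner(*Cur))
        return true;
      Work.insert(Work.end(), Cur->Succs.begin(), Cur->Succs.end());
    }
    return false;
  };
}

// Example pattern: a call to open() from which a call to close() is reachable.
Matcher openThenClose() {
  return node("PreCall", {calleeNamed("open"),
                          hasReachableNode(node("PreCall", {calleeNamed("close")}))});
}
```

The toy pattern only looks at callee names along a path; a real matcher would also have to relate program state between the two nodes (for example, that both calls refer to the same symbol), which this sketch deliberately leaves out.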

I am interested in taking up this development and would like to hear the community's opinion on this endeavor.

Shoutout to Artem Dergachev, as I think the initiative was his originally.

Hi!

Just for a point of reference, here is the initial discussion:


Hi, yeah, I think it’s a lovely thing to have, and now that Alexey’s proof-of-concept exists in the wild, we have a chance to explore the design space a bit further. ASTMatchers are great for compiler developers, but they fall short of the dream of letting users develop their own checkers. I’m really curious whether we can achieve that goal, as we probably have only one chance :)

clang-query eliminated the need for users to compile clang in order to run custom ASTMatchers, and we should probably preserve this achievement. The awesome IgnoreUnlessSpelledInSource feature lifts the requirement to understand implicit/invisible AST nodes, thus lowering the entry barrier. Another problem with ASTMatchers that remains unaddressed for now, though, is their stability guarantees: both changes in the AST and changes in the matchers themselves have the potential to break user-made matchers.

So I want to think really deeply about that last part. Even though static analyzer checkers can potentially introspect the AST as much as they want, the structure of our checker callbacks usually advocates for a much more high-level approach. Say, checkLocation and checkBind group all memory reads/writes together regardless of how they’re represented in the AST, and checkPreCall/checkPostCall treat all calls uniformly, regardless of whether it’s a plain C function, a C++ virtual method, a temporary destructor, an overloaded operator new() invocation, an Objective-C message, etc.
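As a rough illustration of that uniformity, here is a toy model rather than the real checker interface. `ToyCall`, `ToyAccess`, `ToyCallKind`, and `ToyChecker` are hypothetical types made up for this sketch; the real callbacks take richer arguments (a CallEvent, a CheckerContext, and so on) that are omitted here.

```cpp
#include <string>

// Hypothetical, simplified events; the real analyzer passes richer objects
// (e.g. CallEvent) to its callbacks.
enum class ToyCallKind { CFunction, VirtualMethod, Destructor, OperatorNew, ObjCMessage };

struct ToyCall {
  ToyCallKind Kind;    // how the call was spelled in the source
  std::string Callee;  // what is being called
};

struct ToyAccess {
  std::string Region;  // which memory is touched
  bool IsLoad;         // true for a read, false for a write
};

// A toy checker that counts events without caring how they were spelled.
class ToyChecker {
public:
  // Mirrors the idea of checkPreCall: one entry point for every call kind.
  void checkPreCall(const ToyCall &Call) {
    if (Call.Callee == "release")
      ++ReleaseCalls;
  }

  // Mirrors the idea of checkLocation: one entry point for every read/write.
  void checkLocation(const ToyAccess &Access) {
    if (Access.IsLoad)
      ++Loads;
    else
      ++Stores;
  }

  int ReleaseCalls = 0;
  int Loads = 0;
  int Stores = 0;
};
```

The checker logic never inspects `ToyCall::Kind`: a plain function, a virtual method, and an Objective-C message named `release` all reach the same code path, which is the property a stable user-facing language could build on.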

Maybe if we focus on these basic building blocks, we can make a stable, easy-to-learn, high-level domain-specific language that speaks about the program under analysis in these very simple terms, so that it could be presented as a user-facing feature, despite being derived from the relatively unstable and somewhat quirky foundation of the Clang AST?