How to find all C++ class definitions, their constructors, and their use with LLVM?

Hello!

For my research project, I want to gather statistics about specific C++ codebases (e.g., Firefox). The things I want to know are (1) class definitions, (2) class constructor declarations, and (3) the uses of class constructors in code (i.e., object instantiations).

I was told that Clang's LibTooling might help here, so I wrote a matcher that analyzes the source code. I can find the classes and their constructor declarations. Still, my code treats object instantiation as a class declaration as well (for some reason, it appears as a CXXConstructorDecl in the CXXRecordDecl node of the AST). In addition, I cannot figure out how to analyze all of the project's source code while skipping the standard libraries. Filtering by include does not work, as it filters out all the header files. I tried the Amalgamate tool, but it still requires analyzing the codebase file by file.

I guess my questions are:

  1. Is clang lib tooling with AST Matching the right tool for the task?
  2. If so, what APIs should I use to resolve my issues?
  3. If not, what other tool should I use?

Thank you!

Hello!

  1. Is clang lib tooling with AST Matching the right tool for the task?

I believe this is a reasonable approach. Alternatively, you might consider either implementing a Clang plugin (possibly easier to integrate into your build) or using libclang (if you’d benefit from API stability or a C interface).
https://clang.llvm.org/doxygen/group__CINDEX.html
https://clang.llvm.org/docs/ClangPlugins.html

  2. If so, what APIs should I use to resolve my issues?

I recommend inspecting the AST of relevant examples with clang -Xclang -ast-dump.

Object instantiation should look different to class declaration in the AST. Do you have any test examples or can you share the matchers that you use?
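To illustrate the difference, here is a minimal sketch of a test input, with the AST node kinds I would expect to see in the dump noted in comments. The file name ctor_demo.cpp and the names Widget and makeAndMeasure are made up for illustration and are not from your code; the exact dump may differ between Clang versions.

```cpp
// Hypothetical file: ctor_demo.cpp (all names here are illustrative).
// Inspect with: clang++ -Xclang -ast-dump -fsyntax-only ctor_demo.cpp

// CXXRecordDecl 'Widget': the class definition.
class Widget {
public:
    // CXXConstructorDecl: a user-declared constructor.
    explicit Widget(int size) : size_(size) {}

    int size() const { return size_; }

    // Clang may also list implicit CXXConstructorDecl nodes (copy/move
    // constructors) inside this CXXRecordDecl. They are marked "implicit"
    // in the dump and are not written anywhere in the source.
private:
    int size_;
};

// An instantiation is an expression, not a declaration of a constructor:
// the VarDecl 'w' has a CXXConstructExpr as its initializer.
int makeAndMeasure() {
    Widget w(42);  // VarDecl 'w' -> CXXConstructExpr 'Widget'
    return w.size();
}
```

If this matches what you see, then cxxConstructExpr() is probably the matcher to use for instantiations, and cxxConstructorDecl(unless(isImplicit())) for constructors that are actually written in the source. The implicit constructors might be what makes an instantiation look like a declaration inside the CXXRecordDecl, but that is a guess without seeing your matcher. For skipping the standard library, wrapping isExpansionInSystemHeader() in unless(...) may be worth a try.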

BTW, "LLVM" in the title could be replaced with "Clang". When we talk about LLVM, people may think of LLVM IR, which is independent of C++ by design.

Thank you for your reply!

I followed this tutorial (Tutorial for building tools using LibTooling and LibASTMatchers — Clang 16.0.0git documentation) and I have something that almost works. I do have a test case (attached at the bottom).

I figured out how to track object instantiation and how to find classes with defaults. However, I have an issue with identifying constructors that have explicit initializers (e.g., a(1) and b(2)):

ctor( ... ) : a(1), b(2) { ... }

I am using clang-query to test my matchers, but with no success so far. What I’ve noticed is that in the AST dump those initializers look something like:

**CXXConstructorDecl** 0x2cb88e8 </home/xzz/repositories/llvm-project/build/mt_base/bbb.h:6:3, col:19> col:3 **ddd** 'void ()' implicit-inline
|-CXXCtorInitializer **Field** 0x2cb8658 **'x'** 'int'
| `-**IntegerLiteral** 0x2cb8ab8 <col:13> 'int' **123**

So, I tried to use a matcher along the lines of:

match cxxConstructorDecl(forEachConstructorInitializer( cxxCtorInitializer( withInitializer(hasDescendant(anyOf(integerLiteral(), stringLiteral()))))))

My reasoning was that I need to find a CXXConstructorDecl with a CXXCtorInitializer that has some literal among its descendants (e.g., integerLiteral, stringLiteral, etc.). However, it does not match all the constructors I think it should: it does not match the constructor of bos. Also, it does not capture the constructors from includes. Now I am a bit stuck :confused:
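One possible explanation for the bos miss, sketched below under the assumption that clang-query runs in its default traversal mode: hasDescendant() only looks below the node it is applied to, never at the node itself. For b(33), where b is an int, the initializer expression is the IntegerLiteral itself, so there is nothing below it to match. For z(3), where z is a double, the literal sits under an implicit conversion, so it is a descendant. The struct names Direct and Converted are made up for this illustration.

```cpp
// Hypothetical file: init_demo.cpp (struct names are illustrative).

struct Direct {
    int b;
    // The initializer expression IS the literal:
    //   CXXCtorInitializer Field 'b' 'int'
    //   `-IntegerLiteral 'int' 33
    Direct() : b(33) {}
};

struct Converted {
    double z;
    // The literal sits below an implicit int -> double conversion:
    //   CXXCtorInitializer Field 'z' 'double'
    //   `-ImplicitCastExpr 'double' <IntegralToFloating>
    //     `-IntegerLiteral 'int' 3
    Converted() : z(3) {}
};

// Expected to match only Converted():
//   cxxConstructorDecl(forEachConstructorInitializer(
//       withInitializer(hasDescendant(integerLiteral()))))
//
// Expected to match both, by also testing the initializer node itself:
//   cxxConstructorDecl(forEachConstructorInitializer(
//       withInitializer(anyOf(integerLiteral(),
//                             hasDescendant(integerLiteral())))))
```

The same anyOf(...) pattern should extend to stringLiteral(). I have not run these matchers against the test case below, so treat them as a starting point rather than a verified fix.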

Test Case:

```cpp
#include <string>  // needed for std::string

#define OBJECTS_CNT 3
#define LL 12

class bos {
    private:
        int a;
        int b;

    public:
        bos(int z) : b(33) {a=z;}
};

class product {
  private:
    double z;
    int weight=1;
    const char* str = "C++";  // const: a string literal does not convert to char* in C++11
    bos b;
    std::string s1;

  public:
    double price;
    int arr[LL];

    product(int w) : z(3), str("aaa"), b(bos(123)), s1("oh, man!") {
        weight = w;
        price = 0;
        for (int i = 0; i < LL; i++) arr[i] = i;
    }
};
```