C++ analysis priorities

Hi all,

I'm looking at possibly contributing to C++ analysis support over the next few months as part of a master's project. I have a rough idea of what needs to be implemented, but I am not sure how to prioritise it. I am hoping the community can help here: what is currently stopping your programs from being analyzed?

My general goal is to implement features that will help in analyzing the LLVM/Clang codebase. However, looking at the current code, it seems that existing support for some language features (e.g., constructors/destructors) will have to be improved as well.



Hi Tom,

As I see it, C++ support can grow in two broad directions:

(1) Core infrastructure, with interprocedural support for inlining C++ constructors/destructors to support RAII. This entails a bunch of intermediate infrastructure work to get there.

(2) Checkers. Having C++-specific checkers will make the analyzer more useful for C++ developers. This could be as simple as catching mismatches between new/delete and new[]/delete[], and could extend to many others, including checkers for correct usage of widely used C++ APIs (e.g., Boost).
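
As a concrete illustration of the simplest case, the sketch below shows the allocation/deallocation pairings such a checker would enforce. The function names are hypothetical, and the mismatched forms appear only as comments, since executing them is undefined behavior.

```cpp
#include <cassert>
#include <cstdlib>

// Correct pairings: each allocation form is released with its matching
// deallocation form. A new/delete mismatch checker would flag the
// commented-out variants.
int scalar_case() {
    int *p = new int(42);
    int value = *p;
    delete p;          // OK:  new -> delete
    // delete[] p;     // BUG: new -> delete[]
    return value;
}

int array_case(int n) {
    int *a = new int[n];
    for (int i = 0; i < n; ++i) a[i] = i;
    int total = 0;
    for (int i = 0; i < n; ++i) total += a[i];
    delete[] a;        // OK:  new[] -> delete[]
    // delete a;       // BUG: new[] -> delete
    return total;
}

int malloc_case() {
    int *m = static_cast<int *>(std::malloc(sizeof(int)));
    if (!m) return -1;
    *m = 7;
    int value = *m;
    std::free(m);      // OK:  malloc() -> free()
    // delete m;       // BUG: malloc() -> delete
    return value;
}
```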

I think both are worth making progress on, and doing (2) well will likely require some progress on (1).

As for infrastructure work, here are some areas that need attention:

(a) Better representation of C++ constructor and destructor calls in the CFG. Much of this is already in place, but, as has been observed on the list lately, there are serious deficiencies and outright bugs. Ideally we should be able to represent the complete initialization logic of a constructor in the CFG, from calling the constructor of a parent class to correctly sequencing the member initializers.
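
For reference, here is a small sketch of the sequencing the CFG would need to capture: the base-class constructor runs first, then members are initialized in declaration order (not in the order the initializer list names them), and only then does the constructor body run. The logging scaffolding (`g_log`, `Member`, `construct_and_log`) is purely illustrative.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical global log, used only to observe construction order.
std::vector<std::string> g_log;

struct Member {
    explicit Member(const char *name) { g_log.push_back(name); }
};

struct Base {
    Base() { g_log.push_back("Base"); }
};

struct Derived : Base {
    Member first;
    Member second;
    // The initializer list names `second` before `first`, but members are
    // still initialized in declaration order (compilers may warn via -Wreorder).
    Derived() : second("second"), first("first") { g_log.push_back("body"); }
};

std::vector<std::string> construct_and_log() {
    g_log.clear();
    Derived d;
    (void)d;
    return g_log;  // {"Base", "first", "second", "body"}
}
```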

Along this trajectory, there are various optimizations we can make to the CFG representation itself to make it easier to represent destructor calls. What we do now is a bit complicated, IMO.
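
To illustrate why destructor calls are awkward to represent, consider this sketch: each exit path destroys a different set of live objects, in reverse order of construction, and temporaries are destroyed at the end of their full-expression. The `Tracer` type and the trace helpers are hypothetical instrumentation, not analyzer code.

```cpp
#include <cassert>
#include <string>

// Hypothetical trace: "+x" on construction, "-x" on destruction.
std::string g_trace;

struct Tracer {
    char id;
    explicit Tracer(char c) : id(c) { g_trace += '+'; g_trace += id; }
    ~Tracer() { g_trace += '-'; g_trace += id; }
    int value() const { return id; }
};

// Each exit path destroys exactly the objects live at that point, in
// reverse order of construction; a CFG has to model this per path.
int scoped(bool early) {
    Tracer a('a');
    if (early)
        return 1;                     // destroys a
    Tracer b('b');
    // The temporary dies at the end of this full-expression, before b and a.
    int v = Tracer('t').value();
    (void)v;
    return 2;                         // destroys b, then a
}

std::string trace_of(bool early) {
    g_trace.clear();
    scoped(early);
    return g_trace;
}
```

Here `trace_of(true)` yields `"+a-a"`, while `trace_of(false)` yields `"+a+b+t-t-b-a"`.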

(b) ExprEngine "inlining" support for C++ constructors/destructors. Interprocedural analysis is one area where we would like to grow the analyzer, and one technique for that is to simply "inline" function calls whose bodies are available. Some of this has been prototyped in the analyzer already, and there is current work on making it more solid, at least for inlining simple functions. Being able to do this *well* for simple C++ objects that are used for RAII, for example, will be critical for making some checkers really shine for C++.
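
A minimal sketch of the kind of RAII code where inlining matters, assuming a hypothetical handle API (`acquire_handle`/`release_handle`) standing in for something like fopen()/fclose(). Without looking inside the constructor and destructor, the analyzer only sees two opaque calls on `ScopedHandle`, so a leak checker has nothing to pair up.

```cpp
#include <cassert>

// Hypothetical resource API with a counter so the example is observable.
int g_open_handles = 0;

int acquire_handle() { return ++g_open_handles; }
void release_handle(int) { --g_open_handles; }

// Simple RAII wrapper. Treated as opaque calls, the constructor and
// destructor hide the acquire/release pairing; inlining these small bodies
// exposes it to a checker.
class ScopedHandle {
public:
    ScopedHandle() : h_(acquire_handle()) {}
    ~ScopedHandle() { release_handle(h_); }
    ScopedHandle(const ScopedHandle &) = delete;
    ScopedHandle &operator=(const ScopedHandle &) = delete;
    int get() const { return h_; }

private:
    int h_;
};

int use_handle(bool fail) {
    ScopedHandle handle;
    if (fail)
        return -1;          // handle is released on this path too
    return handle.get();
}
```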

(c) Support for additional C++ expressions. In ExprEngine::Visit(), you can see a whole bunch of C++ AST expression kinds that are simply not handled, and halt static analysis altogether:

    case Stmt::CXXBindTemporaryExprClass:
    case Stmt::CXXCatchStmtClass:
    case Stmt::CXXDependentScopeMemberExprClass:
    case Stmt::CXXPseudoDestructorExprClass:
    case Stmt::CXXThrowExprClass:
    case Stmt::CXXTryStmtClass:
    case Stmt::CXXTypeidExprClass:
    case Stmt::CXXUuidofExprClass:
    case Stmt::CXXUnresolvedConstructExprClass:
    case Stmt::CXXScalarValueInitExprClass:
    case Stmt::DependentScopeDeclRefExprClass:
    case Stmt::UnaryTypeTraitExprClass:
    case Stmt::BinaryTypeTraitExprClass:
    case Stmt::ArrayTypeTraitExprClass:
    case Stmt::ExpressionTraitExprClass:
    case Stmt::UnresolvedLookupExprClass:
    case Stmt::UnresolvedMemberExprClass:
    case Stmt::CXXNoexceptExprClass:
    case Stmt::PackExpansionExprClass:
    case Stmt::SubstNonTypeTemplateParmPackExprClass:
    case Stmt::SEHTryStmtClass:
    case Stmt::SEHExceptStmtClass:
    case Stmt::SEHFinallyStmtClass:
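
For anyone wanting test input, here is a small sketch that exercises several of the listed kinds. The AST node each line is expected to produce is noted in a comment; the functions themselves are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <typeinfo>

int may_throw(int x) {
    if (x < 0)
        throw std::invalid_argument("negative");   // CXXThrowExpr
    return x;
}

int exercise(int x) {
    int zero = int();                               // CXXScalarValueInitExpr
    bool nothrow = noexcept(may_throw(x));          // CXXNoexceptExpr (false here)
    const std::type_info &ti = typeid(x);           // CXXTypeidExpr
    typedef int Int;
    Int scratch = 0;
    Int *p = &scratch;
    p->~Int();                                      // CXXPseudoDestructorExpr
    try {                                           // CXXTryStmt
        return may_throw(x) + zero + (nothrow ? 100 : 0) +
               (ti == typeid(int) ? 0 : 1000);
    } catch (const std::invalid_argument &) {       // CXXCatchStmt
        return -1;
    }
}
```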

Further, there are some AST expressions that we handle, but not well:

    // We don't handle default arguments either yet, but we can fake it
    // for now by just skipping them.
    case Stmt::SubstNonTypeTemplateParmExprClass:
    case Stmt::CXXDefaultArgExprClass:

Finally, there is support for C++ lambdas to add, as they become real in Clang.
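
Similarly, a sketch covering the partially handled cases and lambdas. The lambda requires C++11 support, and the function names are hypothetical.

```cpp
#include <cassert>

// At the call site, an omitted default argument appears as a CXXDefaultArgExpr.
int scale(int value, int factor = 10) { return value * factor; }

// In each instantiation, uses of N are wrapped in a SubstNonTypeTemplateParmExpr.
template <int N>
int add_n(int value) { return value + N; }

int combine(int x) {
    int a = scale(x);           // factor supplied by a CXXDefaultArgExpr
    int b = add_n<5>(x);        // N substituted as 5 in the instantiation
    // C++11 lambda: would need its own modeling in the analyzer.
    auto twice = [](int v) { return 2 * v; };
    return twice(a + b);
}
```

For example, `combine(1)` computes `2 * (10 + 6)`, i.e. 32.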

Infrastructure is only part of the story; ultimately people want to find bugs. Some possible checkers include:

(1) mismatched new/delete and new[]/delete[], or malloc() and delete, etc.

(2) productizing the invalid iterator checker

(3) making sure a destructor blows away everything a constructor creates/initializes. This is a hard one, but could be REALLY useful if done well. This could easily take up a good portion of your thesis work, and would be interesting work to write about.

(4) Various checks for "Effective C++" rules.

(5) securely using std::string, e.g., following the guidance from the CERT Division of the Software Engineering Institute

(6) CERT's C++ secure coding standard, https://www.securecoding.cert.org/confluence/pages/viewpage.action?pageId=637. There are lots of potential checks here; not all of them are specific to C++, but they are general goodness.
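
For checker (3), a minimal sketch of the property being checked, using hypothetical allocation helpers instrumented with a counter: every buffer the constructor allocates should be released by the destructor. For brevity it ignores the case where the second allocation throws, which would leak the first buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical instrumentation: count live buffers.
int g_live_buffers = 0;

char *alloc_buffer(std::size_t n) { ++g_live_buffers; return new char[n]; }
void free_buffer(char *p) { --g_live_buffers; delete[] p; }

class TwoBuffers {
public:
    explicit TwoBuffers(std::size_t n)
        : in_(alloc_buffer(n)), out_(alloc_buffer(n)) {}
    // A checker for (3) would compare the members the constructor initializes
    // from an allocation against those the destructor releases; dropping
    // either free_buffer() call below is the kind of bug it should report.
    ~TwoBuffers() {
        free_buffer(out_);
        free_buffer(in_);
    }
    TwoBuffers(const TwoBuffers &) = delete;
    TwoBuffers &operator=(const TwoBuffers &) = delete;

private:
    char *in_;
    char *out_;
};

int live_after_scope() {
    {
        TwoBuffers b(64);
        (void)b;
    }
    return g_live_buffers;
}
```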