Working on open projects

Hi All,

I was going through the open projects page (https://clang-analyzer.llvm.org/open_projects.html) and was wondering whether that page is up to date. I found ‘Explicitly model standard library functions with BodyFarm’ and ‘Enhance CFG to model C++ new more precisely’ interesting to work on. I have some experience with the LLVM API and with modeling functions for verification from my master's project. If anyone can let me know whom I should contact about those projects, or how I should get started, that would be very helpful.

Thanks,

Jiten

Hello,

These are analyzer projects: they improve the symbolic-execution-based bug finding behind clang's --analyze option, and do not affect compilation or code generation. At the same time, these projects require relatively little understanding of the analyzer's internals (compared to other projects).

* The body farm project does not require much knowledge about the analyzer, and mostly requires knowledge of the AST. The idea of the project is to synthesize ASTs of functions in order to help the analyzer understand what they do when their definitions are not available in the current translation unit (which is a problem because clang only compiles, and therefore analyzes, one translation unit at a time). Having an AST for an external function automagically allows the analyzer to "inline" it during analysis; without the AST, the analyzer has to assume that anything can happen when such a function is called, which reduces the precision of the analysis.

Body-farmed ASTs are useful for system library functions that are simple enough. The AST does not necessarily need to do exactly what the function does, because the analyzer does not model everything exactly. For example, any atomic operations on integers may be replaced with regular integer operations, because the analyzer would naturally do all its symbolic calculations atomically. You can see what functions are already there (very few; I guess we only have a couple of libdispatch functions that are modeled to immediately call their callback; George has recently farmed a body for std::call_once similarly in https://reviews.llvm.org/D37840, which turned out to be harder than usual) and follow the example to add the functions you're interested in. Various compiler builtins (e.g., again, atomics?) might make a nice addition, and as far as I remember, Devin may have a couple of ideas as well.

There is another mechanism in the analyzer, "evalCall", that allows analyzer checkers to compute the effects of a function directly, without consulting any sort of AST. The evalCall mechanism is older and in many (but not all) cases more powerful, but it is probably overly powerful and scales poorly with the number of checkers, so body farms are preferred whenever possible.

Finally, there is an effort to allow the analyzer to import stuff from other translation units through ASTImporter (https://reviews.llvm.org/D30691); if successful, as a neat side effect this may allow us to replace manual AST construction in body farms with simply feeding raw source code to the analyzer, which might be easier.
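To make the "simplified body" idea above more concrete, here is a minimal sketch in plain C++ of the kind of behavior a farmed body for a call-once-style function might approximate. This is only an illustration, not the AST that BodyFarm actually builds: ModelOnceFlag, model_call_once and count_calls_after_two_invocations are hypothetical names, and locking and exception handling are deliberately left out because the model only needs to say "the callback runs at most once per flag".

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a library "once" flag.
// It is NOT the real std::once_flag or dispatch_once_t layout.
struct ModelOnceFlag {
  bool done = false;
};

// Simplified model of a call-once-style function (hypothetical name).
// No locking and no exception handling: it only captures the fact that
// the callback is invoked at most once for a given flag.
inline void model_call_once(ModelOnceFlag &flag,
                            const std::function<void()> &callback) {
  if (!flag.done) {
    flag.done = true;
    callback();
  }
}

// Usage: invoke the model twice on the same flag and report how many
// times the callback actually ran.
inline int count_calls_after_two_invocations() {
  ModelOnceFlag flag;
  int calls = 0;
  model_call_once(flag, [&calls] { ++calls; });
  model_call_once(flag, [&calls] { ++calls; });
  assert(calls == 1);  // the second invocation must be a no-op
  return calls;
}
```

With a body of this shape available, the analyzer could follow the callback on the first call instead of treating the whole function as opaque.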

* The C++ operator-new project is about constructing clang's CFG more accurately. Because most of the compilation relies on LLVM's CFG, the clang CFG is essentially used only by the analyzer and a couple of analysis-based compiler warnings, but not for compilation, and as such it is not entirely finished. I haven't looked deeply into this problem yet, but it seems that by the time the analyzer sees the object construction element in the CFG, it has not been informed that it needs to allocate symbolic memory to hold the newly constructed object, which needs fixing.

While fixing the CFG is the first step, the ultimate goal of this project is to enable the "-analyzer-config c++-allocator-inlining=true" option by default, which means that work would also need to be done on the analyzer side in order to understand the new CFG items and act accordingly.

As an example of recent CFG work, I could recommend https://reviews.llvm.org/D15031, which is not related to operator new but gives an impression of how this area of our code looks.
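For reference, the ordering that the CFG would need to expose for a new-expression can be observed in ordinary C++: the allocation function runs first, and the constructor then runs on the memory it returned. The sketch below only demonstrates that language-level ordering; it says nothing about how the CFG or the analyzer represents it. Tracked, event_log and run_new_delete are hypothetical names made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical event log, used only to record the order of operations.
inline std::string &event_log() {
  static std::string log;
  return log;
}

struct Tracked {
  int value;

  explicit Tracked(int v) : value(v) { event_log() += "construct;"; }
  ~Tracked() { event_log() += "destroy;"; }

  // Class-specific allocation function: in "new Tracked(...)" this is
  // called before the constructor.
  static void *operator new(std::size_t size) {
    event_log() += "allocate;";
    return ::operator new(size);
  }

  // Class-specific deallocation function: in "delete p" this is called
  // after the destructor.
  static void operator delete(void *ptr) {
    event_log() += "deallocate;";
    ::operator delete(ptr);
  }
};

// Runs one new/delete pair and returns the recorded sequence of events.
inline std::string run_new_delete() {
  event_log().clear();
  Tracked *t = new Tracked(42);
  assert(t->value == 42);
  delete t;
  return event_log();
}
```

Here run_new_delete() yields "allocate;construct;destroy;deallocate;". To try the option mentioned above on a file like this, something along the lines of `clang --analyze -Xclang -analyzer-config -Xclang c++-allocator-inlining=true file.cpp` should work, assuming the usual way of forwarding analyzer options through the driver.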

* As for contacts, this mailing list is the right place to discuss what you want to do, and our Phabricator (reviews.llvm.org) is the right place to publish your patches. I've also CC'd the analyzer's code owner Anna and other potentially interested people.

Hi Jiten,

The open projects list is somewhat out of date. However, the main problem is that most of the projects on the list are too difficult, especially for contributors who do not have a lot of experience working on the analyzer.

One more specific suggestion I have, which aligns with the Body Farm project, is to add modeling for atomics:
http://llvm.org/OpenProjects.html#clang-sa-atomics

Let us know if you have more questions or would like other starter project suggestions.

Thanks!
Anna.

Hi Anna,

The atomics modeling project seems interesting. However, I was looking at the BodyFarm code and found that atomics are already being handled by this function: https://github.com/llvm-mirror/clang/blob/master/lib/Analysis/BodyFarm.cpp#L274 Can you please tell me whether this function needs to be improved, and if so, how?

Thanks,

Jiten

Hi Jiten,

The existing function models an Apple-specific API, but the std::atomic_compare_exchange_* functions are not modeled yet. The modeling would be similar but not quite the same.
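As a rough illustration of "similar but not quite the same", here are the two shapes written as plain, non-atomic C++. This is a sketch of the semantics only, not the BodyFarm code: model_os_compare_and_swap and model_std_compare_exchange are hypothetical names, and the integer type is fixed to int32_t for simplicity. The first follows the shape of OSAtomicCompareAndSwap32 (old and new values passed by value); the second follows std::atomic_compare_exchange_strong, where the expected value is passed by pointer and is overwritten with the observed value on failure.

```cpp
#include <cassert>
#include <cstdint>

// Apple-style shape (hypothetical model name): old and new values are
// passed by value; the result only says whether the swap happened.
inline bool model_os_compare_and_swap(int32_t oldValue, int32_t newValue,
                                      int32_t *theValue) {
  if (*theValue == oldValue) {
    *theValue = newValue;
    return true;
  }
  return false;
}

// std-style shape (hypothetical model name): 'expected' is passed by
// pointer and, on failure, is updated with the value actually observed.
inline bool model_std_compare_exchange(int32_t *obj, int32_t *expected,
                                       int32_t desired) {
  if (*obj == *expected) {
    *obj = desired;
    return true;
  }
  *expected = *obj;  // the extra effect that the Apple-style model lacks
  return false;
}

// Usage: a failed std-style exchange leaves the object untouched but
// rewrites 'expected', so a retry with the same arguments succeeds.
inline bool retry_succeeds_after_failure() {
  int32_t object = 5;
  int32_t expected = 4;
  bool first = model_std_compare_exchange(&object, &expected, 8);
  bool second = model_std_compare_exchange(&object, &expected, 8);
  assert(!first && second);
  return second && object == 8;
}
```

The write-back to the expected value on failure is the main difference a new farmed body would have to account for.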

Cheers,
Anna