Many .c files as input to scan-build


The Clang Static Analyzer does a good job on individual source files. But with a single .c/.cpp file
as input, it cannot follow all code paths of a program that consists of many source files.

In particular, GLib's g_hash_table_new() allocates memory and g_hash_table_destroy() frees it, but scan-build
does not know this and does not check for it.
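To make the allocate/free pairing concrete, here is a minimal sketch in plain C. The names table_new(), table_destroy(), table_live_count() and table_capacity(), and the file names in the comments, are hypothetical stand-ins for the GLib calls and are not part of any real library. The sketch is kept in one block so that it compiles on its own; the comments mark where the split into two source files would be.

```c
#include <stdlib.h>

/* ---- would live in store.c (hypothetical file name) ---- */
typedef struct Table {
    int *slots;
    size_t size;
} Table;

/* Bookkeeping only, so that the leak is observable at run time. */
static int live_tables = 0;

/* Allocates a table; plays the role of g_hash_table_new(). */
Table *table_new(size_t size) {
    Table *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->slots = calloc(size, sizeof *t->slots);
    if (t->slots == NULL) {
        free(t);
        return NULL;
    }
    t->size = size;
    live_tables++;
    return t;
}

/* Releases a table; plays the role of g_hash_table_destroy(). */
void table_destroy(Table *t) {
    if (t == NULL)
        return;
    free(t->slots);
    free(t);
    live_tables--;
}

int table_live_count(void) {
    return live_tables;
}

/* ---- would live in user.c (hypothetical file name) ---- */
int table_capacity(size_t size) {
    Table *t = table_new(size);
    if (t == NULL)
        return -1;
    int capacity = (int)t->size;
    /* Intentional bug: table_destroy(t) is never called. If user.c only
     * sees a prototype of table_new(), the analyzer has no way to know
     * that the returned pointer owns heap memory. */
    return capacity;
}
```

When both halves are in one file, the analyzer can follow the call into table_new(), see the malloc(), and would typically report the leak. When user.c is analyzed alone, table_new() is an opaque function and the leak is likely to go unreported.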

I mean, scan-build produces different results for the same program, depending on how the source code is split into different files.

One way to solve this is to create a huge .h file that recursively contains all function definitions needed by a .c/.cpp
file, including sources from libraries, and to feed this to scan-build.

It would, however, be easier if scan-build were extended to accept many .c and .cpp files as input, glue them internally
into one, and then handle that big file as input.

This would help find problems that span several source files.


The problem you're describing is known as "cross translation unit analysis". The Static Analyzer is part of Clang, and the primary purpose of Clang is to compile one translation unit at a time, so the Static Analyzer inherits the same limitation.

Doing "unity builds" is one way around this problem. This wouldn't scale to huge projects and it's not that trivial to concatenate all the files, depending on the build system (the project may use freshly compiled executables to autogenerate source code for subsequent passes or compile the same source code for different architectures).
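As a rough illustration of what a unity build amounts to, the sketch below generates the text of a single source file that #includes every listed .c/.cpp file. The function names has_suffix() and build_unity_source() are made up for this example, and the sketch deliberately ignores the build-system complications mentioned above.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if name ends with suffix, 0 otherwise. */
static int has_suffix(const char *name, const char *suffix) {
    size_t n = strlen(name);
    size_t s = strlen(suffix);
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/*
 * Fills buf with one #include line per .c/.cpp entry in files.
 * Entries with any other suffix (headers, for example) are skipped.
 * Returns the number of lines written, or -1 if buf is too small.
 */
int build_unity_source(char *buf, size_t cap, const char *files[], size_t count) {
    size_t used = 0;
    int lines = 0;

    if (cap == 0)
        return -1;
    buf[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        if (!has_suffix(files[i], ".c") && !has_suffix(files[i], ".cpp"))
            continue;
        int n = snprintf(buf + used, cap - used, "#include \"%s\"\n", files[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
        lines++;
    }
    return lines;
}
```

Writing the resulting text to a file such as unity.c (a hypothetical name) and running scan-build on a compile of that one file gives the analyzer a single translation unit. One known pitfall is that static functions or variables with the same name in two source files will collide once the files are glued together.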

Note that even if you do a unity build, the time it takes for the Static Analyzer to perform analysis of a certain quality would grow non-linearly (in fact, "exponentially" would be way more accurate). Even though all of the source code is available, making proper use of this information to achieve analysis quality similar to that of a smaller codebase would be impossible. You will be paying with loss of coverage: the analyzer will give up sooner and only find more shallow bugs.
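A small counting sketch shows where the "exponential" comes from. The functions classify() and max_paths() are made up for this illustration; the real analyzer uses budgets and heuristics rather than enumerating every path, so this is only an upper bound on the work a fully path-sensitive exploration would face.

```c
#include <stdint.h>

/* Three independent conditions: a path-sensitive analysis may have to
 * consider up to 2 * 2 * 2 = 8 distinct paths through this function. */
int classify(int a, int b, int c) {
    int score = 0;
    if (a > 0)
        score += 1;
    if (b > 0)
        score += 2;
    if (c > 0)
        score += 4;
    return score;
}

/* Upper bound on the number of paths through n independent two-way
 * branches, saturating at UINT64_MAX once the count no longer fits. */
uint64_t max_paths(unsigned n) {
    return n >= 64 ? UINT64_MAX : (uint64_t)1 << n;
}
```

Each extra independent branch doubles the bound, so following calls into more and more code quickly exceeds any fixed time budget.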

There's an effort to perform cross-translation-unit analysis through ASTImporter - the same facility that supports executing arbitrary expressions in LLDB. This allows importing only small chunks of the program as needed without constructing a whole-program AST, but generally I feel it's not really that much better than unity builds. See CTU threads on this mailing list. They report success when it comes to overall usefulness of the Static Analyzer, so I guess it's worth it to think in that direction, but it's most likely less worth it than using more expensive and sophisticated but more scalable techniques such as summary-based analysis.

Hello Artem,

thanks for your answer.

Regarding combining several source files into one that is then analyzed by scan-build, the scan-build documentation in fact suggests:

It is also possible to use scan-build to analyze specific files:
$ scan-build gcc -c t1.c t2.c
This example causes the files t1.c and t2.c to be analyzed.

My reading is that with this call scan-build generates a single report, produced by merging t1.c and t2.c and then
analyzing the result, since the gcc call generates a single file.

Is the size of the input to the analyzer currently inversely proportional to the quality of the results?

I ask this last question because you wrote that for unity builds the analyzer would give up sooner and only find more
shallow bugs.