1. I'm glad that we're finally trying to avoid dumping PCHs on disk!
2. As far as I understand, dependencies are mostly about Clang binary
size. I don't know for sure but that's what I had to consider when
adding libASTMatchers into the Clang binary a few years ago.
3. I strongly disagree that the JSON compilation database is meant for
"just this purpose". I don't mind having explicit improved support for
it, but I would definitely prefer not to hardcode it as the only
possible option. Compilation databases are very limited, and we cannot
drop projects or entire build systems simply because they can't be
represented accurately via a compilation database. So I believe that
this is not the right solution for CTU in particular. Instead, an
external tool like scan-build should guide CTU analysis and
coordinate the work of different Clang instances, so as to abstract
Clang away from the build system.
What functionality do you picture the scan-build-like tool having that couldn't be supported if that tool instead built a compilation database & the CTU/CSA was powered by the database? (That would separate concerns: build command discovery from execution, and make the scan-build-like tool more general-purpose, rather than specific only to the CSA.)
Here are a few examples (please let me know if I'm unaware of the latest developments in the area of compilation databases!):
- Suppose the project uses precompiled headers. In order to analyze a file that includes a PCH, we need to first rebuild the PCH with the clang that's used for analysis, and only then try to analyze the file. This introduces a notion of dependency between compilation database entries; unless entries are ordered in their original compilation order and we're analyzing with -j1, race conditions will inevitably cause us to occasionally fail to find the PCH. I didn't try to figure out what happens when modules are used, but I suspect it's worse. However, if analysis is conducted alongside compilation, and the build system waits for the analysis to finish just as it waits for compilation to finish before compiling dependent translation units, the race conditions are eliminated. This is how scan-build currently works: it substitutes the compiler with a fake compiler that invokes both the original compiler and clang for analysis (see the first sketch after this list). Of course, cross-translation-unit analysis won't be conducted in parallel with compilation; it's multi-pass by design. The problem is the same though: it needs to compile PCH files first, but there's no notion of "compile this first" in an unstructured compilation database.
- Suppose the project builds the same translation unit multiple times, say with different flags, say for different architectures. When we're trying to look up such a file in the compilation database, how do we figure out which instance to take? If we are ever to solve this problem, we have to introduce a notion of a "shipped binary" (an ultimate linking target) into the compilation database and perform cross-translation-unit analysis of one shipped binary at a time. (The second sketch after this list illustrates the ambiguity.)
- There is a variety of hacks that people can introduce into their projects if they add arbitrary scripts to their build system. For instance, they can mutate the contents of an autogenerated header in the middle of the build. We can always say "Well, you shouldn't do that", but people will do that anyway. This makes me believe that no purely declarative compilation database format will ever be able to handle such Turing-complete hacks, and that there's no way to integrate analysis into the build perfectly other than by letting the build system guide the analysis.
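To make the interposition idea from the first bullet concrete, here's a rough first sketch of a hypothetical wrapper. It is illustrative only: the names, the REAL_CC environment variable, and the flag handling are made up for this example, and the real ccc-analyzer is far more careful about translating flags. The build system calls the wrapper instead of the compiler, so analysis inherits the build's own dependency ordering: a command that produces a PCH finishes before any translation unit that includes it gets processed.

```python
#!/usr/bin/env python3
# Hypothetical wrapper sketch (illustrative names, not actual scan-build code):
# the build system invokes this instead of the real compiler, so dependent
# translation units are processed only after their prerequisites (e.g. PCHs)
# have been built -- the ordering comes from the build graph, not from us.
import os
import subprocess
import sys


def drop_output_flag(args):
    """Remove '-o <file>' so the analyzer doesn't clobber the real output."""
    result = []
    skip = False
    for arg in args:
        if skip:
            skip = False
            continue
        if arg == "-o":
            skip = True
            continue
        result.append(arg)
    return result


def main(argv):
    real_compiler = os.environ.get("REAL_CC", "clang")

    # 1. Run the real compilation first; the build system blocks on us,
    #    exactly as it would block on the real compiler.
    rc = subprocess.call([real_compiler] + argv)
    if rc != 0:
        return rc  # don't try to analyze code that doesn't even build

    # 2. Re-run (roughly) the same command line through the static analyzer.
    #    A real wrapper would translate the flags much more carefully.
    subprocess.call(["clang", "--analyze"] + drop_output_flag(argv))
    return rc


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

The only point of the sketch is the control flow: the analysis runs inside the build's own dependency ordering, so the PCH race from the first bullet cannot happen.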
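And to illustrate the second bullet, here's a trivial second sketch that can be pointed at an existing compile_commands.json; it assumes nothing beyond the documented format (entries may carry either "command" or "arguments"). It merely lists files that are compiled more than once; the database itself offers no way to tell which of those commands belong to the binary we actually want to analyze.

```python
#!/usr/bin/env python3
# List source files that appear more than once in compile_commands.json.
# Entries may use "command" or "arguments"; we print whichever is present.
import json
from collections import defaultdict

with open("compile_commands.json") as f:
    entries = json.load(f)

commands_per_file = defaultdict(list)
for entry in entries:
    commands_per_file[entry["file"]].append(
        entry.get("command", entry.get("arguments")))

for path, commands in sorted(commands_per_file.items()):
    if len(commands) > 1:
        print(f"{path} is compiled {len(commands)} times:")
        for command in commands:
            print("  ", command)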
I'm also all for separation of concerns and I don't think any of this is specific to our static analysis.