Using ccache to cache clang-tidy results

Hey,

Has anyone worked on integrating clang-tidy into ccache?
I made a local PoC. It works well and greatly improves clang-tidy run times when running it incrementally.

But to be really reliable, I think I need at least one extra flag in clang-tidy. The main difficulty is that clang-tidy automatically finds two files: the .clang-tidy config and the compile_commands.json.

For the .clang-tidy config, I currently hash the output of clang-tidy --dump-config <original-args> so ccache doesn’t need to find that file (and then find parent files in case InheritParentConfig is true).
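
A rough Python sketch of that step, for illustration only (the PoC itself lives inside ccache and is not shown here). The function names dump_config and config_digest are made up for this example, and SHA-256 is an arbitrary choice for the sketch:

```python
import hashlib
import subprocess


def dump_config(clang_tidy, original_args):
    """Return the effective config as printed by `clang-tidy --dump-config`."""
    result = subprocess.run(
        [clang_tidy, "--dump-config", *original_args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def config_digest(config_text):
    """Hash the dumped config text so it can be folded into a cache key."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()
```

Because clang-tidy resolves the config itself, the caller never has to locate the .clang-tidy file or walk parent directories.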

For the compile_commands.json, I’m currently only supporting the case where all arguments are explicitly given to clang-tidy (after --; I modified run-clang-tidy.py to pass them from the compile_commands.json).
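
To make the idea concrete, here is a small sketch of turning one compile_commands.json entry into an explicit clang-tidy command line. It is not the actual run-clang-tidy.py change; tidy_command is a hypothetical helper, and it only drops the compiler, `-c`, the source file, and a separate `-o <file>` pair (a joined form such as `-ofile.o` is not handled):

```python
import shlex


def tidy_command(entry, clang_tidy="clang-tidy"):
    """Build a clang-tidy argv from one compile_commands.json entry."""
    # An entry carries either an "arguments" list or a "command" string.
    if "arguments" in entry:
        args = list(entry["arguments"])
    else:
        args = shlex.split(entry["command"])
    flags = []
    skip_next = False
    for arg in args[1:]:  # args[0] is the compiler itself
        if skip_next:
            skip_next = False
            continue
        if arg == "-o":
            skip_next = True  # also drop the output file name
            continue
        if arg == "-c" or arg == entry["file"]:
            continue
        flags.append(arg)
    # Everything after "--" is passed to the compiler front end.
    return [clang_tidy, entry["file"], "--", *flags]
```

With the flags on the command line, the cache wrapper sees every input directly and does not need to read compile_commands.json.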

For the preprocessed code that needs to be hashed, I’m trying to find a clang that “belongs to” the clang-tidy by looking for one with the same name (e.g. clang-tidy-15 → clang-15) in the same path.
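
The name mapping could look roughly like this in Python. sibling_clang is a hypothetical name, and the sketch only derives the candidate path; the caller would still have to check that the binary exists:

```python
import os
import re


def sibling_clang(clang_tidy_path):
    """Map e.g. /usr/bin/clang-tidy-15 to /usr/bin/clang-15.

    Returns None when the file name does not start with "clang-tidy".
    """
    directory, name = os.path.split(clang_tidy_path)
    match = re.fullmatch(r"clang-tidy(.*)", name)
    if match is None:
        return None
    # Keep whatever suffix followed "clang-tidy" (version, extension, ...).
    return os.path.join(directory, "clang" + match.group(1))
```

This is a heuristic: nothing guarantees that a clang found this way preprocesses the code exactly as clang-tidy would.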

I think all of those troubles could be remedied if clang-tidy had a flag (e.g. named --dump-all-inputs) which would:

  • provide the output of --dump-config
  • and for each source file that clang-tidy would process, output the effective compiler arguments (from compile_commands.json and from the command line) and the preprocessed code.

ccache would then only need to hash that to determine cache hits/misses.
For example, when running ccache clang-tidy -p . -extra-arg=-Wsomewarning src/file.cpp,
ccache would invoke clang-tidy --dump-all-inputs -p . -extra-arg=-Wsomewarning src/file.cpp,
which would output on stdout:

---
Checks:          "boost-use-to-string"
WarningsAsErrors: '*'
HeaderFilterRegex: ''
AnalyzeTemporaryDtors: false
FormatStyle:     none
User:            mgehre
CheckOptions:
...
Effective command line: clang-tool -DSomeDefine -Iinclude -Wsomewarning src/file.cpp
Preprocessed code:
int main() {
 return 0;
}
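
As a sketch of the hashing side, assuming the three parts of such a dump have already been separated into strings: cache_key is a hypothetical helper, and SHA-256 is an arbitrary choice for the example.

```python
import hashlib


def cache_key(dumped_config, command_line, preprocessed_code):
    """Combine the three dumped inputs into one digest."""
    digest = hashlib.sha256()
    for part in (dumped_config, command_line, preprocessed_code):
        data = part.encode("utf-8")
        # The length prefix keeps ("ab", "c") distinct from ("a", "bc").
        digest.update(len(data).to_bytes(8, "little"))
        digest.update(data)
    return digest.hexdigest()
```

Any change to the config, the effective flags, or the preprocessed source changes the key and therefore results in a cache miss.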

What do you think?

Personally, I have tested two ways of caching:

  1. Run clang from a Python script to get the preprocessed output, calculate the cache key, and then run clang-tidy if needed (fast on SSD, slow on HDD).
  2. Integrate the cache into clang-tidy as --cache-directory and calculate the cache key inside clang-tidy (fast on HDD, a little slower on SSD due to preprocessing in clang-tidy).

Both work fine but have some overhead.
My next plan is to create an application that takes compile_commands.json, resolves all includes, calculates a hash for each entry, and dumps the .json back with the hashes added, and then to use that from Python (I have a working prototype, but it needs improvement). That would be the fastest option.
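
A very simplified Python sketch of what such a tool might do (this is not the prototype mentioned above). It only follows quoted `#include "..."` lines relative to the including file, ignores `-I` search paths and system headers, and stores the result under a made-up "hash" key:

```python
import hashlib
import json
import os
import re

INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def resolve_includes(path, seen=None):
    """Collect `path` plus every quoted include reachable from it."""
    seen = set() if seen is None else seen
    path = os.path.abspath(path)
    if path in seen or not os.path.isfile(path):
        return seen
    seen.add(path)
    with open(path, encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    for name in INCLUDE_RE.findall(text):
        resolve_includes(os.path.join(os.path.dirname(path), name), seen)
    return seen


def entry_hash(entry):
    """Hash the compile command plus the contents of all resolved files."""
    digest = hashlib.sha256()
    command = entry.get("command") or " ".join(entry.get("arguments", []))
    digest.update(command.encode("utf-8"))
    source = os.path.join(entry["directory"], entry["file"])
    for path in sorted(resolve_includes(source)):
        with open(path, "rb") as handle:
            digest.update(handle.read())
    return digest.hexdigest()


def add_hashes(database_path, output_path):
    """Read compile_commands.json and write it back with a hash per entry."""
    with open(database_path, encoding="utf-8") as handle:
        entries = json.load(handle)
    for entry in entries:
        entry["hash"] = entry_hash(entry)
    with open(output_path, "w", encoding="utf-8") as handle:
        json.dump(entries, handle, indent=2)
```

A real tool would need the compiler's actual include resolution (search paths, macros, conditional includes), which is where most of the work is.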

We run clang-tidy in Bazel and caching works like a charm without any modifications to clang-tidy :)

How do you handle the compile_commands.json? Do you not use it in Bazel, or does any change to it invalidate the cache for all clang-tidy invocations?

Nope, we don’t use compile_commands.json, instead we pass all compiler flags directly to the clang-tidy invocation (as well as the .clang-tidy file). This information is known to Bazel already since the clang-tidy invocation is implemented as an aspect, so we don’t need to duplicate the compiler flags.
