[RFC] Unified clang-tidy runner
Motivation & Overview
There are currently (at least) five different ways to run clang-tidy on a CMake-based project:
- clang-tidy CLI
- run-clang-tidy.py
- clang-tidy-diff.py
- check_clang_tidy.py
- CMake integration
None of these scripts share any code, and much of the functionality is duplicated. The implementations are fragmentary and some useful features are missing, most notably de-duplication of fixits after a CMake clang-tidy run:
| Method | Purpose | Parallel execution | De-duplication of warnings | Incremental builds |
|---|---|---|---|---|
| clang-tidy CLI | Running a single instance of clang-tidy | No | No | No |
| run-clang-tidy.py | Running clang-tidy in parallel | Yes | Yes (by merging YAML files) | No |
| clang-tidy-diff.py | Running clang-tidy in parallel on a diff | Yes | No | No |
| check_clang_tidy.py | Running clang-tidy unit tests | No | N/A | No |
| CMake integration | Running clang-tidy in parallel alongside a build | Yes (via the build system) | No | Yes |
There are also several third-party clang-tidy runners [1] [2] [3], which I haven’t used. Perhaps this is an indicator that our LLVM scripts aren’t sufficient for downstream use.
When integrating clang-tidy into CI loops, it ends up being cleaner to write your own runner from scratch, and some developers (like me) only discover this after first trying the LLVM Python scripts. Replacing the LLVM scripts with a unified clang-tidy runner would give us the usual benefits of de-duplicating code and centralising logic, and would be very useful for developers who run clang-tidy in CI. It would also give us an obvious place to implement features such as caching of clang-tidy results or incremental clang-tidy runs, if we want them in future.
I propose a single general-purpose clang-tidy runner that can initially:
- Run in parallel
- De-duplicate fixes from different TUs
  - De-duplication of fixits from different TUs could potentially be done in memory as they’re produced
  - De-duplication can’t be done in clang-apply-replacements, because “it’s not possible for the apply-replacement tool to know whether identical insertions should be merged or not”. Our clang-tidy runner, however, could be confident that identical fixits generated by different TUs for the same header file should be de-duplicated
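The in-memory de-duplication described above could be as simple as keying each replacement by where it applies and what it writes. A minimal sketch, assuming each TU's fixes have already been parsed (for example from the YAML that clang-tidy's --export-fixes option writes) into dictionaries carrying the FilePath, Offset, Length and ReplacementText fields of a replacement. The helper name dedupe_replacements and the rule that only exactly identical edits count as duplicates are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (file, offset, length, text): two replacements with the same key make the same edit.
Key = Tuple[str, int, int, str]


def dedupe_replacements(per_tu_fixes: Iterable[List[Dict]]) -> List[Dict]:
    """Merge the replacements produced by several TUs, keeping one copy of each edit.

    Hypothetical helper: replacements are dicts with the FilePath, Offset, Length
    and ReplacementText fields used in clang-tidy's --export-fixes YAML. Identical
    edits typically arise when several TUs include the same header.
    """
    seen: Set[Key] = set()
    merged: List[Dict] = []
    for fixes in per_tu_fixes:
        for r in fixes:
            key: Key = (r["FilePath"], r["Offset"], r["Length"], r["ReplacementText"])
            if key not in seen:  # first TU to produce this edit wins; later copies are dropped
                seen.add(key)
                merged.append(r)
    return merged
```

If two TUs both insert the same text at the same offset of a shared header, only one copy survives. Replacements that overlap but differ are deliberately left untouched here; a real runner would still need to report those as conflicts.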
In future, we could extend this runner to:
- Filter results by file or line number (like clang-tidy-diff)
- Run the clang-tidy unit tests
- Potentially support incremental clang-tidy runs
- Potentially cache clang-tidy results (like https://github.com/matus-chochlik/ctcache)
Impact on LLVM
run-clang-tidy.py, clang-tidy-diff.py and check_clang_tidy.py would all be replaced with a general-purpose clang-tidy runner.
We would then have an obvious central place to implement result caching, incremental runs, etc.
I don’t see the CMake clang-tidy integration changing (see below).
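The result caching mentioned above hinges on a cache key that changes whenever anything affecting clang-tidy's output changes. A minimal sketch, assuming the key is derived from the preprocessed source of the TU, the clang-tidy arguments, and the effective configuration text; this is not necessarily how ctcache builds its key, and the helper name tidy_cache_key is hypothetical. Producing the preprocessed source (e.g. with the compiler's -E) and the effective configuration (e.g. with clang-tidy --dump-config) is left to the caller.

```python
import hashlib
from typing import Sequence


def tidy_cache_key(preprocessed_source: bytes, tidy_args: Sequence[str],
                   config_text: str) -> str:
    """Hypothetical cache key: any change to the inputs yields a different key."""
    h = hashlib.sha256()
    # Arguments cannot contain NUL, so joining on "\0" keeps the list unambiguous.
    fields = (preprocessed_source, "\0".join(tidy_args).encode(), config_text.encode())
    for field in fields:
        # Length-prefix each field so bytes cannot shift from one field into the next.
        h.update(len(field).to_bytes(8, "little"))
        h.update(field)
    return h.hexdigest()
```

A runner would store clang-tidy's output (warnings and exported fixes) under this key and replay it on a hit. A real key would probably also need to cover the clang-tidy version, since a newer binary can produce different results for identical inputs.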
Open questions
- What form should the runner take?
  - The simplest and most obvious thing to do is to write the runner in Python
  - There is also an argument for moving to C++:
    - Most (all?) other LLVM tools are written in C++
    - We may want to integrate tightly with outputs from other tools (e.g. in-memory de-duplication of fixits)
    - clang::tooling::ToolExecutor has been suggested to me, but it’s not a part of LLVM I’m particularly familiar with
- Can/should this runner be extended to other tools?
  - Could (parts of) the runner be generalised and used for other tools, e.g. clang-query?
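One part that looks tool-agnostic is the parallel driver itself: read the compilation database, then run one process per source file. A minimal sketch, assuming the tool follows the LibTooling convention of accepting -p with the build directory followed by a source file, as clang-tidy does. The helper names source_files and run_tool_parallel are hypothetical, and the command prefix is where any tool-specific flags would go.

```python
import json
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def source_files(build_dir: str) -> List[str]:
    """Return the normalised path of every TU listed in compile_commands.json."""
    with open(os.path.join(build_dir, "compile_commands.json")) as f:
        entries = json.load(f)
    # "file" may be relative to "directory"; os.path.join keeps absolute paths as-is.
    paths = {os.path.normpath(os.path.join(e["directory"], e["file"])) for e in entries}
    return sorted(paths)


def run_tool_parallel(tool_cmd: Sequence[str], build_dir: str, files: Sequence[str],
                      jobs: int = os.cpu_count() or 1) -> List[Tuple[str, int, str]]:
    """Run `tool_cmd -p build_dir <file>` for each file, `jobs` at a time.

    Hypothetical helper: tool_cmd is the tool plus any tool-specific flags,
    e.g. ["clang-tidy", "--checks=-*,modernize-*"]. Returns
    (file, exit code, stdout) tuples in the same order as `files`.
    """
    def run_one(path: str) -> Tuple[str, int, str]:
        proc = subprocess.run([*tool_cmd, "-p", build_dir, path],
                              capture_output=True, text=True)
        return path, proc.returncode, proc.stdout

    # Threads are enough here: the real work happens in the child processes.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, files))
```

For clang-tidy, each per-file command would typically also write its fixes to a distinct --export-fixes file so they can be merged afterwards, which is the part that would remain clang-tidy specific.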
- Is there a way to sensibly integrate this tool with build systems?
  - There’s a suggestion that, if we could somehow teach build systems about our runner, we could get build system features like incremental or distributed builds for cheap
  - I don’t have a clear picture of how this would work
