[RFC] Unified clang-tidy runner

[RFC] Unified clang-tidy runner

Motivation & Overview

There are currently (at least) five different ways to run clang-tidy on a CMake-based project:

  • clang-tidy CLI
  • run-clang-tidy.py
  • clang-tidy-diff.py
  • check_clang_tidy.py
  • CMake integration

None of these scripts share any code and much of the functionality is duplicated. The implementations are fragmentory and some useful features are not available, most notably fixit de-duplication after a CMake clang-tidy run:

Method Purpose Parallel execution De-duplication of warnings Incremental builds
clang-tidy CLI Running a single instance of clang-tidy No No No
run-clang-tidy.py Running clang-tidy in parallel Yes Yes (by merging YAML files) No
clang-tidy-diff.py Running clang-tidy in parallel on a diff Yes No No
check_clang_tidy.py Running clang-tidy unit tests No N/A No
CMake integration Running clang-tidy in parallel alongside a build Yes (via the build system) No Yes

There are also several third-party clang-tidy runners [1] [2] [3], which I haven’t used. Perhaps this is an indicator that our LLVM scripts aren’t sufficient for downstream use.

When integrating clang-tidy into CI loops, it ends up being cleaner to write your own runner from scratch. Some developers (like me) will only discover this after attempting to use the LLVM python scripts. Replacing the LLVM scripts with a unified clang-tidy runner would give us all the usual benefits of de-duplicating code and centralising logic, as well as being very nice to have for developers who use clang-tidy in their CI. It would also give us an obvious place to implement things like caching of clang-tidy results or incremental clang-tidy runs, if that’s something we want to do in future.

I propose a single general-purpose clang-tidy runner that can initially:

In future, we could extend this runner to:

Impact on LLVM

run-clang-tidy.py, clang-tidy-diff.py and check_clang_tidy.py would all be replaced with a general-purpose clang-tidy runner.

We would then have an obvious central place to implement result caching, incremental runs, etc.

I don’t see the CMake clang-tidy integration changing (see below).

Open questions

  1. What form should the runner take?

    • The simplest and most obvious thing to do is to write the runner in python
    • There is also an argument for moving to C++:
      • Most (all?) other LLVM tools are written in C++
      • We may want to integrate tightly with outputs from other tools (e.g. in-memory de-duplication of fixits)
      • clang::tooling::ToolExecutor has been suggested to me, but it’s not a part of LLVM I’m particularly familiar with
  2. Can/should this runner be extended to other tools?

    • Could (parts of) the runner be generalised and used for other tools, e.g. clang-query?
  3. Is there a way to sensibly integrate this tool with build systems?

    • There’s a suggestion that, if we could somehow teach build systems about our runner, we could get build system features like incremental or distributed builds for cheap
    • I don’t have a clear picture of how this would work
1 Like

IMHO it’s impossible to create a “one tool to rule them all”, everyone has their own needs. It’s much easier to create your own small wrapper than to negotiate a feature upstream, integrate that feature and ensure it doesn’t break the many other features wanted by other people.

For example, already your first bullet “Run in parallel” is not something I’d want, because parallelism is handled by my build system (Bazel).

PS: mandatory xckd reference :slight_smile:

1 Like

Those are all valid criticisms if I was proposing a new tool, but really I’m proposing replacing 2-3 existing tools with one centralised tool - there’s even a comment on clang-tools-extra/clang-tidy/tool/run-clang-tidy.py that reads # FIXME: Integrate with clang-tidy-diff.py!

I agree that we’re not going to get to something completely universal. I can’t see, for example, how the CMake integration would end up using the centralised runner. I think we can do better than the current scripts, however.

I think there’s some parts in clang-tidy-diff (and maybe check_clang_tidy) than can definitely be imported from run-clang-tidy and consolidated, that’s probably the easiest first step. merge_replacement_files, find_compilation_database, find_binary, some of the argument definitions, … seem like good candidates from a quick look.

Maybe later the diff command can be made part of the run-clang-tidy script. It’s already essentially the same script with an added file/line filter. But I think it’s okay to have the other different scripts kept separate. I personally don’t find the diff script all that useful anyway (instead, running clang-tidy on the previous patch, then the current patch, and comparing warnings would be a more interesting signal, but trickier to write).

I’m generally +1 on merging clang-tidy-diff.py and run-clang-tidy together in simple script, however, I thinke check_clang_tidy.py should be kept as is and as a separate script. It has too much hand tailored logic that only needed for LLVM tests, like outputting test results, interacting with FileCheck, etc.. I can arguably say that nobody will need that apart from people writing their own checks, who I think will do that in LLVM-fork anyway because they would need tools like lit, FileCheck..

With agentic tools nowadays, It’s very easy to create some basic wrappers for running clang-tidy in CI (unless one want fancy caching, etc..).
So I think we first need to unify scripts for everyday local usage (run-clang-tidy, clang-tidy-diff) and later move on to improving analysis time by caching.

I agree with @carlosgalvezp that we probably don’t need to think a lot about quality of CI integration right now.

I would like to point out another GitHub project to run clang tidy, as it takes a unique approach: GitHub - lljbash/clangd-tidy: A faster alternative to clang-tidy

This bypasses the clang-tidy executable and gets the information from clangd via LSP calls.

That project is no longer “as fast” as when it was created, because now we have solved the issue of spending time on system headers for clang-tidy. So maybe it’s not so useful anymore?

I agree some we could do some unification of our internal python scripts.