RFC: clang-format github action

Hi!

Last week, I enabled a Python source format checker in GitHub actions so that each PR will be checked for formatting. This only checks the diff that’s being posted (similar to how git-clang-format works).

I have done some exploratory work to create an action that would run git-clang-format on the PR, but this will be a much bigger impact than the Python formatting so I wanted to bring it to the community to ask a few questions:

  1. Should we have a clang-format checker run on each PR? I think we had something similar in Phab, so I think the answer to this one will be yes.
  2. Sholud it be enabled for everything? Are there certain projects not using clang-format and want to opt out?
  3. We shouldn’t run clang-format on tests, would it be enough to exclude */tests/* from the files we want to run it on?
  4. We want to pin the version of clang-format to something, what should the policy for this be?

Looking forward to the feedback.

Thanks,
Tobias

7 Likes

I was thinking about it, I’m glad someone is already working on it :slight_smile:

Ideally we should be able to do better than Phab and have the bot post edit suggestion to GitHub so that the author can just “accept” the fixes.
(We can look then into extending this to clang-tidy checks!)

That only applies to clang tests I suspect, we have regular C++ code in these folders that should be formatter.

In the past we were using “latest LLVM release” as this provides a pre-built clang-format that “everyone” can use.

Hey, Tobias! I was thinking the same thing. Also, I think it would be nice to extend it to clang-tidy.

I believe there are some actions that can already be used to check the formatting: GitHub - jidicula/clang-format-action: GitHub Action for clang-format checking.

It might be better to use already implemented GitHub Actions if they meet our requirements. For example, I’m not sure if the mentioned one creates suggestions to actually fix the code. Additionally, we can fork them and customize them to suit our needs.

I’m not sure if you implied /test/* as well, but that where Clang tests live (/clang/test/).

Seems reasonable, if clang-format has a bot that tests for regressions against monorepo.

I’ve seen a GitHub action (but I’m not able to find it right now) which runs clang-format and then sends the format corrections as GitHub review code change suggestions. The author of the PR can then single-click accept and commit the format changes.

I hope the suggestion is for clang-format-diff and not to clang-format the entire file. We routinely reject mixing formatting changes with functional changes.

2 Likes

It is! Of course.

2 Likes

In case this helps, I use following minimal steps to format using clang-format-diff in a project of mine that has been working great so far:

      - uses: actions/checkout@v3
        with:
          # Depth 2 because we need the diff.
          fetch-depth: 2

      - name: Run clang-format on changes
        run: |
          # Below ensures that the exit code of clang-format-diff is properly propagated
          set -o pipefail
          git diff -U0 HEAD^ :^3rdParty '***.c' '***.h' '***.hpp' '***.cpp' \
           | ${{github.workspace}}/util/clang-format-diff.py -p1 -binary $(which clang-format-$LLVM_LINT_VERSION) \
           | tee ${{github.workspace}}/clang-format.patch

      - name: Upload clang-format patch
        if: ${{ failure() }}
        uses: actions/upload-artifact@v3
        with:
          name: clang-format.patch
          path: ${{github.workspace}}/clang-format.patch

This displays the diff in the action in colour and also makes it available as an artifact to download.

If there happens to be some pre-made action implementing this, that would of course be better. I remember not being satisfied with them at the time.

1 Like

Thanks! I have something similar. I like the idea of the downloadable patch. I worry that suggestions in the PR gets noisy - and we need to make sure it works on forks etc. I am still looking into it.

None of the already finished actions I have seen works with git-clang-format. Just on the whole file.

Yeah, I don’t think annotations make sense here, it’s a pretty binary thing “run clang format”, and adding annotation for that looks pretty involved and, as you said, noisy.
If it might help, it’s probably to post a comment explaining how to run clang format and re push a commit.

It’s very different from a linter like eg, clang-tidy, or sphinx, that could actually provide useful feedback on specific lines.

I guess it might get noisy, yes. But I think it would be nice to have a simple command that allows to fix the clang-format from the PR itself
Maybe we can introduce 2 jobs:

  • clang-format-check: runs on every push to a PR and gives a downloadable patch if one wants to use it
  • clang-format-fix: runs with a special command from the PR author and actually commits the fixes. The command might be @llvm-bot fix-clang-format, for example.

My experience from having automated runs of clang-format on every PR (including to an LLVM fork) is it has a poor signal to noise ratio. Part of this is because, like many, our style guide doesn’t describe the way to format a specific piece of code but constraints that formatted code should follow. There’s a whole family of options, such as:

auto x = foo(some_long_arg1,
             some_long_arg2);

vs

auto x =
    foo(some_long_arg1, some_long_arg2);

(and is that two or four spaces for a continuation indent of that kind?)

and it’s unhelpful when the tooling is more opinionated that the official rules. When the version changes and it decides it wants to format things differently that’s also a pain, and you require every developer to have that specific version of the tool (e.g. does an LLD developer really need to build Clang in order to have “the” clang-format in use). Then there’s all the masses of code in-tree that doesn’t conform to how clang-format’s heuristics du jour ever thought best. As an experiment I ran clang-format from 1.5 weeks ago (commit fd6619577790 to be precise) on llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp alone and got:

 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 3133 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------------------------------------------------------------------------------------------------------------
 1 file changed, 1605 insertions(+), 1528 deletions(-)

I mostly ignored the clang-format lints in Phabricator. In my experience I found the lints useful for catching people who, for whatever reason, submit code where it’s clear they’ve not tried to format their code according to the style guide or at least approximating the surrounding style (which tends to also be correlated with people who write poor code that needs more reviewer time, in my experience), and so didn’t need to spend my own time trawling the source to find the style violations myself, but for those of us who know how to configure our editors appropriately and/or take care to format our code correctly as a matter of course (which could include just running clang-format locally) I found they got in the way and would give false-positives I would ignore.

Unless we want to change the official project code style rules, what we need is not to run a formatter on PRs but a linter, i.e. check that the code is a valid formatting. People can opt to use clang-format themselves as a way to conform to those rules, but that’s up to them.

I have a very different experience, we’ve been pretty consistent to just rely on the tool in MLIR, and I haven’t checked recently but we’ve been successful at keeping it “clean”.

To me the tool is very helpful and it is an easy arbiter for ambiguity in the guidelines, and any intent to deviate is encoded explicitly with a // clang-format: off.

I haven’t seen this being a pain in practice while using clang-format consistently for >4y. Also Google internally forces clang-format as part of the review and I don’t think I’ve seen an issue with this either over time.

Officially on Phab the presubmit guideline was “latest release” so that no developer has to built clang from source. In practice any of the recent release was unlikely to have a diff in most cases.

You probably want to take another look at your phrasing, I don’t know your intent but that’s coming across as quite disrespectful right now.

This stat alone does not say anything, a very possible conclusion would be that this file is terribly formatted because “following rules manually” is not a reasonable expectation, especially when they are ambiguous. And the result of the tool is just desirable.

2 Likes

If you’re starting from a clang-format’ed code base and your policy is to run clang-format then it’s not so bad (though I suspect newer C++ features can end up being poorly formatted for a while by clang-format, e.g. I’m sure there’s been churn in concepts, possibly still ongoing). But our policy today isn’t what clang-format produces. If we as a project wish to change that, we can, but I should not have PR automation scream at me because I opt to do things one valid way and clang-format opts to do it the other way. Nor should I have to litter my code with // clang-format: off/on, whose job is not “I want to do things slightly differently because that’s what I happened to write” but rather “clang-format gets confused by this construct (often around macros)” or “manual formatting like this avoids big diffs when new items are added”. I would rather we mandated clang-format than encourage people to fill the codebase with that.

Depends what OS and version you’re on.

You can see the full diff here: https://termbin.com/bf6r

Some of that is desirable as the code currently violates the style. A large proportion of it is just pointless churn, though, where clang-format is overly opinionated and the existing code is totally fine. Churn makes it harder to figure out what actually changed when reviewing code, and as a downstream creates more conflicts and more annoying to resolve conflicts.

To be clear: I don’t propose blocking PRs on clang-format, I just want to make sure we at least have what we had in Phab (and even better depending on what better means in this case). I see no reason why it couldn’t be ignored in GitHub either if that’s what reviewers want to do.

Personally my experiences is much more in line with @mehdi_amini.

1 Like

Using git clang-format is almost the same: nobody suggested to reformat complete files.

1 Like

Well, it formats the lines you touch. Sometimes that means no reformatting, but if you’re doing some mechanical change throughout the tree then it’s a pain when clang-format interprets that to be “reformat every line you’re tweaking”. For example, do we really want to be reformatting every getPointerTy call when we strip out relics of typed pointers from our APIs? That would be a massive diff made even harder to digest.

This is primarily from my experience working on an LLVM fork where we tried to enforce clang-format on every commit. We have to touch various core parts of LLVM in intrusive ways, and we just found too many cases where clang-format wanted to reformat the code we were touching in different ways to make it useful. We still actually run it, but it’s become mostly noise that I glance at, see if it’s complaining about anything we did wrong and, if not, ignore its suggestions.

1 Like

A whole lot of the code base is not clang-formatted, so running clang-format all the time of course is going to generate a lot of noise. (There’s a fair amount of the code base that doesn’t even conform to the previous coding standard, i.e. the one before the one where we adopted clang-format. Because formatting churn is worse than actually having a consistently formatted code base, apparently.)

IIRC there is actually a test somewhere in the clang-format suite that clang-formats all the known-compliant files and checks that they didn’t change. So, I have to take claims that clang-format changes its mind all the time with a grain of salt.

2 Likes

That’s useful information I did not know. So I imagine it would just be limited to new parts of the language then, which are hopefully stabilised by the time LLVM adopts them in its code base?