RFC: clang-format github action

I’ll also leave the following two quotes from the introduction of our current coding standards:

While this document may provide guidance for some mechanical formatting issues, whitespace, or other “microscopic details”, these are not fixed standards. Always follow the golden rule:

If you are extending, enhancing, or bug fixing already implemented code, use the style that is already being used so that the source is uniform and easy to follow.

There are some conventions that are not uniformly followed in the code base (e.g. the naming convention). This is because they are relatively new, and a lot of code was written before they were put in place. Our long term goal is for the entire codebase to follow the convention, but we explicitly do not want patches that do large-scale reformatting of existing code. On the other hand, it is reasonable to rename the methods of a class if you’re about to change it in some other way. Please commit such changes separately to make code review easier.

The final sentence is particularly relevant to running git clang-format on commits.

1 Like

This is how I’ve been doing every single refactoring I’ve done in LLVM, like… forever?
I’ve been using clang-format consistently, even in parts of the codebase not compliant (by formatting only the diff). I don’t quite get the “pain” aspect of it actually.

Downstreams like us, for one. Every reformatting of code we’ve changed downstream means more tedious merge conflict resolution. It’s much easier when it’s immediately obvious that something got added/removed/renamed because the formatting didn’t change than when the whole line got reflowed and you have to manually match everything up. As a long-standing (10ish years) fork of LLVM with a 2.5k line diff to llvm/lib/CodeGen alone we feel a lot of pain and friction when things like that happen.

I don’t think so, on the contrary: clang-formatting the lines you’re touching is aligned with “Our long term goal is for the entire codebase to follow the convention”.

The last sentence follows:

On the other hand, it is reasonable to rename the methods of a class if you’re about to change it in some other way. Please commit such changes separately to make code review easier.

This is different from clang-format-diff in 2 key aspects: 1) this goes beyond formatting, 2) and this goes beyond the immediate lines clang-format-diff would just format, this is about renaming APIs! That can spans multiple files, so it’s all about keeping your patch as small as possible. Clang-format does not have this effect.
Note that convention goes beyond format, a lot of it is naming and the various casing conventions.

Note that clang and libcxx already have / had that automation on buildkite as part of the pre-commit CI.

So this is probably something that we will want to port over to GitHub just as to get better UI @ldionne @AaronBallman
We do expect the formatting check to be green in clang before merging PRs

And yes, the policy in clang is to not format test files which are sensitive to things being on a specific lines and clang-format would be confused by the way we torture the parser,
but that is something clang-format picks up automatically based on the nearest .clang-format file

2 Likes

To try to summarize:

  1. Certain subprojects (libc++, clang) already use clang-format heavily and want to retain that functionality in Github Actions.
  2. Some projects and reviewers don’t want to block PRs if they don’t pass formatting.
  3. Some projects might want to opt-out completely from formatting.
  4. A local .clang-format file excludes tests, so there is no need to ban them in the action file.
  5. We should only run on diffs
  6. Annotations/suggestions might be helpful, but also too noisy, and there are some permission problems around this.

Here is my updated suggestion:

  1. I will create an action that runs git-clang-format on the diff of the PR.
  2. This will give a WARNING, not an error, if it fails
  3. Let’s start by not adding annotations or suggestions to get used to it.
  4. Make it easy for a subproject to opt out if they don’t want this run on their files.

And make that version 1. Future enhancements could be:

  • Post a comment on the PR explaining how to use git-clang-format locally to format the changes if it fails
  • Post the diff of format changes so that it can either easily be patched locally or via an action
  • Experiment with suggestions
  • Certain subprojects might want to block PRs based on this failing, but I don’t think we have any automatic blocking policies yet, so I think that needs a community-wide discussion.

Anything I missed?

4 Likes

FYI, this is how we check clang-format from the buildkite pipelines of clang and libc++ right now:

If you create a Github action to check clang-format, we’ll be able to remove these jobs from the BuildKite pipeline. I think it makes sense to gradually move stuff over from BuildKite and into GH Actions since it will simplify our infrastructure, however we have to make sure that we retain the same functionality as we do that. In particular, you should take note of the special behavior for files where we ignore formatting in libc++.

Basically what we do is that we enforce that all new files are clang-formatted, and we enforce that some existing files are clang-formatted, but not all of them. We use the job to prevent backsliding.

1 Like

Just in case it might be useful, we have clang-format check for our compiler in the Intel’s fork.
Link to the local action - llvm/devops/actions/clang-format/action.yml at sycl · intel/llvm (github.com).
Link to the pre-commit workflow invoking clang-format action - llvm/.github/workflows/sycl_precommit_linux.yml at sycl · intel/llvm.

1 Like

Sholud it be enabled for everything?

Not for TableGen (i.e. *.td) files.

I have spent some time on this during the weekend, and I have a suggestion.

I figured out we can post the formatted diff as a summary comment in the PR. See an example here: Break clang-format by tru · Pull Request #6 · tru/llvm-project · GitHub

This comment will be updated as you push new commits and delete if there are no more formatting errors.

In order to get this working correctly, I would need to move the logic to a Python script and I suggest we do a generic format-code-helper.py that can run clang-format, black or other formatters that produce a diff and then post/update it. If we go this route, we can change it to post annotations/suggestions later on instead of just a comment if we feel like that’s a better idea.

Before I start working on this, I wanted to check if we want this. To me, it makes the formatting error more visible and we can guide how to fix stuff instead of digging into the action logs or having our PR flooded with suggestions.

Thoughts?

3 Likes

This sound excellent to me!

I wonder if keeping a comment saying that all formatting errors have been fixed isn’t better feedback though.
Just to make sure i understand, when you say "generic format-code-helper.py that would still be called by different jobs, right?

Good point. We can keep the comment just saying :+1: or something.

Yes generic I mean we would have something like:

format.py --pr 123 --black <pyfiles> for py files and similar format.py --pr 123 --clangformat <cfiles>

1 Like

PR for the new integration here: [Workflow] Add new code format helper. by tru · Pull Request #66684 · llvm/llvm-project · GitHub

I’d like to mention that in the past, specifically the clang-format section has used unreleased versions of clang-format (since you’d need to build it anyway to work on it). I’m not sure if this will be a large issue, though.