Last week, I enabled a Python source format checker in GitHub actions so that each PR will be checked for formatting. This only checks the diff that’s being posted (similar to how git-clang-format works).
I have done some exploratory work to create an action that would run git-clang-format on the PR, but this will be a much bigger impact than the Python formatting so I wanted to bring it to the community to ask a few questions:
Should we have a clang-format checker run on each PR? I think we had something similar in Phab, so I think the answer to this one will be yes.
Sholud it be enabled for everything? Are there certain projects not using clang-format and want to opt out?
We shouldn’t run clang-format on tests, would it be enough to exclude */tests/* from the files we want to run it on?
We want to pin the version of clang-format to something, what should the policy for this be?
I was thinking about it, I’m glad someone is already working on it
Ideally we should be able to do better than Phab and have the bot post edit suggestion to GitHub so that the author can just “accept” the fixes.
(We can look then into extending this to clang-tidy checks!)
That only applies to clang tests I suspect, we have regular C++ code in these folders that should be formatter.
In the past we were using “latest LLVM release” as this provides a pre-built clang-format that “everyone” can use.
It might be better to use already implemented GitHub Actions if they meet our requirements. For example, I’m not sure if the mentioned one creates suggestions to actually fix the code. Additionally, we can fork them and customize them to suit our needs.
I’ve seen a GitHub action (but I’m not able to find it right now) which runs clang-format and then sends the format corrections as GitHub review code change suggestions. The author of the PR can then single-click accept and commit the format changes.
Yeah, I don’t think annotations make sense here, it’s a pretty binary thing “run clang format”, and adding annotation for that looks pretty involved and, as you said, noisy.
If it might help, it’s probably to post a comment explaining how to run clang format and re push a commit.
It’s very different from a linter like eg, clang-tidy, or sphinx, that could actually provide useful feedback on specific lines.
My experience from having automated runs of clang-format on every PR (including to an LLVM fork) is it has a poor signal to noise ratio. Part of this is because, like many, our style guide doesn’t describe the way to format a specific piece of code but constraints that formatted code should follow. There’s a whole family of options, such as:
auto x = foo(some_long_arg1,
auto x =
(and is that two or four spaces for a continuation indent of that kind?)
and it’s unhelpful when the tooling is more opinionated that the official rules. When the version changes and it decides it wants to format things differently that’s also a pain, and you require every developer to have that specific version of the tool (e.g. does an LLD developer really need to build Clang in order to have “the” clang-format in use). Then there’s all the masses of code in-tree that doesn’t conform to how clang-format’s heuristics du jour ever thought best. As an experiment I ran clang-format from 1.5 weeks ago (commit fd6619577790 to be precise) on llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp alone and got:
I mostly ignored the clang-format lints in Phabricator. In my experience I found the lints useful for catching people who, for whatever reason, submit code where it’s clear they’ve not tried to format their code according to the style guide or at least approximating the surrounding style (which tends to also be correlated with people who write poor code that needs more reviewer time, in my experience), and so didn’t need to spend my own time trawling the source to find the style violations myself, but for those of us who know how to configure our editors appropriately and/or take care to format our code correctly as a matter of course (which could include just running clang-format locally) I found they got in the way and would give false-positives I would ignore.
Unless we want to change the official project code style rules, what we need is not to run a formatter on PRs but a linter, i.e. check that the code is a valid formatting. People can opt to use clang-format themselves as a way to conform to those rules, but that’s up to them.
I have a very different experience, we’ve been pretty consistent to just rely on the tool in MLIR, and I haven’t checked recently but we’ve been successful at keeping it “clean”.
To me the tool is very helpful and it is an easy arbiter for ambiguity in the guidelines, and any intent to deviate is encoded explicitly with a // clang-format: off.
I haven’t seen this being a pain in practice while using clang-format consistently for >4y. Also Google internally forces clang-format as part of the review and I don’t think I’ve seen an issue with this either over time.
Officially on Phab the presubmit guideline was “latest release” so that no developer has to built clang from source. In practice any of the recent release was unlikely to have a diff in most cases.
You probably want to take another look at your phrasing, I don’t know your intent but that’s coming across as quite disrespectful right now.
This stat alone does not say anything, a very possible conclusion would be that this file is terribly formatted because “following rules manually” is not a reasonable expectation, especially when they are ambiguous. And the result of the tool is just desirable.
If you’re starting from a clang-format’ed code base and your policy is to run clang-format then it’s not so bad (though I suspect newer C++ features can end up being poorly formatted for a while by clang-format, e.g. I’m sure there’s been churn in concepts, possibly still ongoing). But our policy todayisn’t what clang-format produces. If we as a project wish to change that, we can, but I should not have PR automation scream at me because I opt to do things one valid way and clang-format opts to do it the other way. Nor should I have to litter my code with // clang-format: off/on, whose job is not “I want to do things slightly differently because that’s what I happened to write” but rather “clang-format gets confused by this construct (often around macros)” or “manual formatting like this avoids big diffs when new items are added”. I would rather we mandated clang-format than encourage people to fill the codebase with that.
Some of that is desirable as the code currently violates the style. A large proportion of it is just pointless churn, though, where clang-format is overly opinionated and the existing code is totally fine. Churn makes it harder to figure out what actually changed when reviewing code, and as a downstream creates more conflicts and more annoying to resolve conflicts.
To be clear: I don’t propose blocking PRs on clang-format, I just want to make sure we at least have what we had in Phab (and even better depending on what better means in this case). I see no reason why it couldn’t be ignored in GitHub either if that’s what reviewers want to do.
Personally my experiences is much more in line with @mehdi_amini.
Well, it formats the lines you touch. Sometimes that means no reformatting, but if you’re doing some mechanical change throughout the tree then it’s a pain when clang-format interprets that to be “reformat every line you’re tweaking”. For example, do we really want to be reformatting every getPointerTy call when we strip out relics of typed pointers from our APIs? That would be a massive diff made even harder to digest.
This is primarily from my experience working on an LLVM fork where we tried to enforce clang-format on every commit. We have to touch various core parts of LLVM in intrusive ways, and we just found too many cases where clang-format wanted to reformat the code we were touching in different ways to make it useful. We still actually run it, but it’s become mostly noise that I glance at, see if it’s complaining about anything we did wrong and, if not, ignore its suggestions.
A whole lot of the code base is not clang-formatted, so running clang-format all the time of course is going to generate a lot of noise. (There’s a fair amount of the code base that doesn’t even conform to the previous coding standard, i.e. the one before the one where we adopted clang-format. Because formatting churn is worse than actually having a consistently formatted code base, apparently.)
IIRC there is actually a test somewhere in the clang-format suite that clang-formats all the known-compliant files and checks that they didn’t change. So, I have to take claims that clang-format changes its mind all the time with a grain of salt.