[RFC] Adding support for clang-format making further code modifying changes

Hi all,

As a frequent user and maintainer of clang-format I would like to formalize a change to the “charter” of what clang-format can/could/should do

Motivation

My thoughts (as one of the people who pushed for this RFC) are that we have a few options:

+some of the folks involved in the early development of clang-format who might have more context on those design principles/choices made back then (& perhaps on the include reordering change/tradeoff/choices made there)

Anyone got a link to the include reordering change/design discussion for it? Might be informative to see if there was precedent/philosophy clearly stated about how that was acceptable and what principles were applied & could be reapplied to more recent proposals.

I tend towards the status quo (in general - that’s just my personality) and tend to prefer the fairly firm line clang-format currently has. But I don’t have a great rationale for “why include reordering but not const reordering or dropping braces, etc” - marginally: Changing C++ syntax is more complicated than reordering headers. Despite the fact that reordering headers can have a big impact/breakage, it’s hopefully fairly clear if/when that happens that headers should be made order-agnostic. The subtleties of what might happen when adding/removing braces if the heuristic quick-parsing of clang-format fail seem likely to be more unfortunate, to me at least. But maybe we’re at a point where clang-format is widely adopted/tested enough that that’s less likely to have subtle mis-parse breakages for terribly long? Not sure.

  • Dave

Regarding the 3 possibilities Erich has listed below, my personal opinion here is that option 3 is incredibly dangerous. If the tool is forbidden to break code, then you can be confident that it doesn’t break code (barring bugs). If the tool is allowed to change code such that it potentially breaks code, then at least you know what you’re getting yourself in for. If the tool “almost never” breaks code, then it can lull you into a false sense of security. Unless the tool screams from the rooftops that it might break your code, it’s really easy to assume that it is in fact providing the “does not break code” guarantee. Even if it’s documented, people will make this assumption (just like people constantly add UB to c++ codebases). In practice, you really want to provide 1 or 3 regardless, because it would be nice if the tool didn’t go out of it’s way to break your code, but it shouldn’t advertise that it is 3.

I think that 1 should be the default. It should be documented that, by default, clang-format will not break your code, and if it does that it is a bug. Then, options can be enabled that potentially break code. In effect, it would be Erich’s option 3, but with the caveat that if you do not opt into potentially breaking options, you have option 1. Additionally, BasedOnStyle and similar options that turn several knobs at once should also be guaranteed to be non-breaking. Furthermore, we should turn off existing potentially-breaking settings such as include sorting by default. While this may result in clang-format doing something different between versions with the same config, that ship has also sailed, so I think that’s fine. In the documentation for options, it should clearly state if an option has the potential to break your code.

Thanks,

Christopher Tetreault

+some of the folks involved in the early development of clang-format who might have more context on those design principles/choices made back then (& perhaps on the include reordering change/tradeoff/choices made there)

Anyone got a link to the include reordering change/design discussion for it? Might be informative to see if there was precedent/philosophy clearly stated about how that was acceptable and what principles were applied & could be reapplied to more recent proposals.

Commit:
https://github.com/llvm/llvm-project/commit/d89ae9d3accb4cb8b7357932c3dfba8f4ae383c4

Patch discussion:
https://reviews.llvm.org/D11240

It doesn't look like there was any explicit discussion about the
potential for breaking code (but maybe I missed some, and maybe some
discussion happened in a different thread I've not yet found).

~Aaron

+some of the folks involved in the early development of clang-format who might have more context on those design principles/choices made back then (& perhaps on the include reordering change/tradeoff/choices made there)

Anyone got a link to the include reordering change/design discussion for it? Might be informative to see if there was precedent/philosophy clearly stated about how that was acceptable and what principles were applied & could be reapplied to more recent proposals.

Commit:
https://github.com/llvm/llvm-project/commit/d89ae9d3accb4cb8b7357932c3dfba8f4ae383c4

Patch discussion:
https://reviews.llvm.org/D11240

It doesn’t look like there was any explicit discussion about the
potential for breaking code (but maybe I missed some, and maybe some
discussion happened in a different thread I’ve not yet found).

Thanks for looking! Yeah, maybe it’s so old that the dev was so small that those issues might’ve been discussed in an offline discussion in person between Manuel and Daniel. Ah well.

I’d just offer a few thoughts:

  • Whether or not to adopt clang-format for a project is already quite controversial in some places. I think if we made it so that it also made more extensive non-whitespace changes, I think it would reduce its adoption, even if they were off by default.

  • Non-whitespace changes are very dangerous. A bug in clang-format could more easily create bugs in the code it formats (at least that is the perception). For some risk-averse projects, this would be an unacceptable risk.

  • Perhaps an alternative idea would be to create a new program called clang-reshape or something, that shared a common underlying framework with clang-format, but clang-reshape contains the options for both the non-whitespace changes and whitespace-changes, whereas clang-format only has the whitespace changes. That way folks could adopt clang-format and ban clang-reshape if they only wanted whitespace changes.

HTH. -Andrew

I’d just offer a few thoughts:

  • Whether or not to adopt clang-format for a project is already quite controversial in some places. I think if we made it so that it also made more extensive non-whitespace changes, I think it would reduce its adoption, even if they were off by default.

Yeah? I wouldn’t’ve figured that.

  • Non-whitespace changes are very dangerous. A bug in clang-format could more easily create bugs in the code it formats (at least that is the perception). For some risk-averse projects, this would be an unacceptable risk.

  • Perhaps an alternative idea would be to create a new program called clang-reshape or something, that shared a common underlying framework with clang-format, but clang-reshape contains the options for both the non-whitespace changes and whitespace-changes, whereas clang-format only has the whitespace changes. That way folks could adopt clang-format and ban clang-reshape if they only wanted whitespace changes.

This crossed my mind too - but I expect a lot of the clang-format integration may be more invasive than that. (hardcoded binary names, use of clang-format as a library, etc) - adding new features to the existing adopted clang-format may be especially valuable compared to providing a new tool. (that said, I do also feel like grouping features in an existing tool only because of its existing adoption is super great either)

  • Dave

Aren’t there two separate discussions here:

  1. What is the set of features in clang-format?

  2. What is the subset of those features used in the LLVM (and other “major”) coding styles?

For me personally, the answers to those questions are quite different.

For (2), I feel that static analysis tools which aim for broad adoption must keep false positives to a minimum, otherwise the tool will just end up being ignored. clang-format is arguably a static analysis tool, albeit a very simple-minded one. So I wouldn’t want to run the risk of false positives in the default LLVM style.

For (1), if some users really want to have such features for their own, non-LLVM coding style, then I’d say it’s perfectly fine for clang-format to include those features as long as the majority of users aren’t affected directly or indirectly. Indirect effects could be caused e.g. by losing some performance (“save speed” is a valuable feature) or by making clang-format harder to maintain.

Cheers,
Nicolai

I’m cautiously +1 on const reordering, having previously opposed it and been convinced.
I think anyone who’s worked on a large shared codebase both before and after clang-format can understand the value here, so I’ll focus mostly on the risks and why I think they’re acceptable.

Risk: clang-format will become a grab-bag of features with no clear line - just anything implemented on top of its pseudo-AST.
Clang-format’s brand is low-level formatting details and I think it’s important to preserve this. Const order fits here in users’ minds. (So does brace addition/removal).

Risk: The feature will break code and clang-format will no longer be (seen as) reliable. This can make it harder socially or technically to deploy, and cause real damage.
I think we need to work hard on mitigating this:

  • the feature needs careful design and extra scrutiny, like security-critical code
  • it should be clearly and temporarily marked as experimental, with opt-in required
  • it should be clearly and permanently marked as “makes assumptions about coding style”, with opt-in required.
  • bugs need to be thoughtfully addressed
    From what I can see MyDeveloperDay is serious about doing all of this.

Risk: clang-format will be overtaken by the complexity of such features, which will outweigh the benefits (if few use them), hurting maintenance, causing bugs etc.
However this isn’t different from other optional features. Editing tokens tends to be done as a separate pass which is relatively easy to isolate (compared to something like supporting a new language). With complexity isolated, this is mostly just about how maintainers prioritize their time/attention, which must be left up to them.

Regarding include-ordering: I think this is a valuable feature if you follow a coding style that allows it to be correct, and it fits well in clang-format’s brand. However it wasn’t clearly labeled to emphasize its caveats, and in hindsight it shouldn’t have been made part of the Google style without further opt-in required.

Cheers, Sam

Thanks for the write up! & for the record I’m pretty comfortable with what Sam’s said here. (not that I’ve got strong weight in clang-format development, but to make sure my other comments on the thread aren’t treated as standing criticisms/concerns)

  • Dave

Thanks for the response Sam,

Here is how I see we mitigate the risk:

I’m cautiously +1 on const reordering, having previously opposed it and been convinced.
I think anyone who’s worked on a large shared codebase both before and after clang-format can understand the value here, so I’ll focus mostly on the risks and why I think they’re acceptable.

Risk: clang-format will become a grab-bag of features with no clear line - just anything implemented on top of its pseudo-AST.
Clang-format’s brand is low-level formatting details and I think it’s important to preserve this. Const order fits here in users’ minds. (So does brace addition/removal).

I doubt we wouldn’t continue to apply the same level of scrutiny on the code reviews and expect them to follow best practices and guidelines, I am expecting us to still be quite circumspect as to what we’d consider.

To be honest clang-format I think already runs at quite a high review rejection rate, people ask for all sorts of things and we do try to push back pretty hard, landing something can sometimes be pretty torturous to get through review,
I’m not expecting that to change.

Risk: The feature will break code and clang-format will no longer be (seen as) reliable. This can make it harder socially or technically to deploy, and cause real damage.
I think we need to work hard on mitigating this:

  • the feature needs careful design and extra scrutiny, like security-critical code
  • it should be clearly and temporarily marked as experimental, with opt-in required
  • it should be clearly and permanently marked as “makes assumptions about coding style”, with opt-in required.
  • bugs need to be thoughtfully addressed
    From what I can see MyDeveloperDay is serious about doing all of this.

I am, I also think that we shouldn’t plough on with individual changes if we see them as potentially ambiguous, I would rather ignore a change if in doubt, I don’t feel such features need to be 100% catch all (like how sometimes clang doesn’t always tell you about all missing overrides, just as it can rationalize them), This may require more specific options to ensure we know what an tok::identifier actually is in order to avoid ambiguities caused by macros (a little like StatementMacros)

Risk: clang-format will be overtaken by the complexity of such features, which will outweigh the benefits (if few use them), hurting maintenance, causing bugs etc.
However this isn’t different from other optional features. Editing tokens tends to be done as a separate pass which is relatively easy to isolate (compared to something like supporting a new language). With complexity isolated, this is mostly just about how maintainers prioritize their time/attention, which must be left up to them.

To be honest these are likely some of the less invasive features (in comparison to say adding something like adding Whitesmiths style or C#), as you say the “Passes” give us an easy mechanisms to handle the “OptIn” without adding "if (…) everywhere and the passes also tend to be very self contained especially as the Formatting itself is just a Pass in its own right which is performed later.

I have no concerns over the maintenance other than ensuring we understand how new passes actually work, but the compartmentalization feels on a par to compartmentalization of individual clang-tidy checks.

Regarding include-ordering: I think this is a valuable feature if you follow a coding style that allows it to be correct, and it fits well in clang-format’s brand. However it wasn’t clearly labeled to emphasize its caveats, and in hindsight it shouldn’t have been made part of the Google style without further opt-in required.

To be honest as a developer I like the brutality of include-ordering, breaking my code only tells me it isn’t robust enough (likely missing forward declarations or not including what its using)

The handling of defaults is always difficult as some people want things and others don’t, (hence the need for the RFC), but I’ve always been clear this needs to be “Opt-In” from the start. For the majority of developers I would expect them to continue to use clang-format as a code formatter and nothing else, but having a ability to make some (not all) obvious changes could potentially be a great help to improving code

For example how many times do you see in LLVM the review comment that says “elide the braces” for

if (x) {
return;
}

I feel this is something that clang-format could be made to easily handle. This RFC is about gaining a general consensus to let us try. We feel we can add even more value.

Anyone who knows me, knows I’m very much pro “clang-format all the things”

MyDeveloperDay

I'm all in favor of allowing such changes and will help to create and review these changes.

Kind regards,
Björn (HazardyKnusperkeks).

I agree with most of the sentiment in this thread, too. clang-format can be dangerous but it can also be so much more.

I have been using clang-format on editors, pre-commit checks and ninja targets for a quite a while and I think it’s a fundamental development tool. With it, style comments stop being a personal conversation and become tooling.

I would like to see a world where there are no more long styling documents with personal angst in all projects, just a clang-format config file.

I’m in favour of increasing the potency of clang-format. Quick checks can be done on save, more expensive ones as ninja targets and even more expensive ones as CI.

To me, the discussion of defaults and perils must be clear to all users and documentation may not be the best way: I’ve never read the clang-format documentation.

Another problem of clang-format is that there are so many options but not many people know enough of them.

Yet another problem of clang-format is that its behaviour is different for the same options over different versions. We’re stuck on clang-format-9, not because it’s “the best” but because re-formatting the whole tree every time a new tool comes out is silly.

So a good solution to all of these problems is to generate a default configuration file with all the options and comments on each one of them, including “this changes non-whitespace code”, “this is quite slow”, “if this breaks code, here’s how to fix”, etc, and setting all of them to the default value.

We also nee to be able to update the config file with new options without destroying the old (often modified) ones, and make sure new versions only do new things if the config is different.

I know, configuration management and backwards compatibility aren’t trivial, but managing clang-format over time isn’t either, and it should. Well, it should, if it wants to be a catch-all tool to do all the things.

cheers,

–renato

I feel this is something that clang-format could be made to easily handle. This RFC is about gaining a general consensus to let us try. We feel we can add even more value.

Anyone who knows me, knows I’m very much pro “clang-format all the things”

I agree with most of the sentiment in this thread, too. clang-format can be dangerous but it can also be so much more.

I have been using clang-format on editors, pre-commit checks and ninja targets for a quite a while and I think it’s a fundamental development tool. With it, style comments stop being a personal conversation and become tooling.

I would like to see a world where there are no more long styling documents with personal angst in all projects, just a clang-format config file.

I’m in favour of increasing the potency of clang-format. Quick checks can be done on save, more expensive ones as ninja targets and even more expensive ones as CI.

To me, the discussion of defaults and perils must be clear to all users and documentation may not be the best way: I’ve never read the clang-format documentation.

Another problem of clang-format is that there are so many options but not many people know enough of them.

Yet another problem of clang-format is that its behaviour is different for the same options over different versions. We’re stuck on clang-format-9, not because it’s “the best” but because re-formatting the whole tree every time a new tool comes out is silly.

So a good solution to all of these problems is to generate a default configuration file with all the options and comments on each one of them, including “this changes non-whitespace code”, “this is quite slow”, “if this breaks code, here’s how to fix”, etc, and setting all of them to the default value.

We also nee to be able to update the config file with new options without destroying the old (often modified) ones, and make sure new versions only do new things if the config is different.

I know, configuration management and backwards compatibility aren’t trivial, but managing clang-format over time isn’t either, and it should. Well, it should, if it wants to be a catch-all tool to do all the things.

clang-format is designed to run on changed lines, not to have all code be compliant. You can use it for the latter, but I think it’s a pretty fundamental difference in development goals.

The problem I've seen is that the opinion of code formatted with 3 lines
of context can differ from 5 lines of context which can be still
different again given the entire file. The only reliable way we've
gotten it to work is to format the whole file every time. But we require
well-formatted code via CI processes, so that's fine (if one can afford
the "rewrite the world" every time the tool gets updated…hence why some
projects are still on version 3.8(!), others 6, some even made it to 9).

--Ben

The underlying design idea for clang-format is to give it the whole file as context, but only format the changed lines.
clang-format supports that use case very well, and it can even be built into presubmit checks.

Fwiw, I think “clang-format can make breaking changes to code when we consider the benefit to be worth it” has been the policy on clang-format for a very long time, so accepting that as the official policy is IMO not a change. If somebody wants to write it down to prevent future revisiting, that seems fine with me.

IIRC, Phab deals with "patches" that have infinite context, but that's
not the norm out there. Maybe that's where this assumption comes from?

Anyways, the main issue I've seen is that developers have versions X, Y,
and Z laying around locally depending on their distro or whatever. So
developers' machines "fight" with CI and each other over the preferred
setup. This is why I don't use any formatting-on-save features myself
with any tool: it's CI's job to enforce it, so why am I wasting my time
on pre-commit hooks that aren't even necessarily accurate? It also makes
teasing out my change with `git add -p` that much harder. So instead, I
just say "ignore formatting if you don't have the exact right version
laying around; CI will handle it". The tool we use can rewrite the
entire topic to ensure it is well-formatted at each step on demand. In
fact, we update `clang-format` versions using this mechanism: tell it to
use a new version, update the config as necessary, and then tell it to
format the entire tree using the new setup.

--Ben

Thank you for putting this RFC together! Given how important
clang-format is to various people's workflows and the *massive* user
base for the tool (who largely aren't represented here on the mailing
lists), I think it's good that we are having a broader discussion
about tool expectations for clang-format.

I come down on the side of not wanting my code formatting tool to be
opinionated about the input source code I feed it. For my personal
needs, I need to be able to trust my formatting tool won't modify the
meaning of my code. If it modifies the meaning in a way that gets
compile errors, that's bad (it means I'm having to work around a
helper tool in order to pass my CI pipeline, which is highly
frustrating). I know that clang-format already happily breaks code
with include sorting, and I think it was a mistake to add that
functionality, especially given the lack of discussion about breaking
code during the original review of the feature. Changes that can
*silently* change the meaning of my code are an even worse concern
because of the increased risk of not noticing the semantic change.
(Keep in mind that some workflows will format an entire file when
saving the file, so there may not even be direct user interaction
involved in the formatting.)

Putting potentially breaking changes behind configuration flags is a
solution, but not one that I'm all that excited by. There are already
*a lot* of configuration options for clang-format, and having to
remember which ones may destroy my input source is a burden. We could
use a naming convention to clearly identify all of these options, but
the first time we have a configuration option that doesn't follow the
naming convention, we're back to the "I can't know which options are
high-risk" problem. Also, the fact that configuration options can be
impacted by non-local configuration files elsewhere in the file system
hierarchy can cause surprising configuration option behavior (I know
we've run into the situation before where an inherited config option
caused a bit of head-scratching as to where the option was set).

I think the idea of a separate tool that's built on top of
clang-format (consuming clang-format as a library with additional
formatting features) has the most appeal to me. Then we no longer have
to worry about the config options being the gate -- the tool is the
deciding factor as to whether you want to opt into danger mode or not.
One of the architectural benefits of designing a series of composable
libraries is the ability to compose them into more powerful tools, I
think we should take advantage of that. This also provides a migration
path for more experimental functionality -- it can be implemented in
the more dangerous tool, shake out the bugs from there, and then be
shuffled into the safer tool if/when the issues have been worked out.

Experience has taught me that just about the worst thing a tool can do
is break user trust, and once you break that trust, you almost never
get it back again. clang-format is a fantastic tool, but the more
opinionated it becomes about source input, the less people are able to
use it (by definition).

~Aaron