[RFC] Clang-formatting all of libc++ once and for all

Hi,

Libc++ has been using clang-format for 1-2 years. We’ve been enforcing that all new code written is clang-format clean by various means, the most recent incarnation being an LLVM-wide Github Action that enforces clang-format on PRs.

However, we have a good amount of existing code that isn’t clang-format clean. The idea was that we would gradually clang-format more and more code as we’re making changes to the code base, and eventually we’d be clang-format clean. We also previously maintained a list of unformatted files for which we would not enforce clang-format to reduce churn when touching these files. This ended up creating a lot of confusion for contributors, who would often be asked to update the list of unformatted files when it changed. Because of that, we recently removed this exclude list.

As a result, any patch touching any part of the code now has to be clang-format clean in the vicinity of the patch, even in a file that is otherwise not clang-formatted. This is more straightforward than the previous situation, but it’s still a pretty big annoyance and I expect it to start creating confusion right away for contributors.

Therefore, I propose clang-formatting the whole code base once and for all so we can be done with it. As clang-format evolves, we’ll still run into conflicts in the future but those will be significantly smaller and less confusing than the current situation. 99% of the time, we’ll be touching code that is already formatted properly.

There are mainly three things to keep in mind when thinking about such a change:

  1. Contributors who have outstanding patches and will eventually need to rebase them across the clang-format change. For all intents and purposes, this is a very small number of core contributors.
  2. Downstreams who have internal diffs and who will need to merge their differences across the clang-format change.
  3. The ability to git-blame the library.

I actually believe that clang-formatting the whole codebase in one go is the best course of action to address all of these concerns. Indeed, the alternative is to incrementally clang-format the code base like we’ve been doing, but this has worse effects in the long run on the above points.

First, doing it in one go reduces the overall amount of work that must be done by contributors and downstreams compared to incremental formatting. We sit down once, do it, and then never think about it again. There can be no additional formatting-related conflicts in the future.

For rebasing older patches across the formatting change, it is actually possible to automate the process using a merge driver, so that rebases work 100% transparently. The programmer only needs to install the merge driver by running a git command once. I implemented a script that does that based on Nico’s notes for Chromium and tested it with our code base, and it is able to rebase patches across clang-format without any intervention. This also applies to downstreams that have an internal diff.

Furthermore, doing it at once allows the clang-format commit to be added to .git-blame-ignore-revs, which is the best tool available to keep a sensible git blame. The alternative of incrementally clang-formatting the code leads to patches touching lines that would otherwise be untouched if not for clang-format, which breaks git blame pretty badly.

For all these reasons, I’d like to make this change once and for all. It will simplify the day-to-day operation of code reviews and making contributions, which is currently pretty annoying for something that we shouldn’t be thinking so much about.


Current status

Week of Nov 27: Propose an optional change to also rename _LIBCPP_INLINE_VISIBILITY to _LIBCPP_HIDE_FROM_ABI and _VSTD:: to std:: prior to the formatting change, since we could kill two birds with one stone. This is optional and I don’t want to necessarily couple the two changes, but I think that this additional change is simple enough that it should be pretty uncontroversial. _LIBCPP_INLINE_VISIBILITY change / _VSTD change
Week of Nov 27: Put out for review the merge driver script that would allow transparently rebasing across clang-format changes. PR HERE
Dec 4th: Open a PR that clang-formats the whole code base. We will iterate on that PR, extracting bits where we need clang-format off/on out of the PR until we’re happy. PR HERE
Dec 11th (goal): Land the clang-format patch assuming we’re ready. This date might slip depending on what we find in the previous step.

11 Likes

I fully support clang-formating the whole codebase in one go. Current approach works only with small patches or big patches limited to one area in the code.

I consider it problematic. I’m working on a big patch and clang-format makes small changes in a few functions into very big diff: I added two lines to a function, but clang-format decided to fix the whole function and it’s hard to see in the diff, what is the meaningful difference.

Already formatted code would do the review and maintenance (rebases) much easier.

It may be good to do it again, when formatting rules are updated.

Let’s announce the date in advance to allow all nearly completed, big patches be merged before the formatting happens.

1 Like

Even though I’m not a libc++ contributor, I support the direction, and is interested to see how this turns out for the project.

I have some conflicting feelings about “once and for all” part of the title, because clang-format and hard-coded LLVM style are evolving, too, and you haven’t touched on how libc++ is going to deal with it. If it hasn’t been done already, setting up a regression test for clang-format seems useful to prevent clang-format changes to make formatting worse (or at least make it visible before said changes are merged). But even with regression testing set up, there still should be strategy for dealing with changes that come from clang-format. [clang-format] Change LLVM style to BreakAfterAttributes == ABS_Leave by Endilll · Pull Request #70360 · llvm/llvm-project · GitHub might serve as an example of such change.

1 Like

In general I’m not against the idea. I would prefer to have a date communicated up front. I haven’t tested Nico’s tool, did you test how it works when existing patches change the formatting too?

In my personal experience the rules of clang-format seem quite stable, except for bleeding edge C++ features. For example concepts had their formatting adjusted a few times. (Basically it didn’t know about them in the past and learned about them.) So I’m not too afraid new versions of clang-format will suddenly reformat a large part of our code base.

I think it would make sense to remove some other tech debt we have. _VSTDstd, _LIBCPP_INLINE_VISIBILITY_LIBCPP_HIDE_FROM_ABI, directly come to mind, there might be more. This is a similar source of confusion for new contributors and often leads to review comments.

1 Like

As a downstream, I wholeheartedly agree.

2 Likes

As an occasional contributor: yes please, and as soon as possible!

Most code-formatting issues don’t really bother me much, but I find it quite irritating and annoying to deal with the way that many of libc++'s files have inconsistent indent widths between different parts inside the same file.

Agreed; requiring code changes to be clang-format clean while the existing old code doesn’t even agree on a standard indent-width is not a great situation to be in, since it’ll certainly result in massive changes like you indicate.

Doing the codebase-wide reformatting ASAP would be a great solution to that problem. :slight_smile:

2 Likes

I have never been a fan of the “let’s change the rules but not the existing code” attitude. Big bang, boom done.

The git-blame aspect is minor, IMO. While I very rarely do anything with libc++, I do git-blames a fair amount, and I usually have to iterate to find the relevant commit anyway. Passing through one layer of reformatting isn’t a big deal.

This has come up before and I believe someone maintains a list of clang-format-clean files (or possibly the inverse), and runs a test clang-formatting everything and making sure the clean files don’t change with a new clang-format version.

2 Likes

If we add the commit to .git-blame-ignore-revs then there’s no indirection needed (unless things get re-wrapped then presumably things might get a bit confused?). But things like indentation should be reliably skipped over.

2 Likes

For those who are blown away by this “new” feature, there’s more here: Git - git-blame Documentation

1 Like

Unlike something like .gitignore or .gitattributes though, the name .git-blame-ignore-revs is not built into git, and the file is not applied automatically? The file in the monorepo root contains these instructions:

# You can set this file as a default ignore file for blame by running
# the following command.
#
# $ git config blame.ignoreRevsFile .git-blame-ignore-revs

Regarding the original proposal, as a downstream maintainer of libc++ extensions, I also prefer formatting everything at once. As a contributor to some other parts of LLVM, I also wish they would reformat the entire subtree at once, once they start enforcing formatting rules for patches.

1 Like

I fully support it. There are so many times in the code review we say:

  • Could you please not format the entire file in order to let us see the actual change?
  • Could you please format the section you have added?
1 Like

Agreed. However, I will caution against doing this right now, as there are certain bugs in clang-format that we’ll need to deal with because they hurt readability quite a lot. Perhaps it’s worth prioritising addressing these bugs as a matter of libc++?

Does there need to be a discussion about the contents of our format style, or are we rolling with what’s already there?

We should also adopt a policy on how often we want to run clang-format project-wide. I raise this because clang-format frequently adds new features, and it may occasionally be beneficial to adopt some. Sometimes the changes are small, but other times they can be far-reaching.

1 Like

Once per release cycle seems ok for downstreams. More than that would start to get annoying to deal with, though I guess they’d generally be relatively limited in size at least.

1 Like

Those are good points. First, we would pin down the canonical clang-format version to be the same as the one used in the clang-format CI job here. Currently that is clang-format 17.0.1. This way, our code base as a whole would be consistent with the version of clang-format we’re using to enforce formatting in the CI, which is important to avoid confusion.

Then, as @jrtc27 suggested, I think we could update this Clang-format version once per release cycle and reformat all of libc++ with it. I do expect that most code would remain unchanged so doing that would probably be pretty smooth. If we notice a more major change that would be really disruptive, I think we could just handle when/if it happens.

The monorepo as a whole does have a clang-format CI job that ensures that all newly checked-in code conforms to clang-format, so I think we’re good on that point!

I would suggest going with what we have. Large parts of our code are already formatted with that style, it’s been working pretty well and I haven’t seen many complaints about it. There were some concepts-related issues initially but those got resolved in recent clang-format versions. So basically, since nobody’s complaining I think we should just keep what we have.

I am not aware of any serious bugs in clang-format that have been causing issues with our formatting. Or perhaps there are but we have just learned to live with them :slight_smile: In all cases, we are enforcing clang-format for new code regardless, so I think we should definitely avoid blocking this reformatting on clang-format bugs. Instead we should just pick up fixes for said bugs in future re-formattings and use clang-format off/on in places that really don’t make sense for now.


Overall, it looks like people are quite favorable to the idea presented here. I am going to wait a few days and then follow this timeline:

  • This week: Propose an optional change to also rename _LIBCPP_INLINE_VISIBILITY to _LIBCPP_HIDE_FROM_ABI and _VSTD:: to std:: prior to the formatting change, since we could kill two birds with one stone. This is optional and I don’t want to necessarily couple the two changes, but I think that this additional change is simple enough that it should be pretty uncontroversial.
  • This week: Put out for review the merge driver script that would allow transparently rebasing across clang-format changes. PR HERE
  • Dec 4th: Open a PR that clang-formats the whole code base. We will iterate on that PR, extracting bits where we need clang-format off/on out of the PR until we’re happy.
  • Dec 11th (goal): Land the clang-format patch assuming we’re ready. This date might slip depending on what we find in the previous step.

This is a fairly aggressive timeline, however in practice there’s not much technical work remaining and the more we wait, the longer we have to deal with creeping clang-format changes in PRs, so I think it makes sense to avoid delaying this more than needed.

1 Like

I’m also very much in favour of this. It makes both reviewing and writing code a lot simpler and I don’t see much of a downside.

I’m strongly in favor of doing this. I think our current arrangement (maintaining a list of files to which formatting doesn’t apply, minimizing formatting-related diffs, having different formatting styles in the same file, etc.) at best negates the benefits of using automated formatting. IMO a “big bang” reformatting, while not without its cost, is the lesser evil.

First and foremost, lets fix different indent sizes TODAY.

Lets do that completely, across the codebase, right away. Differing indent sizes is a problem for developers today, and it is the biggest cause of pain points (A) and (B).

I really don’t care to force a particular format over another. As Chris mentioned, the actual output of clang-format changes over time.

Code Formatting Goals

Code formatting is a developer experience problem, and we should treat it as such.
It is not a correctness problem that requires strict validation, and we should not treat it as such.

The things I want from clang-format are:

  1. Edit-ability - Your editor might not be set up the same as mine. If I don’t want to fight my editor, we need a way to produce consistent output that we all find roughly acceptable. The biggest issues right now is our inconsistent use of indent.

  2. Consistency - It eases readability and editability. But being consistently bad is not good. So it has to be a consistency that’s better.

  3. Developer velocity - Tooling allows developers to spend their time on more important things. Every second a developer spends on code-formatting is lost productivity.

Code Formatting Issues

The biggest issues I see are:

(A) Human’s format code better – when they spend time doing it and our consistent.

Littering the code with `// clang-format off` leaves us worse off.

(B) We don’t want to force humans to spend time on code formatting.

This goes both ways, we need a tool to make the process automatic, but we can go too far too. Overzealous formatting requirements – as we have had previously – and force developers to spend too much time trying to get in compliance.

(C) - Our concerns about enforcement, but they should be about allowing developers to produce readable, consistent, code quickly.

Other open-source projects do this much better. They have

  • Build rules and scripts to install/setup everything needed to format the code.

  • pre-commit hooks you can install locally to do the formatting, and to ensure it’s done before correctly before you push.

These open-source projects focus on the developer experience, we’re focusing on the whitespace. Our experience is still very painfu, and I’m not OK requiring fully formatted code until we’ve solved the issues with producing that code.

Proceeding with a Developer First Mindset

Before we talk more about enforcing formatting, we need to have:

  1. A clear and documented and push-button solution to formatting code pre-review.
  2. Documentation and testing for our formatting solution.
  3. To decide how we’ll manage clang-format format changes. Does everyone have to be on the exact same revision of Clang-tidy? Do we support two releases?

A word on iterative formatting

Iterative formatting, (i.e. using git-clang-format) addresses all of my cocecrns above.

  1. It allows developers to use the formatter when it helps, and ignore it when it hurts.
  2. It creates fewer //clang-format comments in our code.
  3. It means we don’t need to require 100% formatted code. And then 100% reformat it.
  4. It allows multiple versions of clang-format to be used at any given time.

As mentioned up front, iterative formatting requires some consistency in the code base (we can’t mix 2 vs 4 space indents). So we should address those.

A Path Forward

We should

  1. Reformat the code so it’s consistently indented.

  2. Create a build rule or script that will

    • (A) Install the correct clang-format binary

    • (B) Install git-clang-format

    • (C) Modify the users gitconfig settings for git-clang-format to correctly handle libc++'s extension-less files (there’s already a setting for this).

    • (D) Install pre-commit hooks which check the formatting locally, and offer to format the code if needed.

    • (E) Extensively document how to format the code.

This is what the Textual python library does, and when I had to send a PR, it was trivial to meet their formatting requirements. It ensured the code was correctly formatted before I even opened the pull request.

A note on future language features & formatting

Consider concepts. clang-format is really bad at formatting concepts. So bad, that we just about always need to turn it off.
Are we going to turn it off for all blocks of concepts? Are we OK having the tool, which lags behind the language, define how we have to format new language features?

How are we going to keep the // clang-format off blocks from decaying? When will we know when to turn them back on?

Again, I find we’re focusing too much on enforceability and not enough on our developers.

Non-considerations

I understand some developers choose to format entire files, rather than just the code they have changed. This may have been easier in the past, or still easier than changing their workflow to use git-clang-format. However, we shouldn’t give this much weight when considering what is best for the project as a whole. And we shouldn’t ask them to change until after we have a rock-solid iterative formatting story for everyone.

However, there is no need to do this, and so we shouldn’t give it weight.

1 Like

This RFC isn’t about enforcing clang-formatting, it’s about clang-formatting the code to have a consistent style.

I think that this timeline is great. Do we want to also add configuration for --ignore-rev in git blame using an updated local git config modified by a command (like)

git config --local blame.ignoreRevsFile .git-blame-ignore-revs

I’d be happy to add a PR for something like this if you think that it would be helpful!

Thank you, @ldionne, for taking on this helpful task!
Will

Thanks a lot for sharing your thoughts Eric. I agree with a lot of what you said and I think there’s a indeed a few small things we can do that will go a long way towards improving the developer experience when it comes to formatting.

However, I feel like what you said is not incompatible with this RFC, nor should it block it. The current state of the project is that we already enforce clang-format. We’ve been doing so for 1-2 years and the most recent incarnation of this enforcement is the LLVM-wide clang-format CI job that will run git-clang-format on the diff and suggest a patch you can apply locally to fix formatting differences, if any. Before that we had a Buildkite job that did the same for a long time, and the rest of LLVM has had their own clang-format enforcement via premerge testing as well.

This RFC is about doing a one-shot clang-format run of the whole libc++ codebase to greatly reduce the likelihood of pure formatting changes during the normal operation of the project. This will lead to a better developer experience overall since much less energy will have to be spent thinking about the formatting of unrelated lines of code (in particular during reviews).

I agree we should add an easy way of applying formatting locally. There’s a few challenges to that like matching the clang-format version with that of the enforcement job. I think improvements like those are not incompatible with this RFC, in fact both are just compatible steps in the direction of a better developer experience.

Iterative formatting is essentially the status quo. The problem with it is that the unformatted parts of the library are sometimes so different in their formatting from what we want to enforce that it ends up creating large unrelated hunks in patches. Such large changes make it difficult to author and review changes, which causes frustration and confusion around our formatting setup.

I agree – this is what this RFC is about, however while we’re touching so many lines in the library, we might as well clang-format it properly.

I agree with all of those as follow ups to this RFC. Those are basically pure improvements over the status quo and they will indeed improve the developer experience some more by making it easier to fulfill clang-format before even sending out a PR.