Code formatting alignment with LLVM

Hi all,

We have been having a discussion on the GitHub issue tracker about code formatting (and specifically clang-format settings) and whether to align closer with the rest of the project, which you can find here: https://github.com/flang-compiler/f18/pull/945. Since the discussion there hasn’t moved much recently I’d like to start a discussion here so we can get input from a wider group of people.

My opinion is that regardless of technical preferences we shouldn’t diverge from the style of the rest of the project as much as we currently do, or at least if we want to do that then we should have a discussion with the wider community about whether that is acceptable to them.

Does anyone else have any input on this?

Thanks
David Truby

I’m always in favor of minimizing the differences. It helps getting “accepted” by the community but also me personally. I’m used to llvm settings, anything that differs looks weird and will make my workflow uneasy. I think a lot is just conditioning, when I read diffs, for example, the color highlighting is usually good enough to show the non-alignment change even if things moved a bit. I don’t notice that as a problem by now, which is not to say I don’t understand the arguments made for the current f18 settings.

Btw. I failed to find the clang format “clang header exception” mentioned in the call yesterday. Can someone show me a minimal example to show the effect? I mean, what does clang do differently for which we need an extra option? It sounds like something that needs fixing and we might be able to sign with llvm without changing the current options.

Cheers,

Johannes

P.s. I strongly advice to have discussions on the list or on phabricator, whichever is applicable. We already know various ppl do not monitor/interact with GH.

Hi Johannes,

The header exception we are referring to is the IncludeCategories: section in the clang-format file. The default LLVM style has (llvm|llvm-c|clang|clang-c) as prefixes for headers to ensure those headers appear first when includes are sorted. This follows the Coding Style which says that LLVM sub-project includes should be sorted from most specific to least specific.

As a small example: if we don’t add this specific excpetion and I write
#include <flang/parser/parse-tree.h>
#include <clang/Driver/Driver.h>
this will be re-sorted to

#include <clang/Driver/Driver.h>
#include <flang/parser/parse-tree.h>
as clang is higher in the priority order than flang without an additional exception.

Given that the order should be different for each sub-project (as their own headers should be first) I think this one addition to the clang-format file does make sense, rather than having it in the global style.

I hope that explanation helps!

Thanks
David Truby

It does, thank you.

Can we add flang to the regexp and thereby avoid the actual
configuration in the .clang-format? I mean, flang is an llvm subproject
as clang is, why treat it differently?

Hi Eric,

I believe there will be in the Driver, as that will be sharing some of clang’s driver code. Perhaps someone else can give a better answer on this than me though?

David

HI Johannes, sorry for the delayed reply.

The regex that is already there sort of “accidentally” works for clang as c is higher up in the alphabet than any other LLVM subproject.
The coding guidelines say that the sorting of header includes should be in the order of most specific to least specific LLVM subproject.
This is very likely going to be different for each individual subproject, so I think this is one case where divergence from the main LLVM style is easy to argue aligns us better with the actual intended coding guidelines.
For example, in our case we want the order flang → fir → MLIR → clang → LLVM, but this isn’t the order that e.g. libc++ would want. I don’t think this difference is captureable in the global style.

Thanks
David Truby

Hi David,

Thanks for the reply.

There was some discussion about refactoring some of the driver code into “common” bits and C/C++ “specific” bits. I don’t know which direction that went off in. Maybe someone else knows.

But it could also be true that such a refactoring happens entirely within the existing clang directory structure. It’s not clear if that wouldn’t be a bit confusing in a compassionate coding sense, more so than finding a “generic shared driver parts” home.

Hi all,

Tim Kieth from Nvidia has provided a specific example from the recent FIR pull request where the indentation rules caused spurious diffs (see this comment: https://github.com/flang-compiler/f18/pull/945#issuecomment-585856610).

I’m curious if this is also something that shows up badly in Phabricator or if that has a better way of showing diffs involving whitespace. Is anyone familiar enough with Phabricator to comment on this? I wonder if Phabricator might just be better at handling code reviews of this ilk better, meaning that this problem is less visible to clang/llvm developers.

Thanks
David Truby

Hi David,

Thanks for the reply.

There was some discussion about refactoring some of the driver code
into “common” bits and C/C++ “specific” bits. I don’t know which
direction that went off in. Maybe someone else knows.

But it could also be true that such a refactoring happens entirely
within the existing clang directory structure. It’s not clear if that
wouldn’t be a bit confusing in a compassionate coding sense, more so
than finding a “generic shared driver parts” home.

The argument was made that we should split generic driver code off and
put it into `llvm/lib/Frontend/Driver`.

Hi all,

Tim Kieth from Nvidia has provided a specific example from the recent
FIR pull request where the indentation rules caused spurious diffs
(see this comment:
https://github.com/flang-compiler/f18/pull/945#issuecomment-585856610).

I'm curious if this is also something that shows up badly in
Phabricator or if that has a better way of showing diffs involving
whitespace. Is anyone familiar enough with Phabricator to comment on
this? I wonder if Phabricator might just be better at handling code
reviews of this ilk better, meaning that this problem is less visible
to clang/llvm developers.

Here is the diff:
https://reviews.llvm.org/differential/diff/244525/

What line numbers are we looking at again?

Hi all,

Since we haven’t seen any movement on this in quite a while and our projected merge date is moving ever closer, I wonder if I could summarise the positions that have been given so far so we can decide on a way forward. If I misrepresent what you’ve stated at all please reply to correct me, I’m just trying to give a brief summary.

I opened the discussion with a PR to see what the projected changes would look like; I did this for two reasons, firstly that we were requested to in a discussion about merging on llvm-dev, but also because I believe we should align with the rest of the community on a single coding style. My position is that I have no preference at all in what the style is, I just think there should be a single one across the community. My reason for thinking this is that fairly likely that changes to f18, mlir and llvm will be mixed in together once we land, and working on a single change that has multiple different coding styles within it may lead to some issues for anyone working on those.

Tim Keith and Peter Klausler from Nvidia replied to say that comment alignment (and code alignment in general) cause spurious changes to a appear when submitting diffs, making it more difficult to see what has changed. Tim Keith also pointed out that clang and llvm are not properly clang-formatted entirely themselves, which weakens the “common code style across the board” desire.

Eric Schweitz replied to show MLIR’s clang-format file, which is also the one used for the lib/Lower subdirectory of f18. This style has only one small change from the LLVM style, but it is nonetheless a diversion. Eric also pointed out that clang-formatting for new code is enforced on Phabricator by pre-commit bots. This is the case not only for mlir, but for new code heading to clang and llvm too. This is consistent with the Coding Guidelines which do not require that all code is properly clang-formatted, only all new code. Eric also suggested that we could submit diffs without running clang-format, so that the changes are clear, and then once approved run clang-format to match the code with the rest of the project. This might mitigate the code alignment issues.

Johannes Doerfert stated his preference for minimizing clang-format differences from the rest of the project, in part to ease our acceptance by the community. He also stated that he is used to the llvm settings and so anything different makes his workflow uneasy. I believe this goes hand in hand with my statement about working with different coding styles in a single change.

On the F18 community call, Hal Finkel explained the reasoning behind the LLVM coding style decisions. He stated that LLVM prefers readability of code over writability (or reviewability) as code is much more likely to be read than written or reviewed. This is why LLVM uses code and comment alignment in its coding style. On the call it was also discussed that some people do not find that the alignment makes code easier to read and that this seems to be subjective.

I think that this is the point the discussions have reached so far, if I have missed anything or misrepresented anyone’s position please reply to this to clarify.

Given the impasse we have got to, I suggest that if we cannot find a compromise here we should take the discussion to llvm-dev and see how the wider community feels about a large diversion from the coding style for a specific sub-project. It may be that this is not a hard block for merging, however it is something that was requested of us initially so it is something we need to discuss there if we decide not to do it.

I hope this helps to move the conversation forward.

Thanks
David Truby

Hi all,

Since we haven’t seen any movement on this in quite a while and our projected merge date is moving ever closer, I wonder if I could summarise the positions that have been given so far so we can decide on a way forward. If I misrepresent what you’ve stated at all please reply to correct me, I’m just trying to give a brief summary.

I opened the discussion with a PR to see what the projected changes would look like; I did this for two reasons, firstly that we were requested to in a discussion about merging on llvm-dev, but also because I believe we should align with the rest of the community on a single coding style. My position is that I have no preference at all in what the style is, I just think there should be a single one across the community. My reason for thinking this is that fairly likely that changes to f18, mlir and llvm will be mixed in together once we land, and working on a single change that has multiple different coding styles within it may lead to some issues for anyone working on those.

Tim Keith and Peter Klausler from Nvidia replied to say that comment alignment (and code alignment in general) cause spurious changes to a appear when submitting diffs, making it more difficult to see what has changed.

Can you clarify the concern here? If there is a one-time formatting of the codebase now (LLDB did such a one-time formatting change to align with LLVM a few years ago), then any future patches correctly formatted should not have any spurious change right?

Tim Keith also pointed out that clang and llvm are not properly clang-formatted entirely themselves, which weakens the “common code style across the board” desire.

It is worth noting that a lot of the codebase predates clang-format, when there wasn’t any tool to help formatting we could only rely on code review and best effort.

Hi Mehdi,

Thanks for the reply. My understanding of the concern is that if you change a line that has trailing comments in a block for example, and the length of the line changes, all the comments on other lines will be realigned, and the diff will show every line as having changed even though only one has. Does that make sense?

Thanks for the info about clang-format, that helps explain why the whole codebase isn’t run through it already.

Thanks
David Truby

Thank you for the summary. Some more things to consider:

Other than the clang-format file, there are other code style
differences not covered by clang-format. For instance, the naming
scheme for local variables. LLVM uses PascalCase, but lld uses
camelCase (although there were discussion to change this for LLVM as
well).

Tim Keith and Peter Klausler from Nvidia replied to say that comment alignment (and code alignment in general) cause spurious changes to a appear when submitting diffs, making it more difficult to see what has changed. Tim Keith also pointed out that clang and llvm are not properly clang-formatted entirely themselves, which weakens the “common code style across the board” desire.

LLVM does not re-format code on a global scale. Large formatting-only
changes are not done (also see https://reviews.llvm.org/D50055).
Otherwise a change of clang-format would require re-formatting of the
entire code base.

Only changed lines in patches are re-formatted, e.g. with 'git
clang-format'. There should not be spurious changes in submitted
diffs.

As far as I see, this is the only argument against a vanilla
'BasedOnStyle: LLVM'?

Eric Schweitz replied to show MLIR’s clang-format file, which is also the one used for the lib/Lower subdirectory of f18. This style has only one small change from the LLVM style, but it is nonetheless a diversion. Eric also pointed out that clang-formatting for new code is enforced on Phabricator by pre-commit bots. This is the case not only for mlir, but for new code heading to clang and llvm too. This is consistent with the Coding Guidelines which do not require that all code is properly clang-formatted, only all new code. Eric also suggested that we could submit diffs without running clang-format, so that the changes are clear, and then once approved run clang-format to match the code with the rest of the project. This might mitigate the code alignment issues.

Is there a reason why MLIR added 'AlwaysBreakTemplateDeclarations:
Yes'? I'd prefer to not deviate from any LLVM standards/policy unless
there is a project-specific reason. If the reason is not
project-specific, it should be discussed for the entire LLVM project.
Not there is another current thread about deviating from the general
LLVM policy: https://lists.llvm.org/pipermail/llvm-dev/2020-February/139381.html

Michael

Trailing comments are rare in the LLVM code base. Usually comments are before the line, since not a lot of text fits into the 80 column limit after a code statement. clang-format indents all comments to the indention of the first comment, such as:

AStatmentLineUsingATypicalAmountOfCharacters(); // a comment
// trailing on this
// statement consumes
// multiple lines

My suggestion is to not use trailing comments. clang-format otherwise only re-formats comments if exceeding the 80 column limit.

Michael

Thank you for the summary. Some more things to consider:

Other than the clang-format file, there are other code style
differences not covered by clang-format. For instance, the naming
scheme for local variables. LLVM uses PascalCase, but lld uses
camelCase (although there were discussion to change this for LLVM as
well).

MLIR also use camelCase, mostly because we tried to anticipate where the wind was going on this and since lld was getting renamed at the time… It didn’t pan out in LLVM so far unfortunately.

<flang-dev@lists.llvm.org>:

Tim Keith and Peter Klausler from Nvidia replied to say that comment alignment (and code alignment in general) cause spurious changes to a appear when submitting diffs, making it more difficult to see what has changed. Tim Keith also pointed out that clang and llvm are not properly clang-formatted entirely themselves, which weakens the “common code style across the board” desire.

LLVM does not re-format code on a global scale. Large formatting-only
changes are not done (also see https://reviews.llvm.org/D50055).

In general this is correct, in practice lldb and lld both got a one-time large scale renaming/formatting I believe.

Otherwise a change of clang-format would require re-formatting of the
entire code base.

Only changed lines in patches are re-formatted, e.g. with ‘git
clang-format’. There should not be spurious changes in submitted
diffs.

As far as I see, this is the only argument against a vanilla
‘BasedOnStyle: LLVM’?

Eric Schweitz replied to show MLIR’s clang-format file, which is also the one used for the lib/Lower subdirectory of f18. This style has only one small change from the LLVM style, but it is nonetheless a diversion. Eric also pointed out that clang-formatting for new code is enforced on Phabricator by pre-commit bots. This is the case not only for mlir, but for new code heading to clang and llvm too. This is consistent with the Coding Guidelines which do not require that all code is properly clang-formatted, only all new code. Eric also suggested that we could submit diffs without running clang-format, so that the changes are clear, and then once approved run clang-format to match the code with the rest of the project. This might mitigate the code alignment issues.

Is there a reason why MLIR added ‘AlwaysBreakTemplateDeclarations:
Yes’? I’d prefer to not deviate from any LLVM standards/policy unless
there is a project-specific reason. If the reason is not
project-specific, it should be discussed for the entire LLVM project.

I don’t know, but I agree. This got merged unnoticed, otherwise I’d have removed it when merging MLIR into LLVM (or before). I just sent https://llvm.discourse.group/t/remove-mlir-custom-clang-format-configuration/647 FYI.

Thanks,

Hi Michael,

Thanks for the reply, I’ve added some comments inline below.

David

Thank you for the summary. Some more things to consider:

Other than the clang-format file, there are other code style
differences not covered by clang-format. For instance, the naming
scheme for local variables. LLVM uses PascalCase, but lld uses
camelCase (although there were discussion to change this for LLVM as
well).

We have discussed some of these differences too, for example no else after early return and no braces around single line if statements. My understanding is that these are harder to change automatically so we need to have a longer discussion about those which won’t happen in time for the merge.
In this specific case we currently use camelCase for local variables, for the same reason as lld and mlir.

Tim Keith and Peter Klausler from Nvidia replied to say that comment alignment (and code alignment in general) cause spurious changes to a appear when submitting diffs, making it more difficult to see what has changed. Tim Keith also pointed out that clang and llvm are not properly clang-formatted entirely themselves, which weakens the “common code style across the board” desire.

LLVM does not re-format code on a global scale. Large formatting-only
changes are not done (also see https://reviews.llvm.org/D50055).
Otherwise a change of clang-format would require re-formatting of the
entire code base.

Only changed lines in patches are re-formatted, e.g. with 'git
clang-format'. There should not be spurious changes in submitted
diffs.

As far as I see, this is the only argument against a vanilla
'BasedOnStyle: LLVM'?

I am hoping we can do a one-off global reformat, the same as lld and lldb did before merging, so that we match the LLVM style.