clang-format and "ASCII art" formatting

Hi all,

although I think we can make clang-format (clang/lib/Format) format more and more pieces of C++ according to a specific style, I am convinced that there are cases where automatic formatting is not the right solution.

An example is:
http://clang.llvm.org/doxygen/ParseExpr_8cpp-source.html lines 60-90. Here, careful human thought leads to much more readable code than simply following a style guide.

In order to handle such cases and still be able to auto-format entire files, I propose to add certain markers around areas of a file that clang-format should not touch. Does anyone have thoughts on this? How would we best design this? Add special comments like “// NO clang-format {”?
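
As a rough illustration of the proposal, the sketch below wraps a hand-aligned switch in marker comments. The opening marker is the spelling floated above; the closing `// }` marker, the `Tok` and `Prec` enums, and `getPrecedence` are invented for this example and are not the code from ParseExpr.cpp.

```cpp
// Hypothetical token kinds and precedence levels, for illustration only.
enum class Tok { Comma, Equal, PlusEqual, Pipe, Amp, Plus, Minus, Star, Slash, Other };
enum class Prec { Unknown, Comma, Assignment, InclusiveOr, And, Additive, Multiplicative };

// Maps a token kind to its binary operator precedence.
Prec getPrecedence(Tok Kind) {
  switch (Kind) {
  // NO clang-format {
  case Tok::Comma:     return Prec::Comma;
  case Tok::Equal:
  case Tok::PlusEqual: return Prec::Assignment;
  case Tok::Pipe:      return Prec::InclusiveOr;
  case Tok::Amp:       return Prec::And;
  case Tok::Plus:
  case Tok::Minus:     return Prec::Additive;
  case Tok::Star:
  case Tok::Slash:     return Prec::Multiplicative;
  default:             return Prec::Unknown;
  // }
  }
}
```

A formatter honoring the markers would leave everything between them byte-for-byte unchanged and format the rest of the function as usual.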

Cheers,
Daniel

> Hi all,
>
> although I think we can make clang-format (clang/lib/Format) format more and more pieces of C++ according to a specific style, I am convinced that there are cases where automatic formatting is not the right solution.
>
> An example is:
> http://clang.llvm.org/doxygen/ParseExpr_8cpp-source.html lines 60-90. Here, careful human thought leads to much more readable code than simply following a style guide.

IMO, manual column formatting should be avoided in most cases. For example, the code fragment you are referring to would be even more readable (and would conform to the coding style) if it used line breaks before the "return" statements.

> In order to handle such cases and still be able to auto-format entire files,
> I propose to add certain markers around areas of a file that clang-format
> should not touch.

In my opinion, the main advantage of an autoformatter is that its
output can be declared as canonical and that in organizations that do
that nobody needs to argue about indentation anymore. If you provide a
formatting escape hatch, you lose that property.

Nico

> In my opinion, the main advantage of an autoformatter is that its
> output can be declared as canonical and that in organizations that do
> that nobody needs to argue about indentation anymore. If you provide a
> formatting escape hatch, you lose that property.

This feature would not benefit organizations as much as it might benefit
small open source projects.

Cheers,
/Manuel

First of all, I personally agree and I would not use this feature.
Obviously, if companies / projects decide not to use it, they don't have
to (we can build style / runtime / compile options into clang-format
to forbid it).

I guess, I am actually asking more specifically about the LLVM project
itself and whether it would benefit from such an option. I want to
prevent a situation where an LLVM/Clang developer would like to use
clang-format, "but it always messes up this one part so badly".

Wouldn't that just be a bug in the formatter though? Do you think
there are many cases it just won't be able to get right? The code you
linked to looks like something an automated formatter should be able
to produce.

Well in this case yes, but it is only one of many. And even in this
case the question is how a formatter would know when not to use the
usual break-before-return style, or when to align all of the return
statements to the same column.

An alternative argument might be that the benefit of automated
formatting just outweighs the cost of a few cases that could be made
"more readable". But these are decisions that have to be made for
projects and I wanted to get input. So far, we have 3 votes against it
and I am happy not to implement :-).

I think an auto-formatter that understands the language and is sufficiently smart could probably avoid many situations where overriding the dumb formatting in standard code editors is required. It could take into account more factors than just 'this is a switch statement.' It could consider how many case labels, how long they are, how long case bodies are, how much fall-through is used, etc.

And though there may be some cases where people still want to deviate I think in those situations other mechanisms for formatting parts of files rather than whole files would be better. For example I think being able to tie into revision control so a user can instruct the tool to just format the lines he's touching is valuable.
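
One way to read "format just the lines he's touching" is to take the changed line ranges from a diff and hand only those to the formatter. The sketch below is an assumption about how that could work, not an existing clang-format feature: `changedRanges` and `LineRange` are hypothetical names, and the input is assumed to be a unified diff produced with zero lines of context (for example `git diff -U0`), so that each hunk header covers exactly the touched lines.

```cpp
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Inclusive range of 1-based line numbers in the new version of a file.
using LineRange = std::pair<unsigned, unsigned>;

// Collects the line ranges touched by a unified diff by reading hunk headers
// of the form "@@ -a,b +c,d @@". A missing ",d" means a count of one line.
std::vector<LineRange> changedRanges(const std::string &Diff) {
  std::vector<LineRange> Ranges;
  std::istringstream In(Diff);
  std::string Line;
  while (std::getline(In, Line)) {
    if (Line.compare(0, 4, "@@ -") != 0)
      continue; // not a hunk header
    std::size_t Plus = Line.find(" +", 4);
    if (Plus == std::string::npos)
      continue; // malformed header
    unsigned Start = 0, Count = 1;
    int Fields = std::sscanf(Line.c_str() + Plus + 2, "%u,%u", &Start, &Count);
    if (Fields < 1)
      continue; // no start line found
    if (Count == 0)
      continue; // pure deletion: nothing left to format
    Ranges.emplace_back(Start, Start + Count - 1);
  }
  return Ranges;
}
```

For a hunk header `@@ -10,2 +10,3 @@` this yields the range 10 to 12; hunks that only delete lines are skipped because there is nothing left to format.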

I'm not convinced that there are that many situations where this kind
of special-case formatting is ever completely necessary. Let's try to
avoid as long as possible adding things to clang-format that will
allow subjectivity in source layout :wink: Let's keep things completely
automated.

Maybe it's worth teaching clang-format about this situation as a
special case? Alternatively, as Konstantin points out, putting the
returns on their own line indented would lead to an equally (or more)
readable layout, and I think it would be perfectly acceptable for
clang-format to default to doing something like that.

Also, there are a lot of coding styles that do this kind of alignment
for nearly all declarations (e.g. for all the `=` when declaring local
variables), so by the time that clang-format is in production across a
variety of projects, I imagine that support for "aligned" layouts like
these switches will be a piece of toggleable generic functionality
(like maybe through an astmatcher or something).
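
As a sketch of what such a toggleable alignment pass might do for the `=` case, the function below pads each line so that the first ` = ` lands in the same column. It is deliberately naive: it works on raw text rather than on tokens or an AST, `alignAssignments` is a hypothetical name, and it assumes the lines passed in form one block of consecutive declarations.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Aligns the first " = " of each line to a common column by inserting spaces
// before it. Lines without " = " are returned unchanged.
std::vector<std::string> alignAssignments(std::vector<std::string> Lines) {
  const std::string Sep = " = ";

  // First pass: find the rightmost position of the separator.
  std::size_t Column = 0;
  for (const std::string &L : Lines) {
    std::size_t Pos = L.find(Sep);
    if (Pos != std::string::npos)
      Column = std::max(Column, Pos);
  }

  // Second pass: pad every line up to that column.
  for (std::string &L : Lines) {
    std::size_t Pos = L.find(Sep);
    if (Pos != std::string::npos)
      L.insert(Pos, Column - Pos, ' ');
  }
  return Lines;
}
```

A token-based implementation would also need to tell `=` apart from operators such as `==` or `<=` that happen to be surrounded by spaces, which this text scan does not attempt.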

A slightly more thorny case I think is distinguishing a situation
where we just have a bag of labels like

case Foo:
case Bar:
case Baz:
  return Qux;

from a situation where the case labels have some natural grouping
which leads naturally to a more condensed layout like

case EAX: case ECX: case EDX: case EBX:
case EBP: case ESP: case ESI: case EDI:
  return GPR;

An even nastier example is
<http://clang.llvm.org/doxygen/Lexer_8cpp_source.html#l02851>, where a
condensed layout is used with *comments* put in place of "missing"
labels which are handled elsewhere to maintain the logical
alphabetical alignment and exhibit that those other cases are being
handled elsewhere. Even with the benefit of an AST, supporting this
kind of thing seems extraordinarily difficult ("AI-complete"?).
However, this particular situation is *extremely* compelling, because
pretty much any other layout would obscure the fact that those
specific other letters are handled elsewhere.

The lexer example convinces me that we *will* need some escape hatch
for the formatter; I guess the question then is "does now feel like
the right time?". I think a reasonable criterion for this is "when it
is needed, implement it". I.e. when you start reformatting real code
and the result is unacceptable due to some situation that can't be
reasonably addressed in an automated fashion (like the lexer situation
above).

Btw, w.r.t. bikeshedding over what the escape hatch should look like,
you can circumvent the entire issue and just make it configurable
(like passing in a regex or something). I think it is safe to have the
default inside the library be no escape hatch unless one is explicitly
provided. The clang-format commandline program can then keep a table
of common preexisting ones and use a command line switch to choose
between them (and just pick one as the default for projects "natively"
using clang-format on the command line). Remember
<http://xkcd.com/927/>.
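
A minimal sketch of the configurable variant, assuming the library takes the begin and end markers as regular expressions and treats a null configuration as "no escape hatch". `EscapeHatch`, `Region`, and `protectedRegions` are hypothetical names, not part of clang-format.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Inclusive, zero-based range of line indices that must be left untouched.
using Region = std::pair<std::size_t, std::size_t>;

// Hypothetical configuration: both markers are supplied by the caller.
struct EscapeHatch {
  std::regex Begin;
  std::regex End;
};

// Returns the regions delimited by the configured markers. The marker lines
// themselves are part of the region. With no configuration, nothing is
// protected.
std::vector<Region> protectedRegions(const std::vector<std::string> &Lines,
                                     const EscapeHatch *Hatch) {
  std::vector<Region> Regions;
  if (!Hatch)
    return Regions; // library default: no escape hatch

  bool Inside = false;
  std::size_t Start = 0;
  for (std::size_t I = 0; I < Lines.size(); ++I) {
    if (!Inside && std::regex_search(Lines[I], Hatch->Begin)) {
      Inside = true;
      Start = I;
    } else if (Inside && std::regex_search(Lines[I], Hatch->End)) {
      Regions.emplace_back(Start, I);
      Inside = false;
    }
  }
  if (Inside) // an unterminated region runs to the end of the file
    Regions.emplace_back(Start, Lines.size() - 1);
  return Regions;
}
```

A command-line front end could then map a flag value to a preset pair of expressions and pass a null pointer when no flag is given.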

-- Sean Silva

Derp, sorry for the novel of my thought process on the issue.

tl;dr: Yes, we definitely will need it when working with any
nontrivial real codebase. Implement it when it is blocking forward
progress. Make the delimiter for "manually formatted" regions
configurable (like a regex) and have the command line tool support
various existing styles through a command line flag and not introduce
yet another "don't format this" marker.

Remember that a large contributor to clang's success is the fact that
it is sufficiently close to GCC to be able to pretty much immediately
be useful for projects that use GCC. Probably without going to the
extent of command-line compatibility with existing formatting tools, I
think that having clang-format be useful for existing codebases
without requiring the changes to their code (i.e. changing the "don't
format this" comments to some marker specific to clang-format) would
be a huge win.

-- Sean Silva

> An even nastier example is
> <http://clang.llvm.org/doxygen/Lexer_8cpp_source.html#l02851>, where a
> condensed layout is used with comments put in place of “missing”
> labels which are handled elsewhere to maintain the logical
> alphabetical alignment and exhibit that those other cases are being
> handled elsewhere. Even with the benefit of an AST, supporting this
> kind of thing seems extraordinarily difficult (“AI-complete”?).
> However, this particular situation is extremely compelling, because
> pretty much any other layout would obscure the fact that those
> specific other letters are handled elsewhere.

Regarding this specific example: I think it could be “simple” to just treat the whole line as “unbreakable” whenever there are several “case” labels on it.

That is, if two ‘case’ labels are on the same physical line, just treat this whole line as a black box, and content yourself with aligning its first non-blank character.
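
A rough sketch of that heuristic, assuming a plain text scan is good enough: count whole-word occurrences of `case` on a physical line and treat the line as unbreakable when there are two or more. The function names are hypothetical, and the scan does not skip comments or string literals, which a real implementation working on tokens would handle.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if C can be part of an identifier.
bool isIdentChar(char C) {
  return std::isalnum(static_cast<unsigned char>(C)) || C == '_';
}

// Counts occurrences of the keyword `case` on a line, ignoring identifiers
// that merely contain it, such as `lowercase`.
std::size_t countCaseKeywords(const std::string &Line) {
  const std::string Keyword = "case";
  std::size_t Count = 0;
  for (std::size_t Pos = Line.find(Keyword); Pos != std::string::npos;
       Pos = Line.find(Keyword, Pos + Keyword.size())) {
    std::size_t After = Pos + Keyword.size();
    bool StartsWord = Pos == 0 || !isIdentChar(Line[Pos - 1]);
    bool EndsWord = After == Line.size() || !isIdentChar(Line[After]);
    if (StartsWord && EndsWord)
      ++Count;
  }
  return Count;
}

// A line with two or more `case` labels is treated as a black box: only its
// indentation may change.
bool isUnbreakableCaseLine(const std::string &Line) {
  return countCaseKeywords(Line) >= 2;
}
```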

– Matthieu

That seems like it would work well and be a good general policy. There
are harder examples though like
<http://clang.llvm.org/doxygen/Lexer_8cpp_source.html#l01013> where
the source code is actually aligned with contents of nearby comments.

-- Sean Silva

Recognizing and leaving alone "tabular" array initializers would generally be a good thing. These are very common.

Sebastian

I think it's interesting to look at the two different use cases for
clang-format:
1. while writing my code, I basically type one big line (or break lines
arbitrarily) and then hit clang-format to fix it all up for me. In that
case I don't want clang-format to exclude certain constructs by itself
2. when running over a source file before checking it in

I think if we can have 1 and 2 lead to consistent layouts, that would be a
plus.

Cheers,
/Manuel