clang-format and BSD KNF

I’m trying to get clang-format to conform to a style which is roughly equivalent to the BSD Kernel Normal Form. In particular, I want a break after the function return type for function definitions, but not for function prototypes. For example:

  int foo(int i);

  int
  foo(int i)
  {
    …
  }

I’m having trouble distinguishing between function prototypes and definitions. Line.StartsDefinition looks like the right key for telling the two cases apart; however, where that annotation ends up depends on the value of BreakBeforeBraces. When BreakBeforeBraces = Attach, the annotated Line is the one starting with the return type, which works perfectly for inserting the break in TokenAnnotator::mustBreakBefore. When BreakBeforeBraces = Stroustrup, the StartsDefinition annotation is attached to a Line starting with the opening curly brace, not to the Line starting with the return type. In that case, I can’t find any reliable way to distinguish a function prototype from a function definition in mustBreakBefore.

Any hints on the right way to approach this? I have to admit that I’m very confused about the high-level architecture of the Format library, so it’s entirely possible I’m missing something really obvious.

Thanks in advance,
David

> For example:
>
> int foo(int i);
>
> int
> foo(int i)
> {
> …
> }
>
> I’m having trouble distinguishing between function prototypes and definitions.

I doubt that it is possible to distinguish between these cases. clang-format works on a stream of tokens, not the AST. I do not believe that there is any reliable way to distinguish between function prototypes and definitions without at least a partial AST.

>> For example:
>>
>> int foo(int i);
>>
>> int
>> foo(int i)
>> {
>> …
>> }
>>
>> I’m having trouble distinguishing between function prototypes and
>> definitions.
>
> I doubt that it is possible to distinguish between these cases.
> clang-format works on a stream of tokens, not the AST. I do not believe
> that there is any reliable way to distinguish between function prototypes
> and definitions without at least a partial AST.

Why not? One ends in a semicolon, the other in an open curly brace.
clang-format has to make basically all of its decisions this way..

David
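
To make the "semicolon versus open brace" idea concrete, here is a minimal sketch of such a check over a flat list of token strings. It is not clang-format code: `DeclKind` and `classifyByTerminator` are hypothetical names invented for this illustration, and the sketch assumes the tokens passed in already look like a function declarator. It only tracks parenthesis and bracket depth so that a `;` or `{` inside the parameter list is ignored, and it makes no attempt to tell types from variables.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for this sketch; not part of clang-format.
enum class DeclKind { Prototype, Definition, Unknown };

// Scan forward from `start` and classify the declaration by the first ';'
// or '{' found outside any parentheses or square brackets.
// Assumption: the tokens already look like a function declarator.
DeclKind classifyByTerminator(const std::vector<std::string> &tokens,
                              std::size_t start = 0) {
  int depth = 0; // nesting depth of '(' and '['
  for (std::size_t i = start; i < tokens.size(); ++i) {
    const std::string &tok = tokens[i];
    if (tok == "(" || tok == "[") {
      ++depth;
    } else if (tok == ")" || tok == "]") {
      if (depth > 0)
        --depth;
    } else if (depth == 0 && tok == ";") {
      return DeclKind::Prototype; // e.g. "int foo(int i);"
    } else if (depth == 0 && tok == "{") {
      return DeclKind::Definition; // e.g. "int foo(int i) { ... }"
    }
  }
  return DeclKind::Unknown; // ran out of tokens before a terminator
}
```

The scan is linear in the number of tokens between the return type and the terminator, so the cost grows with the length of the parameter list rather than being fixed.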

>>> For example:
>>>
>>> int foo(int i);
>>>
>>> int
>>> foo(int i)
>>> {
>>> …
>>> }
>>>
>>> I’m having trouble distinguishing between function prototypes and definitions.
>>
>> I doubt that it is possible to distinguish between these cases. clang-format works on a stream of tokens, not the AST. I do not believe that there is any reliable way to distinguish between function prototypes and definitions without at least a partial AST.
>
> Why not? One ends in a semicolon, the other in an open curly brace. clang-format has to make basically all of its decisions this way…

In the general case this would require infinite lookahead. A function can have n parameters, each with a set of attributes. I don’t know if lookahead is supported in clang-format, so I won’t comment on that.

The one case that I don’t think can be handled without some AST-based information is disambiguating variations of the most vexing parse.

class C{};
C c;
int func1(C);
C func2(c);

Without knowing that C is a type and c is a variable, you cannot decide that func1 is a function declaration and func2 is a variable definition.
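
As a small check of the claim above, the following sketch repeats the four lines and adds `static_assert`s (an addition for illustration, not part of the original example) that confirm how a compiler, which does know that `C` is a type and `c` is a variable, resolves the two identically shaped lines. It has no `main` and is meant to be compiled as a translation unit, for example with `-fsyntax-only`.

```cpp
#include <type_traits>

class C {};
C c;

int func1(C); // C names a type, so this declares a function taking a C
C func2(c);   // c names a variable, so this defines a C initialized from c

// Both lines have the token shape "name name ( name ) ;", yet the
// compiler gives them different meanings.
static_assert(std::is_function<decltype(func1)>::value,
              "func1 is a function declaration");
static_assert(std::is_same<decltype(func2), C>::value,
              "func2 is a variable of type C");
```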

>>>> For example:
>>>>
>>>> int foo(int i);
>>>>
>>>> int
>>>> foo(int i)
>>>> {
>>>> …
>>>> }
>>>>
>>>> I’m having trouble distinguishing between function prototypes and
>>>> definitions.
>>>
>>> I doubt that it is possible to distinguish between these cases.
>>> clang-format works on a stream of tokens, not the AST. I do not believe
>>> that there is any reliable way to distinguish between function prototypes
>>> and definitions without at least a partial AST.
>>
>> Why not? One ends in a semicolon, the other in an open curly brace.
>> clang-format has to make basically all of its decisions this way..
>
> In the general case this would require infinite lookahead. A function can
> have n parameters, each with a set of attributes. I don't know if
> lookahead is supported in clang-format, so I won't comment on that.

The nice thing about clang-format is that it doesn't need to get every
obscure corner case right, and in practice it doesn't do so (there are
parts of the C++ grammar where this isn't possible). But again, in
practice, code where it's not "obvious" which grammar production is being
used from local context is extremely rare, probably because it's hard for
humans to reason about.

This rule seems pretty easy for clang-format to get right in the vast
majority of cases.

The one case that I don't think is possible to do without some AST based