LLD vs LLVM coding style...

Greetings folks,

We’re looking more at doing some serious hacking on LLD, and I’d like to avoid doing lots of work in the codebase only to change the style around later.

My understanding was that LLD was always intended to be a fully integrated LLVM project much like Clang, with a shared coding standard to go with the shared support libraries. Can we start that migration? I’m really opposed to having folks learn two styles of development, only to turn around and unlearn one shortly thereafter.

To be clear, I’m not asking for help. =] Happy to do all the legwork, and if it really shouldn’t be done eagerly, happy to do it as folks touch a file or interface. I’d just like to start moving things over.

-Chandler

Before suggesting massive and vague changes across someone else's code base, perhaps you should start with specific issues you see. And why you think one way is better than another way.

The LLVM family of projects have a range of coding conventions. libc++, lldb, and LLVM core have different styles. That is a good thing. Coding conventions should not be ruled with an iron fist. There needs to be some room for innovation and experimentation so that the conventions can evolve and improve.

-Nick

I think a migrator is a short term, backwards looking fix. I think a better approach is a forward looking enhancement to clang that would be a way to describe the project's conventions to the compiler. It seems like every company/project has different conventions. By enabling projects to state their conventions, the compiler can warn when a convention is not being followed. This will make submitting conforming patches way easier, so reviewers can concentrate on algorithms and future proofing rather than "whitespace".

-Nick

> We're looking more at doing some serious hacking on LLD, and I'd like to
> avoid doing lots of work in the codebase only to change the style around
> later.
>
> My understanding was that LLD was always intended to be a fully
> integrated LLVM project much like Clang, with a shared coding standard to
> go with the shared support libraries. Can we start that migration? I'm
> really opposed to having folks learn two styles of development, only to
> turn around and unlearn one shortly thereafter.

> Before suggesting massive and vague changes across someone else's code
> base, perhaps you should start with specific issues you see. And why you
> think one way is better than another way.

Sorry, I wasn't trying to suggest anything vague, but rather refer to my
previous (perhaps ill founded) understanding about the expected path
forward for LLD. Anyways, I'll explain in a bit more detail so we can talk
about the concrete issue.

My concrete hope is that LLD migrates toward the coding standards that are
shared by both the core LLVM libraries and Clang. The specifics are
documented here: http://llvm.org/docs/CodingStandards.html

Some specific items that jump out at me as areas of noticeable divergence:
- Constructors' initializer lists
- Naming patterns (for locals, members, functions, arguments, etc...)

My more general desire is for LLD's codebase to strive for consistency with
LLVM's and Clang's. This, of course, is a two-way street. Several
conventions have started with the Clang codebase and migrated to be more
widely used within LLVM.

> The LLVM family of projects have a range of coding conventions. libc++,
> lldb, and LLVM core have different styles. That is a good thing. Coding
> conventions should not be ruled with an iron fist. There needs to be some
> room for innovation and experimentation so that the conventions can evolve
> and improve.

First, libc++ is quite special. It shares no code with LLVM's core
libraries, and many of its conventions diverge out of necessity, both to
co-exist cleanly with the conventions the C++ standard specifies and to
protect its identifiers against collisions with preprocessor macros.

LLVM and Clang have extremely similar styles, and I think that is a good
thing. I think LLDB would benefit also from being consistent. Now, my
argument is not that of an iron fist, or that the LLVM style is "better" in
some abstract sense. Instead my argument is driven almost exclusively by
*consistency*.

It is a simple reality that to work with either LLD or Clang, a programmer
must work extensively with the core LLVM libraries. As a consequence,
having divergent styles between them creates a significant burden on
developers moving back and forth across that boundary. For new developers,
they must learn two different styles, and train themselves to both read and
write code proficiently in both styles. For existing developers who have
worked on LLVM in the past, it is a barrier to contributing to LLD which
seems bad for building and growing the open source community.

So in essence, my primary motivation is to have less divergence between two
codebases that are deeply connected and to make it easier to grow the LLD
community. In fact, I'm trying to bring myself and other LLVM developers
into the LLD community, and this is one issue that regularly slows down
that process.

However, there is a second reason to pick the specific LLVM coding
convention. We are building really awesome tools[1], which as you suggest
in your email to Sean, allow the tools to ensure the code follows a
particular convention, and the programmers to focus on algorithms, design,
and other more important matters. These tools are being developed initially
to support the existing conventions in LLVM and Clang, and it would be a
large (and of questionable utility) effort to add support for another
convention as well. In essence, the desire for tools *also* advocates for
the projects being consistent.

[1] http://clang.llvm.org/docs/ClangTools.html#clang-format

But none of this argues that the LLVM style is The Right Style. If there
are ways to improve the coding conventions with LLVM and Clang, we should
absolutely do that. The projects remain small enough that if you can show a
convention which is superior, we can easily adopt it for new code going
forward, and as tools such as clang-format (and its future brethren) become
more common we can even swiftly adopt new conventions across all of the
code.

Nick Kledzik <kledzik@apple.com> writes:

> I think a migrator is a short term, backwards looking fix. I think a
> better approach is a forward looking enhancement to clang that would
> be a way to describe the project's conventions to the compiler. It
> seems like every company/project has different conventions. By
> enabling projects to state their conventions, the compiler can warn
> when a convention is not being followed. This will make submitting
> conforming patches way easier, so reviewers can concentrate on
> algorithms and future proofing rather than "whitespace".

++++++1

                               -David

+1 to the general idea. I don’t hack on LLD, though, so not sure how much weight my opinion has here.

-Jim

The golden rule at the top of that coding standard specifically says to not make sweeping style changes like you suggest.

I don’t know what you mean by this. There is nothing in LLVM coding standards about constructors (other than don’t have static constructors).

lld has the same naming for types, functions, and methods as LLVM. It is just the naming of variables that has been improved.

All other things being equal, sure, consistency is nice. But lld is already very consistent with LLVM. The only places where it diverges are to make the code better: the use of C++11, and the sane naming of variables.

Also, there are many scopes of consistency. The LLVM convention for variable names is inconsistent with most C++ code in existence.

There may be a company culture difference here. At Apple our source is organized into hundreds of independently built “projects”. Each project team determines their own coding style because they are the ones that have to live in their source base. Over time there is a cross fertilization of coding styles as engineers move between teams. My understanding is that Google, because of its centralized build infrastructure, is much more rigid about coding style.

I’ve had many discussions with lld contributors and 99% of the time it is about coming to understand the atom model and just where code should go to achieve some result. Coding style is not an issue.

Ok, you’ve poked me enough. I’ll be the one to say “the emperor has no clothes”. The LLVM convention for naming variables is poor. You’d be hard pressed to find any other C++ coding conventions that start variables with an uppercase letter. When the lld project started, I wrote up this attachment to describe why lld was using a better variable naming convention.

coding_conventions.html (6.47 KB)

I’ve found it difficult to deal with this since I don’t think any of the automated formatting tools can intelligently, nicely format this for you. It isn’t sufficient to simply uppercase the first letter automatically, since it probably isn’t what you want. For example if you have a lower case variable name that is an acronym for something (which I’ve found to be very common in LLVM related code, e.g. TM, for TargetMachine), a tool would have to be smart enough to know that it is an acronym in context and then fully uppercase it instead of just the first letter.
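
To make the acronym problem concrete, here is a minimal C++ sketch of the naive first-letter rule described above. The function name `capitalizeFirst` is hypothetical and not part of any existing tool; it only illustrates why a purely mechanical rename is not enough.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: uppercases only the first character of a variable
// name, which is what a purely mechanical renaming pass would do.
std::string capitalizeFirst(std::string name) {
  if (!name.empty())
    name[0] = static_cast<char>(
        std::toupper(static_cast<unsigned char>(name[0])));
  return name;
}
```

With this rule, `targetMachine` becomes `TargetMachine` as intended, but `tm` becomes `Tm` rather than the `TM` an LLVM developer would write. Recognizing the acronym needs context that the rule does not have.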

> Sorry, I wasn't trying to suggest anything vague, but rather refer to my
> previous (perhaps ill founded) understanding about the expected path
> forward for LLD. Anyways, I'll explain in a bit more detail so we can talk
> about the concrete issue.
>
> My concrete hope is that LLD migrates toward the coding standards that are
> shared by both the core LLVM libraries and Clang. The specifics are
> documented here: http://llvm.org/docs/CodingStandards.html

> The golden rule at the top of that coding standard specifically says to
> not make sweeping style changes like you suggest.

Currently, LLD is relatively small, and so a sweeping style change might
not be unreasonable. Or it might be unreasonable. In my first email, I
acknowledged that, if preferred, we could do this on a per-file basis as
someone comes to the file to work with it.

I'm much more interested in establishing what the direction should be than
in worrying about the exact mechanism for getting there.

> Some specific items that jump out at me as areas of noticeable divergence:
> - Constructors' initializer lists

> I don't know what you mean by this. There is nothing in LLVM coding
> standards about constructors (other than don't have static constructors).

Yes, but there is a desire for consistency and uniformity throughout the
code. There are 13 constructors with initializer lists formatted using the
leading-comma style LLD uses heavily:

Foo::Foo(...)
  : member(...)
  , member2(...)
  , member3(...) {
}

That is in a very small minority. I'd like to not have different formats
for such things as it wastes developers' time writing them. clang-format
does a very nice job already of formatting initializer lists, and it is
reasonably consistent with LLVM and Clang's codebases, so personally I
would just use its format. I don't really care what the format is, as long
as it is consistent across LLVM and LLD. Deviations here just add noise to
patch review and code reading, especially as code migrates back and forth
between common libraries and LLD specific code.
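
As a rough illustration of the two layouts, here is a small self-contained C++ sketch. The class names, members, and the `describe()` method are hypothetical. The second constructor is an assumption about what clang-format's LLVM style produces for a short initializer list; the exact output depends on the clang-format version and the line length.

```cpp
#include <string>

// Hypothetical class using the leading-comma layout quoted above.
class LeadingCommaStyle {
public:
  LeadingCommaStyle(const std::string &name, int ordinal, bool live);
  std::string describe() const;

private:
  std::string _name;
  int _ordinal;
  bool _live;
};

LeadingCommaStyle::LeadingCommaStyle(const std::string &name, int ordinal,
                                     bool live)
  : _name(name)
  , _ordinal(ordinal)
  , _live(live) {
}

std::string LeadingCommaStyle::describe() const {
  return _name + ":" + std::to_string(_ordinal) + (_live ? ":live" : ":dead");
}

// The same hypothetical class laid out and named in the LLVM style
// (assumed clang-format output for an initializer list that fits on a line).
class LLVMStyle {
public:
  LLVMStyle(const std::string &Name, int Ordinal, bool Live);
  std::string describe() const;

private:
  std::string Name;
  int Ordinal;
  bool Live;
};

LLVMStyle::LLVMStyle(const std::string &Name, int Ordinal, bool Live)
    : Name(Name), Ordinal(Ordinal), Live(Live) {}

std::string LLVMStyle::describe() const {
  return Name + ":" + std::to_string(Ordinal) + (Live ? ":live" : ":dead");
}
```

Both constructors initialize the same members in the same order. Only the layout and the naming differ, which is the kind of difference that shows up as noise in patch review.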

> All other things being equal, sure, consistency is nice. But lld is
> already very consistent with LLVM. The only places where it diverges are to
> make the code better: the use of C++11, and the sane naming of variables.

When I looked at a few header files, I found inconsistencies in them
quickly:

- Resolver.h: inconsistently aligned variable names in declaration
sequences (lines 180-184, 188-198)
- File.h: The naming of enumerations of the kinds of things.
(lld::File::Kind)
- InputFiles.h: indentation and comma placement for parameters. (lines
54-58)

But I don't really care how many or where. The point is whether consistency
across LLVM and LLD is the desired end state. If so, then when folks are
touching these files they can reformat into a consistent state. If not,
then folks have to learn to read, write, review, and maintain two divergent
styles of code. The latter is what I would like to avoid.

> There may be a company culture difference here. At Apple our source is
> organized into hundreds of independently built "projects". Each project
> team determines their own coding style because they are the ones that have
> to live in their source base. Over time there is a cross fertilization of
> coding styles as engineers move between teams. My understanding is that
> Google, because of its centralized build infrastructure, is much more rigid
> about coding style.

I care very little about Google's coding style. I care much more about the
open source community, and the collection of projects under the LLVM
umbrella being readily accessible to developers. I care about developers
hacking on Clang, LLVM, and LLD interchangeably and with a minimum learning
curve when hopping projects. I care about ensuring that common data
structures and abstractions developed in Clang or LLD (or other projects)
are consolidated and shared in the LLVM core libraries without increased
inconsistency.

> I've had many discussions with lld contributors and 99% of the time it is
> about coming to understand the atom model and just where code should go to
> achieve some result. Coding style is not an issue.

Well, our experiences here differ. I'm talking mostly to potential lld
contributors, and they often are turned off by coding style. I'm not
claiming they're right to be turned off by something small like this, but
as an open source project it seems like the goal should be to remove any
and all barriers to contributions possible.

> But none of this argues that the LLVM style is The Right Style. If there
> are ways to improve the coding conventions with LLVM and Clang, we should
> absolutely do that. The projects remain small enough that if you can show a
> convention which is superior, we can easily adopt it for new code going
> forward, and as tools such as clang-format (and its future brethren) become
> more common we can even swiftly adopt new conventions across all of the
> code.

> Ok, you've poked me enough. I'll be the one to say "the emperor has no
> clothes". The LLVM convention for naming variables is poor. You'd be
> hard pressed to find any other C++ coding conventions that start variables
> with an uppercase letter. When the lld project started, I wrote up this
> attachment to describe why lld was using a better variable naming
> convention.

You may or may not know this, but I actually argued against the LLVM
convention for naming variables when Chris was formalizing some parts of
the coding standards. Over time, I have ceased to really care, and I think
Chris was entirely correct that ultimately the naming convention for
variables shouldn't matter. If they are well named, everything will be
clear.

> My desire is for lld to be a little corner where we try out this new
> convention.

A different naming convention is one of the most disruptive things to have
change between two code bases. It will cause people to habitually write
code one way, realize they are submitting to the other repository, and have
to mechanically go and rename all their variables. While I personally like
your naming convention slightly better than LLVM's, and a few other
conventions slightly better than yours, I really don't care.

I think we (and by we, I mean Chris in his role as BDFL of the LLVM coding
standards) should pick a naming convention, and stick to it. If Chris wants
to switch LLVM's guidelines for new code going forward to match yours, I'm
fine with that. If he wants to make up a third convention, I'm fine with
it. As long as I can train my fingers to write in one pattern when working
with the various LLVM projects, and forget about everything else I'm happy.

-Chandler

FWIW, I think this is like switching between spoken languages. At the beginning you can get things mixed up, but once you gain proficiency, you can use either without difficulty.

-Krzysztof

Oh jeez, typical … Accidentally sent this only to Krzysztof instead of everyone; my apologies for the duplicate.