Stricter adherence to coding standards in LLVM?

I notice that there’s a lot of code in LLVM core libraries that is wildly inconsistent in formatting. Two particular things come to mind:

– Capitalization of method names. The LLVM guidelines say one thing, but a lot of recent code (such as IRBuilder) consistently does the opposite of what the guidelines say. One has to wonder which source of truth is more authoritative - after all, the docs could always be out of date.

– Whitespace around * and & in variable declarations. I see some files that consistently put the space after, some files put it before, while some files use one convention for parameters, and a different convention for member variables and local variables. Within a single file, the usage is pretty consistent, so you know that the people who wrote the code were really trying to adhere to a standard.

Now, common wisdom for open source projects is that inconsistent legacy formatting is left alone, and new code adheres to whatever style is used in nearby code. Otherwise it creates difficulty for people who want to compare revisions in the source control. Occasionally there is a need to diff revisions that are widely spaced in time, and a global reformatting makes this hard if the reformatting change falls somewhere within the timespan of interest. The downside of this approach is that you’re stuck with the inconsistent formatting forever.

In my own open source projects, I take the opposite approach, which is to nip any inconsistencies in the bud as early and aggressively as possible, so that they won’t cause problems later on. A person who needs to compare two revisions is far more likely to be interested in recent revisions than really old ones, so it’s better to do large changes sooner rather than later - the farther in the past we can push back the reformatting the better.

There is also the problem of backwards compatibility with existing users of the code. Obviously it’s bad form to break user code for trivial stylistic reasons. However, if there is some other API change with broad impact that’s going in, it might make sense to fix the names at the same time, given that folks are going to have to update their code anyway. (Although that’s unlikely with a codebase as mature as LLVM). Another approach is to deprecate the old API for a given class entirely, and replace it with some newer, shinier version that just happens to conform to the letter of the conventions, so that people can continue to use the old class until they get around to upgrading.

In the case of whitespace around * and &, that is semantically neutral, and the impact on the diffs are small, so a global fix to these would incur less pain overall. The main danger is that the discussion of proper style is likely to turn into a major religious war, making it not worth the effort to even raise the topic.

I notice that there's a lot of code in LLVM core libraries that is wildly inconsistent in formatting. Two particular things come to mind:

-- Capitalization of method names. The LLVM guidelines say one thing, but a lot of recent code (such as IRBuilder) consistently does the opposite of what the guidelines say. One has to wonder which source of truth is more authoritative - after all, the docs could always be out of date.

IRBuilder is recent, but the naming conventions are even more so.

-- Whitespace around * and & in variable declarations. I see some files that consistently put the space after, some files put it before, while some files use one convention for parameters, and a different convention for member variables and local variables. Within a single file, the usage is pretty consistent, so you know that the people who wrote the code were really trying to adhere to a standard.

Yes, this is annoying.

Now, common wisdom for open source projects is that inconsistent legacy formatting is left alone, and new code adheres to whatever style is used in nearby code.

...

nip any inconsistencies in the bud as early and aggressively as possible

..

There is also the problem of backwards compatibility with existing users of the code.

I'm not worried about API stability, this is something we don't guarantee.

The approach I'd like to see is gentle, gradual migration. As new code is written, it should follow the standards, as existing code is modified, it should be updated to match the standards. This is what I do when working on the codebase, and would appreciate it if others would as well. It's best to not conflate coding standards changes in with other semantic changes though.

Eventually, it would be great for someone to make a clang mode to enforce the standards for the project. If we had that, we could do a post-commit hook or something.

-Chris