Variable names rule

Hi all,

As application of the naming rules are currently under discussion [1] this seems like a good time to bring this up:

The current variable naming rule [2] states: Variable names should be nouns (as they represent state). The name should be camel case, and start with an upper case letter (e.g. Leader or Boats).

I’m a relatively new arrival to the LLVM codebase and I want to follow the rules wherever I can, but I humbly submit that this rule is suboptimal for readable code.

The rationale given at the time this rule was added was “document the prevailing convention” [3]. It was debated after the policy change whether this was the right choice [4].

The main problem I find with this rule is that it is the same as the type naming rule. Why is this a problem? It is famously hard to name things (“There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.”) and with this rule if you’ve already used a name for a type, you can’t reuse it for a variable of that type.

So what do you do instead? Often it seems the answer is to use an acronym (Target T), which hurts readability, or prepend “The” to the type name (Target TheTarget), which wastes space and also hurts readability because the start and end of a word are the most important for readnig. [5]

So we’ve got declarations like “LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, IC, &LVL, &CM)” [6] which is quite intimidating to newcomers.

If we could also use snake_case variable names then straight away you have an obvious, readable name (Target target). It looks like there’s plenty of code that does this already, and it’s consistent with our Python code. Also, it makes C idioms like “for (int i = 0; …” permissible.

I realize that there’s also a rule [2] “Avoid abbreviations unless they are well known” - I believe allowing a trivial way to get readable names would help greatly to adhere to this.

What do you think? Could we relax the variable naming rule?

[1] http://lists.llvm.org/pipermail/cfe-dev/2019-January/061085.html

[2] https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

[3] https://github.com/llvm/llvm-project/commit/30e697ebaf4fc8a9b8cf6abecbc75ef1b9fa0dc0

[4] http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20110808/126097.html

[5] Mostly based on my own experience, but somewhat supported by evidence: https://en.wikipedia.org/wiki/Transposed_letter_effect

[6] https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L7433

Is moving to snake_case and the resulting inconsistency in the naming conventions worth the benefits? A wholesome change is out of the question.

Amara

With the move to GitHub we have a once in a lifetime opportunity to
rewrite history into a preferred, consistent form. And I'm only 90%
joking there.

Tim.

I completely agree with you that our variable naming rule is broken. This discussion has been brought up before (e.g. http://lists.llvm.org/pipermail/llvm-dev/2014-October/077685.html) and hasn’t made any progress - people seem to not be willing to make a change, e.g. saying "the cure is worse than the disease". I’m personally in favor of Nick’s proposal (linked above) which makes variable names and function names be lower camel case.

The Swift language world (a developer community several orders of magnitude larger than LLVM’s) uses this convention very successfully and people seem very happy with it. Swift also has a very carefully considered API design guide (https://swift.org/documentation/api-design-guidelines/), which applies quite well to C++ APIs as well. If you haven’t seen it, it is worth a look.

Using camel case for one thing and snake case for another is too weird IMO.

Even if the community does not have the appetite to do a huge scale renaming of all the things in LLVM, it would be interesting to carve out an exception for new code being written, refactored, or potentially for use in new subprojects.

-Chris

It’s worth noting that at the time, if I recall correctly, I argued against Nick’s proposal. I’m since come around to see the wisdom of his position and agree that we should have done it.

-Chris

IIRC the last time around there was strong agitation to make function names and variable names look different, which made no sense to me because their contextual syntax makes it blatantly obvious whether a name is a function or a variable. The only possible situation where you could be confused is assigning a function to a function-pointer, which in typical C++ basically you never do that.

I’d much rather have a convention that did not treat all “variables” the same. Instead, distinguish variable names based on scope/lifetime; for example, it’s common practice (outside of LLVM) for class data members to have a different/variant convention (trailing underscore, leading “m_”) because that data all has a scope/lifetime far beyond the current method. Local variables and function parameters, by contrast, have comparatively limited scope. With no immediate visual clue to the difference, I find myself spending a fair amount of time paging around the source and trolling through headers trying to work out which is what. That is: I find the current convention not very readable.

So, I’m quite happy to have camelCase for variables, but would also like some other distinction within the universe of variable names.

Of course any change to existing conventions would cause some churn and confusion, but we already have that, as the current nominal conventions aren’t in place in the entire codebase.

Regarding snake_case, LLVM has very limited use of that, and only for things that are (a) substitutes or (b) extensions of STL features. “iterator_range” for example. camelCase predominates.

–paulr

I'd much rather have a convention that did not treat all "variables" the same. Instead, distinguish variable names based on scope/lifetime;

While it's not a very strong opinion, I'm actually fond of the fact that they're the same.

The reason is that when you're implementing passes, whether a variable is local to the current method or a member of the class is really not a very meaningful distinction, and it's convenient to be able to change the scope occasionally as part of refactorings.

IDEs are very good at jumping to variable declarations if there's any doubt.

Cheers,
Nicolai

I so agree. I have found scope based coding conventions very useful. My favorite was:

· Static data member: s_

· Non-static data members: _ (This was allowed by the C++ standard I last read. It’s _ that is reserved)

· Function argments: _

· Function local variables:

· Class/Struct identifiers:

Even “smart” IDE’s can become confused – I’ve tried several (all “free”, open source) on the LLVM code base, and they’ve all failed, either finding the wrong thing (that has the same identifier) or not finding the thing (when it most definitely does exist). If someone knows of a (“free”) IDE that does not get confused by the LLVM code base, please, please advise?

Many thanks!

This is about the one thing I'd be truly unhappy to see us adopt (for
any situation). I think the interaction with acronyms is just too
pathological. You get a really weird identifier or UB, possibly
without even knowing it.

Tim.

Hi Tim,

Sorry, I'm not sure I follow. Are you maybe thinking that if the identifiers were tagged to specify scope, people would still be trying to use acronyms or single letters? So that, what in future code might be F, would instead be _f and that would be worse than f_ or s_f? I was thinking instead F would be (for new or modified code) _function or _fnctn or _func (as an object of type Function). Again, sorry -- I don't see how prepended underscore is worse than an appended one. Could you supply some examples, please?

I think it’s more important to distinguish types and variables. Variables should be lowercase and distinguished from uppercase types. For some reason it’s allowed to shadow types with variable names, so we end up with quite a bit of code using variable names like “Value” which is also a class. This is made worse by the fact that lldb doesn’t recognize this, so when you try to print “Value” it’s confused since that’s a type. I’ve wasted a lot of time renaming variables when debugging just to avoid this problem. I assume this has never been fixed since pretty much every other naming convention doesn’t uppercase variables.

-Matt

In conventional English usage, acronyms always use upper-case. All
other coding conventions deal with that situation gracefully (or at
least not terribly). If someone strictly follows the coding convention
you might end up with a weird identifier (m_tlaThatDoesSomething,
m_tLAThatDoesSomething, ...); if they favour English over coding you
get (m_TLAThatDoesSomething).

The leading underscore is unique in turning that last case into
something that violates the language standard, and for me that's
enough to eliminate it from contention. I strongly discourage its use
in any C or C++ project.

Tim.

If _<lowerCaseLetter> violates a standard, please say which one. It does not violate the C++11 standard:

•Reserved in any scope, including for use as implementation macros:
•identifiers beginning with an underscore followed immediately by an uppercase letter
•identifiers containing adjacent underscores (or "double underscore")

•Reserved in the global namespace: •identifiers beginning with an underscore

•Also, everything in the std namespace is reserved. (You are allowed to add template specializations, though.)

If strictly adhered to, it doesn't, and I've never claimed any
different. But coding standards are never strictly adhered to.
Particularly not in a codebase like LLVM which already has a good
handful in play for historical reasons. We can't expect reviewers to
be perfect either, and violations of a leading underscore rule have a
ridiculously high probability of producing malformed C++.

It's simply not worth the aggro when there are plenty of other
possibilities available that don't open us up for that failure mode.

Tim.

IMO, any convention that contains leading or trailing underscores should be rejected outright. The primary purpose of a convention is to allow a person to differentiate between different kinds of elements with a quick glance. It should strive to make these elements appear different without sacrificing the readability. Prepending or appending a lone underscore is really making the identifier as similar to another one as possible, while still making it different from the language standard point of view, i.e. the opposite to what a useful convention should do.

-Krzysztof

Yeah, I hated it too at first. It grew on me. After about a week or so, it does make it easy to differentiate between different kinds of elements at a quick glance, can be more readable than a single or two character identifier, doesn't take up much horizontal space (considering LLVM allows only 80, that should be a consideration...unless that, too, is under discussion?) can be very pronounceable, and when documentation is created from it, allows the reader of that documentation to quickly differentiate between different kinds of elements as well.

If we’re talking about member variables, just put an m in front of it, problem solved. You already have one for s_, and I didn’t see you mention it but I assume you’d want g_ for globals, so m_ makes perfect sense for member variables and there’s no question about UB at that point.

GCC does a very similar thing with s_, m_ etc.
https://www.gnu.org/software/gcc/codingconventions.html

<<Joke Follows; please do not flame>> “m_” stands for “Microsoft-wannabe”

Thanks for all your thoughts. It sounds like there's general agreement that the current naming rule for variables is not ideal.

I've submitted a patch, which I hope implements the least controversial change of switching to lowerCamelCase: https://reviews.llvm.org/D57896 - please review.

I don't want to alienate everyone who's been carefully adhering to the current rule, so the new rule also allows using UpperCamelCase. It would be nice if we could express this in .clang-tidy but I can't see a way to do that, other than just using aNy_CasE.

Cheers,
-Michael