RFC: variable names

I’d like to discuss revising the LLVM coding conventions to change the naming of variables to start with a lowercase letter. This should not be a discussion on the pain of such a transition, or how to get from here to there, but rather, if there is a better place to be.

My arguments for the change are:

  1. No other popular C++ coding style uses capitalized variable names. For instance here are other popular C++ conventions that use camelCase:

http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml
http://www.c-xx.com/ccc/ccc.php
http://geosoft.no/development/cppstyle.html

And, of course, the all-lower-case conventions (e.g. C++ ARM) don’t capitalize variable names. In addition, all the common C derived languages don’t use capitalized variable names (e.g. Java, C#, Objective-C).

  2. Ambiguity. Capitalizing type names is common across most C++ conventions. But in LLVM, variables are also capitalized, which conflates types and variables. Starting variable names with a lowercase letter disambiguates variables from types. For instance, the following are ambiguous using LLVM’s conventions:

Xxx Yyy(Zzz); // function prototype or local object construction?
Aaa(Bbb); // function call or cast?

  3. Allows name re-use. Since types and variables are both nouns, using different capitalization allows you to use the same simple name for types and variables, for instance:

Stream stream;

  4. Dubious history. Years ago the LLVM coding convention did not specify if variables were capitalized or not. Different contributors used different styles. Then in an effort to make the style more uniform, someone flipped a coin and updated the convention doc to say variables should be capitalized. I never saw any on-list discussion about this.

  5. Momentum only. When I’ve talked with various contributors privately, I have found no one who says they like capitalized variables. It seems like everyone thinks the conventions are carved in stone…
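
To make the ambiguity and name re-use points above concrete, here is a minimal sketch. The `Source` and `Stream` types and the `streamLength` helper are hypothetical and do not come from the LLVM tree; the sketch only shows how a lowercase variable can reuse its type's simple name while staying visibly distinct from it:

```cpp
#include <cstddef>
#include <string>

// Hypothetical types for illustration only; neither exists in LLVM.
struct Source {
  std::string text;
};

struct Stream {
  explicit Stream(const Source &source) : length(source.text.size()) {}
  std::size_t length;
};

// Hypothetical helper that measures a source through a local stream.
std::size_t streamLength(const Source &source) {
  // Capitalized `Stream` is the type; lowercase `stream` is the object.
  // With that split, the statement reads as a local construction.
  Stream stream(source);
  return stream.length;
}
```

Under the current convention the same line would be spelled with two capitalized names, which is the case the `Xxx Yyy(Zzz);` example describes.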

My proposal is that we modify the LLVM Coding Conventions to have variable names start with a lowercase letter.

Index: CodingStandards.rst

+1, leaving aside all practicalities of migration

I’d like to discuss revising the LLVM coding conventions to change the
naming of variables to start with a lowercase letter.

Almost all of your negatives of the current conventions also apply to your
proposed convention.

Type names: CamelCase
Function names: camelCase
Variable names: ???

If we name variables in camelCase then variable names and function names
collide.

If we are going to change how we name variables, I very much want them to
not collide with either type names or function names. My suggestion would
be "lower_case" names.

This also happens to be the vastly most common pattern across all C++
coding styles and C-based language coding styles I have seen.
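
A rough sketch of what the three-way split suggested above might look like in one place. The `SampleBuffer` class and its members are hypothetical names chosen for illustration, not existing LLVM code:

```cpp
#include <numeric>
#include <vector>

// Type name: CamelCase.
class SampleBuffer {
public:
  // Function names: camelCase.
  void addSample(int new_sample) { samples.push_back(new_sample); }

  int sumSamples() const {
    // Variable names: lower_case, distinct from both types and functions.
    int running_total = std::accumulate(samples.begin(), samples.end(), 0);
    return running_total;
  }

private:
  std::vector<int> samples;
};
```

With this split, each of the three kinds of name has its own shape, so none of them can collide with another.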

I’d like to discuss revising the LLVM coding conventions to change the naming of variables to start with a lowercase letter. This should not be a discussion on the pain of such a transition, or how to get from here to there, but rather, if there is a better place to be.

  5. Momentum only. When I’ve talked with various contributors privately, I have found no one who says they like capitalized variables. It seems like everyone thinks the conventions are carved in stone…

Personally, I think that lower case local variables are the way to go. I could see it either way for instance variables, but being lower case is consistent.

-Chris

Fair point.

I haven’t seen confusion here in practice. The problem between types and variables is that they are often both nouns. Functions are usually verbs, and calls almost always have parentheses, which makes usage unambiguous.

Ick. :-)

-Chris

I think we're going to get more and more confusion here due to lambdas.
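
A small sketch of the lambda concern, assuming variables were named in camelCase like functions. The names `isEven`, `isOdd` and `countMatches` are hypothetical and used only for illustration:

```cpp
// Ordinary function, camelCase under the current convention.
bool isEven(int value) { return value % 2 == 0; }

int countMatches(int limit) {
  // A lambda is a variable, yet it is invoked with the same call syntax.
  auto isOdd = [](int value) { return value % 2 != 0; };

  int matches = 0;
  for (int candidate = 0; candidate < limit; ++candidate) {
    // Nothing at these two call sites says which callee is a variable.
    if (isEven(candidate) || isOdd(candidate))
      ++matches;
  }
  return matches;
}
```

Parentheses mark both uses as calls, so they no longer separate functions from variables once callable variables become common.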

If we are going to change how we name variables, I very much want them to
not collide with either type names or function names. My suggestion would be
"lower_case" names.

This also happens to be the vastly most common pattern across all C++ coding
styles and C-based language coding styles I have seen.

STL has "lower_case" functions, and exposes far fewer variables. I
can't really recall which of myFunc/my_var or my_func/myVar I've seen
more elsewhere though.

Tim.

(Not advocating anything in particular yet).

One side note:

Yes. That is my intention.

-Nick


+1

> If we are going to change how we name variables, I very much want them to
> not collide with either type names or function names. My suggestion would be
> "lower_case" names.
>
> This also happens to be the vastly most common pattern across all C++ coding
> styles and C-based language coding styles I have seen.

STL has "lower_case" functions, and exposes far fewer variables.

STL also has "lower_case" types. The STL naming convention is very simple:
lower_case everywhere. I'm actually fine with this, and even understand
some of the reasons it makes sense for the STL (what is a type? or a
function? they're really interchangeable in the STL in many cases).

But I also really appreciate why most coding standards I have seen
advocated try to use distinguished naming conventions for these things to
make it easier to tell at a glance what things are what. And I suspect this
kind of optimization for skimming and rapid comprehension is the correct
way for LLVM to structure its style. We just don't do enough generic
programming to make it terribly important to make names consistent between
functions, variables, and types.

I can't really recall which of myFunc/my_var or my_func/myVar I've seen
more elsewhere though.

I think using underscores for function names would cause a moderate to
extreme degree of chaos in the APIs of LLVM. It doesn't really seem worth
considering IMO, but if others really want to advocate for it, carry on.

My position is that trading one set of collisions for another set of
collisions is a poor tradeoff. I would much rather trade for no collisions.

I actually have a particular allergy to member variable names and function
names having similar styles:

bool x = i->isMyConditionTrue;

Did I mean to write 'isMyConditionTrue()'? Or 'bool &x =
i->isMyConditionTrue'? Or something else? I have no idea. Warnings and
other things can help reduce the likelihood of this becoming a live bug,
but it still makes the code harder to read IMO.
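
The readability concern can be sketched as follows. The classes `WithField` and `WithMethod` and the function `readBoth` are hypothetical, introduced only to put the two spellings side by side:

```cpp
// Hypothetical classes for illustration only.
struct WithField {
  bool isMyConditionTrue = true; // data member named in camelCase
};

struct WithMethod {
  bool isMyConditionTrue() const { return true; } // member function
};

bool readBoth(const WithField *field_holder, const WithMethod *method_holder) {
  // The two accesses differ only by the trailing parentheses, so a reader
  // has to look up each declaration to know which spelling is intended.
  bool from_field = field_holder->isMyConditionTrue;
  bool from_method = method_holder->isMyConditionTrue();
  return from_field && from_method;
}
```

Both lines compile here because each class declares the name differently; the point is only that the use sites look almost identical.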

This is exactly why I was making the wishy-washy statement about instance variables. This is the pattern that I tend to prefer:

class Something {
  bool IsMyConditionTrue;

  bool isMyConditionTrue() const { return IsMyConditionTrue; }
};

If you make instance variables lower camel case, then you get a serious conflict between ivars and methods. Doing this also deflates some of the ammunition used by advocates of _ for ivars :-)

-Chris

Agreed.

I am also fine with:

class Something {
  bool is_my_condition_true;

...

  bool isMyConditionTrue() const { return is_my_condition_true; }
};

I think it has the same lack of ambiguity. It's somewhat nicer that there is
no type name conflict as well, but honestly that's a smaller gain IMO.

I’d like to discuss revising the LLVM coding conventions to change the
naming of variables to start with a lowercase letter.

Almost all of your negatives of the current conventions also apply to your
proposed convention.

Type names: CamelCase
Function names: camelCase
Variable names: ???

If we name variables in camelCase then variable names and function names
collide.

If we are going to change how we name variables, I very much want them to
not collide with either type names or function names. My suggestion would
be "lower_case" names.

I think this would be bad:

  function();
  lambda();
  longFunction();
  long_lambda();

... but possibly not in practice, since function names rarely have only one
word.

A partial-camel-case, partly-underscores convention sounds strange to me.
(I don't find this to be problematic for BIG_SCARY_MACROS and for
ABCK_EnumNamespaces because the former are rare and in the latter case the
underscore isn't a word separator, it's a namespace separator.) We have a
few people here who are used to such a style (since it's what the Google
style guide and derivatives use); any useful feedback from that experience?

Some arguments against the change as proposed:

1. Initialisms. It's common in Clang code (also in LLVM?) to use
initialisms as variable names. This doesn't really seem to work for names
that start with a lower case letter.

2. The ambiguity introduced might be worse than the one removed. It's
usually easy to see if a name is a type or variable from the context of the
use. It's not so easy to see if a name is a function or a variable,
especially as more variables become callable due to the prevalence of
lambdas.
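
A sketch of the initialism point, using a hypothetical stand-in struct rather than the real Clang class. The three function names are also hypothetical; they only contrast the spellings a parameter could take:

```cpp
// Hypothetical stand-in for an AST node; not the real Clang class.
struct FunctionDecl {
  int numParams;
};

// Current convention: capitalized initialism of the type name.
int countParamsCurrent(const FunctionDecl &FD) { return FD.numParams; }

// Lowercase-first with the initialism kept; it reads less like an
// abbreviation of the type name.
int countParamsLowerInitialism(const FunctionDecl &fd) { return fd.numParams; }

// Lowercase-first with the name spelled out, one possible alternative.
int countParamsSpelledOut(const FunctionDecl &functionDecl) {
  return functionDecl.numParams;
}
```

All three behave identically; the difference is only in how recognizable the parameter name is at a glance.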

This also happens to be the vastly most common pattern across all C++
coding styles and C-based language coding styles I have seen.

This should not be a discussion on the pain of such a transition, or how
to get from here to there, but rather, if there is a better place to be.

My arguments for the change are:

1. No other popular C++ coding style uses capitalized variable names.
For instance here are other popular C++ conventions that use camelCase:

   http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml

This does not use camelCase for variable names.

   http://www.c-xx.com/ccc/ccc.php

   http://geosoft.no/development/cppstyle.html

And, of course, the all-lower-case conventions (e.g. C++ ARM) don’t
capitalize variable names. In addition, all the common C derived languages
don’t use capitalized variable names (e.g. Java, C#, Objective-C).

Some or all of those other conventions don't capitalize *any* names (other
than perhaps macros), so we're not going to become consistent with them by
making this change.

2. Ambiguity. Capitalizing type names is common across most C++
conventions. But in LLVM variables are also capitalized which conflates
types and variables. Starting variable names with a lowercase letter
disambiguates variables from types. For instance, the following are
ambiguous using LLVM’s conventions:

Xxx Yyy(Zzz); // function prototype or local object construction?
Aaa(Bbb); // function call or cast?

3. Allows name re-use. Since types and variables are both nouns, using
different capitalization allows you to use the same simple name for types
and variables, for instance:

Stream stream;

4. Dubious history. Years ago the LLVM coding convention did not specify
if variables were capitalized or not. Different contributors used
different styles. Then in an effort to make the style more uniform,
someone flipped a coin and updated the convention doc to say variables
should be capitalized. I never saw any on-list discussion about this.

FWIW, I thought the argument for the current convention was: capitalize
proper nouns (classes and variables), do not capitalize verbs (functions),
as in English. Though maybe that's just folklore.

5. Momentum only. When I’ve talked with various contributors privately, I
have found no one who says they like capitalized variables. It seems like
everyone thinks the conventions are carved in stone...

Momentum is an argument against the change, not in favour of it: this
change has a re-learning cost for everyone who hacks on LLVM projects.
(Your point that no-one seems to like capitalized variables is valid, but
generally people are opposed to change too.)

I would add:

6. Lower barrier to entry. Our current convention is different from almost
all other C++ code, and new developers *very* frequently get it wrong.

My proposal is that we modify the LLVM Coding Conventions to have variable
names start with a lowercase letter.

I agree.

I agree. No collisions is even better.

I purposefully did not discuss data member names in my RFC because, as I found in the lld conventions discussions, LLVM’ers seemed shocked by having different naming conventions for data members than for local (stack) variables.

-Nick

I think this would be bad:

  function();
  lambda();
  longFunction();
  long_lambda();

... but possibly not in practice, since function names rarely have only
one word.

A partial-camel-case, partly-underscores convention sounds strange to me.
(I don't find this to be problematic for BIG_SCARY_MACROS and for
ABCK_EnumNamespaces because the former are rare and in the latter case the
underscore isn't a word separator, it's a namespace separator.) We have a
few people here who are used to such a style (since it's what the Google
style guide and derivatives use); any useful feedback from that experience?

This has never come up as a practical problem in my time at Google. Or at
least, if it has, it was so rare and long ago that I can't remember it. I
don't expect it to be a problem in practice. Mostly that is because all of
the problematic cases have two words in them, with one of the words often
being "is" or a related obvious verb like "get", "create", etc.

Some arguments against the change as proposed:

1. Initialisms. It's common in Clang code (also in LLVM?) to use
initialisms as variable names. This doesn't really seem to work for names
that start with a lower case letter.

I think we at least need a good answer to this.

FWIW, I think that having different naming conventions for data members and
local variables has become essentially untenable with lambdas and capture.
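
One possible reading of the capture remark, sketched with a hypothetical `Accumulator` class: inside a lambda body, a captured local and a data member reached through `this` are both written as plain names, so a separate naming rule for members marks a distinction the call site no longer shows.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical class for illustration only.
class Accumulator {
public:
  int addScaled(const std::vector<int> &values, int scale) {
    // `scale` is a local captured by copy; `total` is a data member reached
    // through the captured `this`. Inside the body both are plain names.
    std::for_each(values.begin(), values.end(),
                  [this, scale](int value) { total += value * scale; });
    return total;
  }

private:
  int total = 0;
};
```

This is an interpretation offered as an assumption, not a statement of what the original author had in mind.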

class Foo
{
     int mBar;

public:
     void bar(int aBar) {
         mBar = aBar;
     }

     void foo(int aFoo) {
         // compiler complains -- is this *really* the end of the world?
         // Just pick another name...
         int bar = aFoo;
     }
};

"mName" is perfectly unambiguous, "aName" for arg names allows use of essentially the same name in methods without being obnoxious, and none of it involves underscores (which sets my teeth on edge, speaking personally).

Greg

> I actually have a particular allergy to member variable names and
> function names having similar styles:
>
> bool x = i->isMyConditionTrue;
>
> Did I mean to write 'isMyConditionTrue()'? Or 'bool &x =
> i->isMyConditionTrue'? Or something else? I have no idea. Warnings and
> other things can help reduce the likelihood of this becoming a live bug,
> but it still makes the code harder to read IMO.

This is exactly why I was making the wishy-washy statement about instance
variables. This is the pattern that I tend to prefer:

class Something {
  bool IsMyConditionTrue;

  bool isMyConditionTrue() const { return IsMyConditionTrue; }
};

If you make instance variables lower camel case, then you get a serious
conflict between ivars and methods. Doing this also deflates some of the
ammunition used by advocates of _ for ivars :-)

Trailing or leading _ for ivars seems to be a common practice. What is the
reason it's not used in LLVM?

David

From: "Chandler Carruth" <chandlerc@google.com>
To: "Nick Kledzik" <kledzik@apple.com>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Monday, October 13, 2014 5:19:31 PM
Subject: Re: [LLVMdev] RFC: variable names

I’d like to discuss revising the LLVM coding conventions to change
the naming of variables to start with a lowercase letter.

Almost all of your negatives of the current conventions also apply to
your proposed convention.

Type names: CamelCase
Function names: camelCase
Variable names: ???

If we name variables in camelCase then variable names and function
names collide.

If we are going to change how we name variables, I very much want
them to not collide with either type names or function names. My
suggestion would be "lower_case" names.

+1

-Hal