Guidance on using pointers vs. references for function arguments

This has been discussed before but I can’t find a reference to it. I could have sworn this was in the coding convention at some point. Here’s what I remember: during early LLVM development there was an effort to establish the convention that you described above—use pointer types only when nullptr is valid. This led to a lot of redundant declarations and annoying taking of addresses and dereferences. It turns out that the convention doesn’t really help for most informal/internal APIs. It’s actually no harder to debug a SIGSEGV than a nullptr check. I also adhered to this convention in a previous project and it never paid off.

Once you begin working on a piece of code you get a feel for which types should be passed as pointers and which should be passed as reference. Then you try to pass types consistently regardless of whether a null input is valid. For example, some types, like the current context, should never be copied or passed by value and are obviously not null. That’s lower overhead in practice forcing callers to convert to a reference whenever we want to skip a null check.

-Andy

FWIW, I disagree.

I much prefer to pass by reference unless there is the expectation of null
inputs. I have never really agreed with the complaints about taking the
address and have never found it to be a burden. I also find the simplicity
of a consistent rule far more appealing than "getting a feel for which
types" should be passed as pointers.

Ironically, using a reference can result in better optimizations by
deleting redundant null checks. While I certainly hope this isn't relevant
to the performance of LLVM, it's still not something to completely
disregard.

Anyways, I've never really seen this become a problem in practice. I'm
pretty happy for folks to use whatever local conventions they want. I'm
usually happy to switch from reference to pointer if I'm hacking some part
of the codebase I don't usually touch and one of the maintainers really
prefers one over the other. It's not a big deal either way.

This last sentence should read: it’s lower mental overhead for the programmer to use the same type consistently rather than worrying about another convention to follow at every call.

I would personally be happy to follow the pointer may be nullptr convention if it were used consistently. I was just trying to reiterate arguments against it that I’d seen w.r.t LLVM codebase, and I don’t see much value in forcing the convention everywhere.

-Andy

This has been discussed before but I can’t find a reference to it. I could
have sworn this was in the coding convention at some point. Here’s what I
remember: during early LLVM development there was an effort to establish the
convention that you described above—use pointer types only when nullptr is
valid. This led to a lot of redundant declarations and annoying taking of
addresses and dereferences. It turns out that the convention doesn’t really
help for most informal/internal APIs. It’s actually no harder to debug a
SIGSEGV than a nullptr check. I also adhered to this convention in a
previous project and it never paid off.

Once you begin working on a piece of code you get a feel for which types
should be passed as pointers and which should be passed as reference. Then
you try to pass types consistently regardless of whether a null input is
valid. For example, some types, like the current context, should never be
copied or passed by value and are obviously not null. That’s lower overhead
in practice forcing callers to convert to a reference whenever we want to
skip a null check.

This last sentence should read: it’s lower mental overhead for the
programmer to use the same type consistently rather than worrying about
another convention to follow at every call.

I would personally be happy to follow the pointer may be nullptr convention
if it were used consistently. I was just trying to reiterate arguments
against it that I’d seen w.r.t LLVM codebase, and I don’t see much value in
forcing the convention everywhere.

Yeah, there's certainly some local consistency that's not worth
overriding just yet - but we do the same thing with function names,
for example (we've still got lots of functions named UpperFirst, try
to keep a single class's member functions consistent amongst
themselves, but when creating new classes we prefer the new convention
of lowerFirst).

But when given the choice (because there's no local consistency to
worry about) I'll pass by reference where possible & suggest others do
the same - but I'm not fussed enough to push this to be in the LLVM
Style guide. Won't complain if it ends up there either, though :slight_smile:

- David

Certainly.

My suggestion: unless we have evidence this is confusing people trying to
contribute to LLVM, then it isn't broken as is.

If folks are confused by this and it is becoming a neusance, then I would
suggest we draw a line in the sand much like we did with function names.
New code gets the new rule, existing code (and code with existing
conventions surrounding it) isn't impacted. My suggested rule would be "use
a pointer if it might be null or re-pointed at some other entity, otherwise
use a reference". But I don't think adding that kind of rule is necessary
unless folks are frequently tripped up by this in reviews, etc.

-Chandler

Chandler, Andy’s recollection is correct.

I was personally infatuated by this idea and pushed it through a ton of the LLVM IR APIs (this was many years ago, (in the pre-1.0 days, and then again early in Clang’s development because I’m a slow learner). It led to all sorts of weird cases where some arguments to IR objects would be taken by reference and some by pointer. It made it impossible to remember whether a given function took a pointer or a reference, and made for absolute madness trying to program against the APIs. It got to the point where literally you had to do a build and just add the missing &/*'s to get the build to go through.

-Chris

I didn't doubt any of Andy's recollection here, and it seemed like we were
largely in agreement...

Anyways, while inconsistency such as you describe between pointer and
reference arguments can be quite frustrating (as I see it was for you), at
least LLVM is still pretty stunningly inconsistent to this day. I also
don't find it to be a significant burden. Especially with Clang (which
corrects for this pretty much perfectly every time) and other tools, when I
have to fix these issues it is very fast.

Ultimately, (as I said in my prior email) I don't think there is a problem
to solve here because I don't think that any of these things are actively
"bad". Maybe its just the devs I know who don't really have trouble here,
but its not one of the complaints I hear from others or have myself in
day-to-day development.

If mixing pointers and references for the same type in LLVM's APIs is a
serious problem in your mind, what is your concrete suggestion? What should
the end state look like?

It is true in parts, but not broadly. If you look at LLVM, for example, pretty much all of the IR stuff takes pointers, even in places where they should “obviously” be references: IRBuilder, ConstantArray::get and other factories, and many others. Likewise pretty much the entire Clang AST library is defined in terms of pointers.

It is true that passes and other local stuff is wildly inconsistent, but the damage of that is far less than affecting the core IR/AST.

I’m not really sure that we need to “solve” this problem. Can you restate exactly what the problem is? We don’t all need the compiler to have perfectly identical code in every pass, for example.

That said, if I were to lay down a rule, I think the right general rule would be: pointers for “classes” everywhere, references for value types, and pass by value when it is >= the efficiency of pass by value for value types.

That said, I still don’t see a huge problem being solved here…

-Chris

That said, if I were to lay down a rule, I think the right general rule would be: pointers for “classes” everywhere, references for value types, and pass by value when it is >= the efficiency of pass by value for value types.

Sorry, I meant:

That said, if I were to lay down a rule, I think the right general rule would be: pointers for “classes” everywhere, references for value types, and pass by value when it is >= the efficiency of pass by reference for value types.

-Chris

We have to be talking past each other. Let me re-quote my email:

"I don't think there is a problem to solve here ..."

And then your email:

"That said, I still don't see a huge problem being solved here..."

We're in agreement about this not being worth laying down rules, even we
disagree about what the rules should be were we to need strict ones.

Chandler, I violently agree with you!

-Chris

Hello Chris and Chandler,

This last sentence should read: it's lower mental overhead
for the programmer to use the same type consistently rather
than worrying about another convention to follow at every
call.

But, the whole point of having this in the Standards is that there is only
one convention to follow -- as opposed to, another convention in every part
of the code!

An infrequent contributor myself, I identify strongly with Chandler's
earlier argument that "the simplicity of a consistent rule [is] far more
appealing than "getting a feel for which types" should be passed as
pointers."

Maybe its just the devs I know who don't really have
trouble here, but its not one of the complaints I hear
from others or have myself in day-to-day development.

I understand that for experienced contributors who work extensively on a
part of LLVM, the "local conventions" can seem natural and unobtrusive; but
for a newcomer, "getting a feel" for such ambiguous, implicit conventions is
just a speed-bump in the way of contributing to the project.

It's fair enough to have the balance between the convenience for veterans
and the convenience for novices shifted in favour of the veterans, but I
just want to point out that this trade-off exists.

(FTR, I've browsed through the revision history of CodingStandards.rst and
its predecessor CodingStandards.html, and I couldn't find any traces of the
previous incarnation of pointers-vs-references guidance.)

For experienced contributors *who stay within one part of the code*, this may be true. For those of us who work on clang, the back end, and anything in between, the cognitive load to work out at any particular level which are the types that are expected to be pointers and which are expected to be references is quite high, especially as there are a number of things that are in the usually-pointers category in some parts of the codebase and the usually-references category in others.

David

For new code, I follow Chris’ guideline. It’s the only effective way I know of reduce this cognitive load: "pointers for "classes" everywhere, references for value types, and pass by value when it is >= the efficiency of pass by value for value types.”

-Andy