[RFC] Allowing _Nonnull etc on smart pointers

TL;DR: proposal to allow _Nonnull attributes on C++ smart pointers: unique_ptr, shared_ptr, and anything marked [[gsl::Pointer]]. Draft patch

@DougGregor as author of _Nonnull and friends. @erichkeane as code-owner for attributes.


Background: nullability attributes

It’s 2024, and null pointer crashes are still everywhere, often due to confusion of whether a pointer is allowed to be null. Clang supports the nullability attributes _Nullable and _Nonnull[1] to explicitly specify whether null is considered a valid value:

void Print(Document* _Nonnull doc, Font* _Nullable custom_style);

These serve several purposes:

  • they document the intended contract of an API in a regular way
  • clang statically detects simple violations (-Wnonnull, -Wnullable-to-nonnull-conversion)
  • UBSan dynamically detects violations (-fsanitize=nullability)
  • clang-based tools consume the annotations to better understand the API (e.g. Swift/Objective-C interop, crubit nullability checker)

Nullability and smart pointers

Today these attributes are only allowed on built-in “raw” pointer types. In C++, many of the pointer types in APIs are instead instances of smart-pointer classes.

std::shared_ptr<Scheduler> _Nonnull getDefaultScheduler();
// error: nullability specifier `_Nonnull` cannot be applied to non-pointer type 'std::shared_ptr<Scheduler>'

We’d like to allow these attributes there too, as all the benefits of annotating raw pointers apply to smart pointers too. The basic design of _Nonnull as a type attribute applies equally well to smart pointers.


Identifying smart pointer types

Not every type is nullable: int _Nullable x; is nonsense that we should continue to diagnose. So we need to identify the pointer-like types we do support.

The most important smart pointers are unique_ptr and shared_ptr from the C++ standard library. We would hard-code these names. I’m not sure about std::function and would conservatively leave it out.

User-defined smart pointers are common too. Clang already knows about the [[gsl::Pointer]] attribute for pointer-like objects, and I suggest we accept all classes marked with that as pointers.


Details: UBSan

To have -fsanitize=nullability check smart pointer arguments and return values, we need to teach it how to check the nullness of a smart pointer value.

The most obvious thing is to find operator bool() on the pointer class and synthesize a call to it. If such an operator doesn’t exist, we wouldn’t do any checking.

I’d leave it out of the initial scope though, to keep things small and because it needs slightly different expertise to write/review.


Details: pragma and completeness

Ideally, all pointers are annotated, most pointers are non-null, and we should limit the noise added by annotations. Clang has features to support this:

  • The #pragma clang assume nonnull directive marks pointers as non-null by default within a region of code.
  • When only a subset of pointers in a header are annotated, -Wnullability-completeness will warn.

However I don’t think we should apply these features to smart pointers, at least for now:

First: because it would change the meaning of existing programs.
Second: it’s unclear how useful the pragma is for C++ code. It attaches _Nonnull to parts of declarations in a helpful but somewhat at-hoc way, and ignores typedefs. This makes sense in C/Obj-C, but in C++ regularity and composition of types become more important.

We’d like to leave this out for now, continue to experiment with pragmas out-of-tree, and revisit later.


Draft patch: [clang][nullability] allow _Nonnull etc on gsl::Pointer types by sam-mccall · Pull Request #82705 · llvm/llvm-project · GitHub

This allows the attribute for the chosen smart pointer types, and extends both the on-by-default nullptr => non-null warnings and the off-by-default nullable => non-null warnings to work with smart pointers.


Background on our work

We’ve been working on a static nullability checker and inference tool based on the clang dataflow framework. This has happened outside of llvm-project in part so we can first get deployment experience to confirm the approach works.

We’re rolling this out to our internal C++ codebase, where we’ve chosen to spell these types absl::Nonnull<int*> etc. Because of the lack of smart pointer support, these aliases currently expands to approximately[2] int * [[clang::annotate("nonnull")]] rather than int * _Nonnull.[3]

If these attributes are allowed on smart pointers, we can switch to them. This means we can benefit from nullability support in Sema and UBSan, and help extend UBSan to sanitize smart pointers. It also removes barriers to upstreaming our checker.

(“We” is @bazuzi @gribozavr @martinboehme @ymand and myself).


[1] There’s also _Null_unspecified and _Nullable_result, which are similar but less interesting, and [[gnu::nonnull]] which I’m not proposing to touch.

[2] In fact annotate goes on the type alias rather than the type itself, this just saves a few AST nodes.

[3] We can’t include _Nonnull for raw pointers only, because specializing absl::Nonnull<T> for different T inhibits template argument deduction.

2 Likes

How does this compare to gsl::not_null, or other approaches that don’t involve a compiler extension?

gsl::non_null is a distinct canonical type that wraps a pointer, while T _Nullable is sugar for T.

I think gsl is a useful approach (I don’t have a lot of experience with the ergonomics). However we concluded it’s not one we can migrate Google’s codebase to in the near future.

The migration story is very different - local changes can break code at a distance, and you have less flexibility about things being hard vs soft errors, doing local nullability inference within a function etc.

Given the trade-offs and that _Nullable is supported by clang already, I think that these should coexist.

This would potentially be very useful for our Swift/C++ interoperability efforts, it would allow us to provide better support for custom smart pointer types :+1:

Thank you for this RFC! In general, I think having something along these lines makes a lot of sense. However, I don’t think we should be hardcoding a list of allowed smart pointer types from the standard library. That runs into problems with things like:

template <typename T>
struct my_awesome_smart_pointer : std::shared_ptr<T> {
  ...
};

or other kinds of uses. I was thinking that perhaps we could look for publicly accessible operator* and operator-> in the class hierarchy, but then (e.g.) iterators become nullable (and that seems a bit odd to me). But perhaps there is some other heuristic we can use? I mostly would like to avoid doing ad hoc name lookup for special library functions; that’s going to get out of sync with reality. For example, weak_ptr is missing from the list, but also, someday C++ may get something like observer_ptr (etc) and we’ll have to remember to come back to this list to update it.

FWIW, Aaron’s response is a result of a discusison between he and I, so it is my opinion too.

I think some level of heuristic is the way forward, and I don’t think the GSL::Pointer attribute is sufficient here, so some level of attempt to identify what IS a pointer would be the way forward for me.

I don’t necessarily agree with Aaron about the iterator cases though, iterators effectively model pointer semantics (or at least, a pointer is often USED as an iterator, so many iterators are replacements for using a pointer), so I don’t think that is a deal-breaker for that heuristic for me.

But I would like to see some level of discussion/attempt to make AN heuristic work over the current algorithm.

Yeah, I’ve been thinking about this more and std::optional and iterators kind of are pointer-like enough that maybe this does make sense. A pointer is an object from which you can read/write a value through an indirection. An iterator or optional object are both similar in that regard.

How to identify pointer types is an important question for sure, thanks for raising it! (We did think a bunch about this and try some options, but may not have hit the right answers).


Almost-pointer-like types

std::optional has a pointer-like API, but I don’t think we should conisder optional as a nullable pointer type. Our main goal with the explicit _Nullable attribute is to disambiguate types written in APIs, and optional<int> isn’t really ambiguous.
(Static analysis for nullability of optionals is useful and overlaps with pointer nullability a lot - it’s just the attribute that is IMO not helpful).

Iterators are interesting: a one-past-the-end iterator could be considered null. In practical terms I’m not sure this is valuable. In production we don’t see a lot of one-past-the-end iterator derefs that static analysis would catch. Spelling this _Nullable is non-obvious, and as an ad-hoc concept, identifying iterators is challenging. (Iterator invalidation bugs are reasonably common, but require very different static analysis and a type attribute doesn’t help much here).

We’d also like to be able to eventually use a pragma to say “pointers are by default non-null for this file”, this works best if we keep the definition of pointers fairly narrow (optional and iterators would rarely want this).

std::function is a case that looks much closer and maybe should be considered a pointer. weak_ptr wasn’t an oversight: you can’t tell statically whether a weak_ptr is null in most interesting cases. weak_ptr::lock() should be a shared_ptr<T> _Nullable.


Opt-in vs heuristics for user-defined pointer types

I’m not sure heuristics will work well, but we should still try to come up with the strongest candidate to evaluate concretely.

Getting a precise heuristic is pretty hard: operator* etc isn’t enough as we’ve seen people using them for types that can never be null. We can get more sophisticated by considering operator bool, constructors from nullptr_t etc (though overload checking makes this hard). The more complex our heuristics are, the harder they are to explain.
We have types that would pass any reasonable heuristic, but are idiosyncratic enough that we’d rather not allow nullability on until some cleanup has been made.

There’s also a harder problem: heuristics don’t work on incomplete types. We’ve seen both of the following in practice:

  • the library provides a forward-declaration for the pointer type (i.e. both myptr.h and myptr_fwd.h). Even in our codebase where forward-declarations are discouraged, these types are so often used in APIs that it was deemed worthwhile.
  • Ptr _Nullable is used within the definition of Ptr, e.g. a method static Ptr _Nullable MakeFrom(...);.

Recognizing std types

For our (Google) purposes, we need this to work for libc++ at ~head. So any of these are fine for us: libc++ accepts an attribute patch, or we use a heuristic, or we hard-code at least std::{unique_ptr, shared_ptr}. (An approach that doesn’t recognize std’s smart pointers is of no practical value to us :slight_smile: )

If we require attributes on std types, then this is less useful to people using libstdc++ or older libc++. (Happy to send a libstdc++ patch, they may or may not accept it as they don’t recognize gsl::Pointer today).

Components of a heuristic:

  • Dereference: require operator*/operator->, AFAIK these are always defined together in practice. This bans std::function, so it forces us to a pretty narrow definition of pointer. It allows std::optional, so we need additional criteria.
  • get() returning raw pointer: this is a widely observed convention. Many dereferencable types that aren’t really pointers don’t define this, like std::optional. Non-nullable smart-pointer types like gsl::non_null still do, though.
  • operator bool(): this is often a nullability test.
  • Constructability from nullptr_t: this is a fairly clear indication of nullability, even for non-pointer types like std::function. Constructor overload sets may be metaprogrammed, in practice we may need to run overload resolution. Excludes pre-c++11 types like auto_ptr, which is probably OK.
  • Comparison with nullptr_t: this is clearly a nullability test. But pointer types P might rely on converting nullptr_t to P and comparing against P.

AFAICS the best alternatives are:

  • narrow (excludes function): operator-> exists, get() returns raw pointer, constructible from nullptr_t
  • broad (includes function): constructible from nullptr_t
  • extremely broad (includes optional): operator-> exists

I think either the narrow or broad definition would work OK.

The issues with incomplete types remain, and these are a significant downside for us. Maybe we can get away with “failing open” by allowing _Nullable on incomplete types? That’s going to create some surprises occasionally, but could be a good tradeoff if we strongly prefer heuristics.

Am I misunderstanding how _Nonnull works? I thought that it gets applied to a declaration? And a declaration cannot have an incomplete type?

_Nonnull is a type attribute, used as vector<unique_ptr<int> _Nonnull> x etc.

It is used as an incomplete type in examples like:

// arena_fwd.h
class Arena;
template <class T> class [[gsl::Pointer]] ArenaUniquePtr;

// foo_factory.h
#include "arena_fwd.h"
#include "foo.h"
ArenaUniquePtr<Foo> _Nonnull makeFoo(Arena&);
1 Like

The more I think about this, “nullable” and “pointer” seem like separate questions, as shown by std::function (nullable but not pointer) and gsl::not_null (pointer but not nullable).

It seems clear to me that _Nullable should apply to nullable things, not pointer things, even if the analysis that uses it cares about pointers.

So if we use an attribute, it probably shouldn’t be [[gsl::Pointer]], and if we use a heuristic, it should probably involve nullptr_t in some way.

2 Likes

The threads are quite sprawling, but previous (and more ambitious) proposals in this direction might contain relevant considerations. Just for reference…

1 Like

It would be reasonable to have an attribute that opts in to allowing the nullability annotations. That would be the ideal solution except that getting the annotation into C++ stdlib headers would probably be a huge pain.

I agree with your language-design sense that we want this on types with ambiguous nullability — smart pointers, std::function, and so on — and that std::optional isn’t really ambiguous in that way. On the other hand, it’s not really a problem to allow it on std::optional; people will probably just not use the annotation on their std::optional types, and that will be fine. I guess the only thing is that we definitely wouldn’t want to infer _Nonnull for them under the pragma.

1 Like

FWIW, I don’t think we should rely on gsl attributes because those are “owned” by the C++ Core Guidelines and thus we may run into issues if that document changes in a way that’s incompatible with our extra uses of the attribute.

+1 to both points. Some folks may have a mental model where optional is sort of a nullable thing (consider an empty optional similar to an undereferencable pointer) while others may not. So extra flexibility may not be entirely bad, but I do agree about the concerns with pragma usage, especially if we go with a heuristic.

+1, that’s another good reason to choose a different attribute, if we use one.
What about reusing the existing _Nullable: applying it to a class declaration C would mean “instances of C can be conceptually null” and therefore it makes sense to talk about types C _Nullable, C _Nonnull.

While there I’d also suggest adding the spelling [[clang::nullable]] - I don’t think any of the historical/readability reasons for the nonstandard _Nullable spelling apply to class declarations.


Regarding heuristics and incomplete types:

Maybe we can get away with “failing open” by allowing _Nullable on incomplete types?

I was a bit uncomfortable with this. @martinboehme pointed out some specific technical problems this causes with nullability analysis. Not insurmountable and may need solving anyway, but one reason for us to slightly prefer an attribute.

Messy details

In nullability analysis involving complicated types like pair<unique_ptr<int>, vector<int*>>, we need a data structure to encode a type’s nullability, so that the (canonical type, nullability) pair is enough to understand nullability contracts.

That representation should be simple (it gets manipulated a lot) and easily serializable for cross-TU analysis. We use a list of NullabilityKind with an entry for each “nullable slot” within the type. If different points in the code may see different slots due to completeness/incompleteness of types, this detail ends up becoming quite intrusive to this data structure and the code that manipulates it.

I should mention that attributes have the same issues if users add forward declarations that are missing the attribute. So it’s possible we have to deal with this anyway.


Another side point about heuristic-vs-annotation - if someone wants to standardize [[nullable]] in future, we might consider which has an likelier path to standardization. My guess is an attribute, but I don’t know. (Standardization of this is not important to me personally, but I know it’s something Clang cares about)


Thanks for digging these up!
The RFC for the nullability qualifiers is the feature that we’d be extending here. I hope and believe we’re doing so in a coherent way. It gives a good motivation for the slightly-irregular #pragma behaviour that I don’t think translates well to C++, so it reinforces my belief we should punt on this for now.

The ISO C3X proposal is very similar to _Nonnull + UB when violated. For our use case of incremental adoption, we want to guarantee no change in behavior except under sanitizers, so _Nonnull has exactly the right contract. (I can imagine extending it with some mechanism to make these annotations load-bearing; that’s independent to the extension we’re proposing here). This proposal is for C and doesn’t address user-defined types. It doesn’t address explicit-nullable or file-level control, which we think are important to migration (same arguments as the _Nullable RFC). If this had been standardized we should consider bending on some of these points, but it seems to have gone nowhere.

The _Optional proposal does not add explicit-nonnull which is the most important contract to statically check. It adds pointer conversion rules to the language that inhibit backwards-compatibility of local changes that we need. It addresses C only, and its int _Optional * syntax requires a mental model that seems dubious for C and certainly does not fit how C++ developers think about types. I don’t think this proposal has much to offer us.

Here’s my attempt to summarize the discussion and refine the proposal:

  • there seems to be broad support to generalize _Nonnull and friends to C++ class types
  • the main open question is whether to detect which classes could be null based on their API or based on an explicit attribute
  • we should focus on the question “can this type be null” rather than “is this type a pointer”
  • I haven’t heard any concerns about leaving #pragma support and UBSan instrumentation as later work

So here’s an API-based proposal:

  • T _Nullable etc is now allowed if T is a class with a constructor that takes a single parameter of type nullptr_t
  • T _Nullable etc is now allowed if T is an incomplete class type
  • from these rules it follows that std::unique_ptr<T> _Nullable etc is now allowed

And here’s an attribute-based proposal:

  • we add alternate spellings [[clang::nullable]], [[clang::nonnull]] etc for the existing attributes
  • a class declaration can be marked nullable: class [[clang::nullable]] C;
  • T _Nullable etc is now allowed if T is a class where any visible declaration has the nullable attribute
  • we work with libc++ and libstdc++ to get the attributes added to standard library pointer types. Failing that, we may special-case the names in clang.

I think either proposal is implementable and serves our needs.
I have a slight preference for attributes. I’d like to know what others think is acceptable/preferred or if we should look at more options.

Cheers, Sam

1 Like

FWIW, my preference is:

  • An explicit attribute to tag nullable types and their forward declarations; no heuristics to detect nullable types.

  • Clang should implicitly infer the same attribute on declarations of nullable types in the standard library, using a hardcoded list of well-known nullable types. That is, the declaration namespace std { class unique_ptr {}; } automatically gets the “this is a nullable type” attribute and the std::unique_ptr type becomes eligible to use nullability attributes.

Here’s my reasoning:

  • A heuristic will either be simple, then it will be wrong often enough to be error-prone, or it will be complex, and then we are better off having a simple attribute.

  • An explicit attribute can be cleanly applied to forward declarations of user-defined nullable types, allowing incomplete types to be tagged with _Nullable / _Nonnull / be influenced by the region-based pragma.

  • A hardcoded allowlist for standard APIs has a lot of precedent. The compiler is already assuming specific semantics in for libc functions (for example, memcpy). Lifetime warnings already use a hardcoded list of C++ standard library types to automatically attach [[gsl::Pointer]] to them. ClangTidy often special-cases standard library types. There are more examples where Clang assumes certain semantics for well-known standard APIs.

I think we could have both an attribute and a tidy check that would use heuristics to automatically apply attributes to some types.

1 Like

I just wanted to second Dmytro’s opinion – preference for attribute approach.

Full disclosure: I’m on the same Nullability-checker project as Dmytro and Sam. :slight_smile: