RFC: Nullability qualifiers

DougGregor · March 2, 2015, 9:22pm

Hello all,

Null pointers are a significant source of problems in applications. Whether it’s SIGSEGV taking down a process or a foolhardy attempt to recover from NullPointerException breaking invariants everywhere, it’s a problem that’s bad enough for Tony Hoare to call the invention of the null reference his billion dollar mistake [1]. It’s not the ability to create a null pointer that is a problem—having a common sentinel value meaning “no value” is extremely useful—but that it’s very hard to determine whether, for a particular pointer, one is expected to be able to use null. C doesn’t distinguish between “nullable” and “nonnull” pointers, so we turn to documentation and experimentation. Consider strchr from the C standard library:

char *strchr(const char *s, int c);

It is “obvious” to a programmer who knows the semantics of strchr that it’s important to check for a returned null, because null is used as the sentinel for “not found”. Of course, your tools don’t know that, so they cannot help when you completely forget to check for the null case. Bugs ensue.

Can I pass a null string to strchr? The standard is unclear [2], and my platform’s implementation happily accepts a null parameter and returns null, so obviously I shouldn’t worry about it… until I port my code, or the underlying implementation changes because my expectations and the library implementor’s expectations differ. Given the age of strchr, I suspect that every implementation out there has an explicit, defensive check for a null string, because it’s easier to add yet more defensive (and generally useless) null checks than it is to ask your clients to fix their code. Scale this up, and code bloat ensues, as well as wasted programmer effort that obscures the places where checking for null really does matter.

In a recent version of Xcode, Apple introduced an extension to C/C++/Objective-C that expresses the nullability of pointers in the type system via new nullability qualifiers. Nullability qualifiers express nullability as part of the declaration of strchr [2]:

__nullable char *strchr(__nonnull const char *s, int c);

With this, programmers and tools alike can better reason about the use of strchr with null pointers.
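As a minimal sketch of what the annotated contract asks of a caller, the following wraps strchr and handles the "not found" sentinel explicitly. The DEMO_* macros and the demo_* functions are hypothetical names used only for illustration. The sketch assumes the spellings _Nullable and _Nonnull used by later Clang releases (this RFC writes them as __nullable and __nonnull), detected through __has_feature(nullability), and expands the macros to nothing on other compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: later Clang releases spell the qualifiers _Nullable and
   _Nonnull and report them via __has_feature(nullability). Elsewhere the
   macros expand to nothing, so the code stays portable. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif
#if __has_feature(nullability)
#define DEMO_NULLABLE _Nullable
#define DEMO_NONNULL _Nonnull
#else
#define DEMO_NULLABLE
#define DEMO_NONNULL
#endif

/* Hypothetical wrapper carrying the same contract as the annotated strchr:
   the argument must not be null, the result may be null. */
const char *DEMO_NULLABLE demo_find(const char *DEMO_NONNULL s, int c) {
    return strchr(s, c);
}

/* Returns the offset of c within s, or -1 when c does not occur. */
long demo_offset(const char *DEMO_NONNULL s, int c) {
    const char *DEMO_NULLABLE hit = demo_find(s, c);
    if (hit == NULL) {
        return -1; /* the nullable result has to be checked before use */
    }
    return (long)(hit - s);
}

/* Usage: exercise both the found and the not-found paths. */
void demo_offset_selfcheck(void) {
    assert(demo_offset("hello", 'l') == 2);
    assert(demo_offset("hello", 'z') == -1);
}
```

The annotations do not change what the code does; they only record which pointers a tool should expect to see checked.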

We’d like to contribute the implementation (and there is a patch attached at the end [3]), but since this is a nontrivial extension to all of the C family of languages that Clang supports, we believe that it needs to be discussed here first.

Goals
We have several specific goals that informed the design of this feature.

Allow the intended nullability to be expressed on all pointers: Pointers are used throughout library interfaces, and the nullability of those pointers is an important part of the API contract with users. It’s too simplistic to only allow function parameters to have nullability, for example, because it’s also important information for data members, pointers-to-pointers (e.g., "a nonnull pointer to a nullable pointer to an integer”), arrays of pointers, etc.
Enable better tools support for detecting nullability problems: The nullability annotations should be useful for tools (especially the static analyzer) that can reason about the use of null, to give warnings about both missed null checks (the result of strchr could be null…) as well as for unnecessarily-defensive code.
Support workflows where all interfaces provide nullability annotations: In moving from a world where there are no nullability annotations to one where we hope to see many such annotations, we’ve found it helpful to move header-by-header, auditing a complete header to give it nullability qualifiers. Once one has done that, additions to the header need to be held to the same standard, so we need a design that allows us to warn about pointers that don’t provide nullability annotations for some declarations in a header that already has some nullability annotations.
Zero effect on ABI or code generation: There are a huge number of interfaces that could benefit from the use of nullability qualifiers, but we won’t get widespread adoption if introducing the nullability qualifiers means breaking existing code, either in the ABI (say, because nullability qualifiers are mangled into the type) or at execution time (e.g., because a non-null pointer ends up being null along some error path and causes undefined behavior).

Why not __attribute__((nonnull))?
Clang already has an attribute to express nullability, “nonnull”, which we inherited from GCC [4]. The “nonnull” attribute can be placed on functions to indicate which parameters cannot be null: one either specifies the indices of the arguments that cannot be null, e.g.,

	extern void *my_memcpy (void *dest, const void *src, size_t len) __attribute__((nonnull (1, 2)));

or omits the list of indices to state that all pointer arguments cannot be null, e.g.,

	extern void *my_memcpy (void *dest, const void *src, size_t len) __attribute__((nonnull));

More recently, “nonnull” has grown the ability to be applied to parameters, and one can use the companion attribute returns_nonnull to state that a function returns a non-null pointer:

	extern void *my_memcpy (__attribute__((nonnull)) void *dest, __attribute__((nonnull)) const void *src, size_t len) __attribute__((returns_nonnull));

There are a number of problems here. First, there are different attributes to express the same idea at different places in the grammar, and the fact that the “nonnull” attribute on the function actually has an effect on the function parameters can get very, very confusing. Quick, which pointers are nullable vs. non-null in this example?

__attribute__((nonnull)) void *my_realloc (void *ptr, size_t size);

According to that declaration, ptr is nonnull and the function returns a nullable pointer… but that’s the opposite of how it reads (and behaves, if this is anything like a realloc that cannot fail). Moreover, because these two attributes are declaration attributes, not type attributes, you cannot express the nullability of the inner pointer in a multi-level pointer or an array of pointers, which makes these attributes verbose, confusing, and not sufficiently general. These attributes fail the first of our goals.

These attributes aren’t as useful as they could be for tools support (the second and third goals), because they only express the nonnull case, leaving no way to distinguish between the unannotated case (nobody has documented the nullability of some parameter) and the nullable case (we know the pointer can be null). From a tooling perspective, this is a killer: the static analyzer absolutely cannot warn that one has forgotten to check for null for every unannotated pointer, because the false-positive rate would be astronomical.

Finally, we’ve recently started considering violations of the __attribute__((nonnull)) contract to be undefined behavior, which fails the last of our goals. This is something we could debate further if it were the only problem, but these declaration attributes fail all of our criteria, so it’s not worth discussing.

Nullability Qualifiers
We propose the addition of a new set of type qualifiers, spelled __nullable, __nonnull, and __null_unspecified, to Clang. These are collectively known as nullability qualifiers and may be written anywhere any other type qualifier may be written (such as const) on any type subject to the following restrictions:

Two nullability qualifiers shall not appear in the same set of qualifiers.
A nullability qualifier shall qualify a pointer type; this includes pointers to objects, pointers to functions, C++ pointers to members, block pointers, and Objective-C object pointers.
A nullability qualifier in the declaration-specifiers applies to the innermost pointer type of each declarator (e.g., __nonnull int * is equivalent to int * __nonnull).
A nullability qualifier applied to a typedef of a nullability-qualified pointer type shall specify the same nullability as the underlying type of the typedef.
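To illustrate why type qualifiers reach places a declaration attribute cannot, here is a small sketch with a multi-level pointer ("a nonnull pointer to a nullable pointer to an integer") and annotated data members. All DEMO_* and demo_* names are hypothetical. The sketch assumes the _Nullable/_Nonnull spellings of later Clang releases in place of this RFC's __nullable/__nonnull, and falls back to empty macros on other compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: _Nullable/_Nonnull are the later Clang spellings of the
   qualifiers proposed here; other compilers get empty macros. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif
#if __has_feature(nullability)
#define DEMO_NULLABLE _Nullable
#define DEMO_NONNULL _Nonnull
#else
#define DEMO_NULLABLE
#define DEMO_NONNULL
#endif

/* Data members can carry nullability too (hypothetical type). */
struct demo_node {
    const char *DEMO_NONNULL name;        /* every node has a name */
    struct demo_node *DEMO_NULLABLE next; /* the last node has no successor */
};

/* slots is a nonnull pointer to an array of nullable pointers to int:
   each level of the pointer gets its own qualifier. */
size_t demo_count_present(int *DEMO_NULLABLE *DEMO_NONNULL slots, size_t n) {
    size_t present = 0;
    for (size_t i = 0; i < n; ++i) {
        if (slots[i] != NULL) {
            ++present;
        }
    }
    return present;
}

/* Walks the list; the nullable `next` member marks the end. */
size_t demo_list_length(const struct demo_node *DEMO_NULLABLE head) {
    size_t length = 0;
    while (head != NULL) {
        ++length;
        head = head->next;
    }
    return length;
}

/* Usage: an array with one empty slot and a two-node list. */
void demo_count_selfcheck(void) {
    int a = 1, b = 2;
    int *slots[3] = { &a, NULL, &b };
    assert(demo_count_present(slots, 3) == 2);

    struct demo_node tail = { "tail", NULL };
    struct demo_node head = { "head", &tail };
    assert(demo_list_length(&head) == 2);
}
```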

The meanings of the three nullability qualifiers are as follows:

__nullable: the pointer may store a null value at runtime (as part of the API contract)
__nonnull: the pointer should not store a null value at runtime (as part of the API contract). It is possible that the value can be null, e.g., in erroneous historic uses of an API, and it is up to the library implementor to decide to what degree she will accommodate such clients.
__null_unspecified: it is unclear whether the pointer can be null or not. Use of this type qualifier is extremely rare in practice, but it fills a small but important niche when auditing a particular header to add nullability qualifiers: sometimes the nullability contract for a few APIs in the header is unclear even when looking at the implementation for historical reasons, and establishing the contract requires more extensive study. In such cases, it’s often best to mark that pointer as __null_unspecified (which will help silence the warning about unannotated pointers in a header) and move on, coming back to __null_unspecified pointers when the appropriate graybeard has been summoned out of retirement [5].

Assumes-nonnull Regions
We’ve found that it’s fairly common for the majority of pointers within a particular header to be __nonnull. Therefore, we’ve introduced assumes-nonnull regions that assume that certain unannotated pointers implicitly get the __nonnull nullability qualifiers. Assumes-nonnull regions are marked by pragmas:

#pragma clang assume_nonnull begin
__nullable char *strchr(const char *s, int c); // s is inferred to be __nonnull
void *my_realloc (__nullable void *ptr, size_t size); // my_realloc is inferred to return __nonnull
#pragma clang assume_nonnull end

We infer __nonnull within an assumes_nonnull region when:

The pointer is a non-typedef declaration, such as a function parameter, variable, or data member, or the result type of a function. It’s very rare for one to want typedefs to specify nullability information; rather, it’s usually the user of the typedef that needs to specify nullability.
The pointer is a single-level pointer, e.g., int* but not int**, because we’ve found that programmers can get confused about the nullability of multi-level pointers (is it a __nullable pointer to __nonnull pointers, or the other way around?) and inferring nullability for any of the pointers in a multi-level pointer compounds the situation.

Note that no #include may occur within an assumes_nonnull region, and assumes_nonnull regions cannot cross header boundaries.

Type System Impact
Nullability qualifiers are mapped to type attributes within the Clang type system, but a nullability-qualified pointer type is not semantically distinct from its unqualified pointer type. Therefore, one may freely convert between nullability-qualified and non-nullability-qualified pointers, or between nullability-qualified pointers with different nullability qualifiers. One cannot overload on nullability qualifiers, write C++ class template partial specializations that identify nullability qualifiers, or inspect nullability via type traits in any way.

Said more strongly, removing nullability qualifiers from a well-formed program will not change its behavior in any way, nor will the semantics of a program change when any set of (well-formed) nullability qualifiers are added to it. Operationally, this means that nullability qualifiers are not part of the canonical type in Clang’s type system, and that any warnings we produce based on nullability information will necessarily be dependent on Clang’s ability to retain type sugar during semantic analysis.

While it’s somewhat exceptional for us to introduce new type qualifiers that don’t produce semantically distinct types, we feel that this is the only plausible design and implementation strategy for this feature: pushing nullability qualifiers into the type system semantically would cause significant changes to the language (e.g., overloading, partial specialization) and break ABI (due to name mangling) that would drastically reduce the number of potential users, and we feel that Clang’s support for maintaining type sugar throughout semantic analysis is generally good enough [6] to get the benefits of nullability annotations in our tools.
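The "not semantically distinct" property can be sketched as follows: pointers with different nullability qualifiers convert into one another as ordinary assignments, and the pointer value is unchanged. The DEMO_* macros and demo_read are hypothetical names. The sketch assumes the _Nullable/_Nonnull spellings of later Clang releases rather than this RFC's __nullable/__nonnull, and with the macros expanded to nothing the program is the same program.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: _Nullable/_Nonnull are the later Clang spellings; on other
   compilers the macros vanish, which by design changes nothing. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif
#if __has_feature(nullability)
#define DEMO_NULLABLE _Nullable
#define DEMO_NONNULL _Nonnull
#else
#define DEMO_NULLABLE
#define DEMO_NONNULL
#endif

/* Reads through a nullable pointer, converting between all three forms. */
int demo_read(int *DEMO_NULLABLE maybe, int fallback) {
    if (maybe == NULL) {
        return fallback;
    }
    int *DEMO_NONNULL sure = maybe;  /* nullable to nonnull */
    int *plain = sure;               /* nonnull to unannotated */
    int *DEMO_NULLABLE back = plain; /* unannotated to nullable */
    assert(back == maybe);           /* same pointer value throughout */
    return *back;
}

/* Usage: one call with a real object, one with a null pointer. */
void demo_read_selfcheck(void) {
    int value = 7;
    int *none = NULL;
    assert(demo_read(&value, 0) == 7);
    assert(demo_read(none, -1) == -1);
}
```

Because the qualifiers are only sugar on the same underlying type, any diagnostics about these conversions have to come from tools that can still see the sugar.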

Looking forward to our discussion.

Doug (with Jordan Rose and Anna Zaks)

[1] http://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions
[2] The standard description of strchr seems to imply that the parameter cannot be null
[3] The patch is complete, but should be reviewed on cfe-commits rather than here. There are also several logical parts to this monolithic patch:
(a) __nonnull/__nullable/__null_unspecified type specifiers
(b) nonnull/nullable/null_unspecified syntactic sugar for Objective-C
(c) Warning about inconsistent application of nullability specifiers within a given header
(d) assume_nonnull begin/end pragmas
(e) Objective-C null_resettable property attribute

[4] https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html (search for “nonnull”)
[5] No graybeards were harmed in the making of this feature.
[6] Template instantiation is the notable exception here, because it always canonicalizes types.

nullability.patch (185 KB)

zygoloid · March 2, 2015, 11:34pm


*Goals*
We have several specific goals that informed the design of this feature.

   - *Allow the intended nullability to be expressed on all pointers*:
   Pointers are used throughout library interfaces, and the nullability of
   those pointers is an important part of the API contract with users. It’s
   too simplistic to only allow function parameters to have nullability, for
   example, because it’s also important information for data members,
   pointers-to-pointers (e.g., "a nonnull pointer to a nullable pointer to an
   integer”), arrays of pointers, etc.
   - *Enable better tools support for detecting nullability problems:* The
   nullability annotations should be useful for tools (especially the static
   analyzer) that can reason about the use of null, to give warnings about
   both missed null checks (the result of strchr could be null…) as well as
   for unnecessarily-defensive code.
   - *Support workflows where all interfaces provide nullability
   annotations:* In moving from a world where there are no nullability
   annotations to one where we hope to see many such annotations, we’ve found
   it helpful to move header-by-header, auditing a complete header to give it
   nullability qualifiers. Once one has done that, additions to the header
   need to be held to the same standard, so we need a design that allows us to
   warn about pointers that don’t provide nullability annotations for some
   declarations in a header that already has some nullability annotations.

   - *Zero effect on ABI or code generation:* There are a huge number of
   interfaces that could benefit from the use of nullability qualifiers, but
   we won’t get widespread adoption if introducing the nullability qualifiers
   means breaking existing code, either in the ABI (say, because nullability
   qualifiers are mangled into the type) or at execution time (e.g., because a
   non-null pointer ends up being null along some error path and causes
   undefined behavior).

A sanitizer for this feature would seem very useful, but this bullet point
suggests that such a sanitizer would violate the model. Likewise, I don't
see why we should rule out the option of optimizing on the basis of these
qualifiers (under a -fstrict-nonnull flag or similar).

*Why not __attribute__((nonnull))?*
Clang already has an attribute to express nullability, “nonnull”, which we
inherited from GCC [4]. The “nonnull” attribute can be placed on functions
to indicate which parameters cannot be null: one either specifies the
indices of the arguments that cannot be null, e.g.,

  extern void *my_memcpy (void *dest, const void *src, size_t len) __attribute__((nonnull (1, 2)));

or omits the list of indices to state that all pointer arguments cannot be
null, e.g.,

  extern void *my_memcpy (void *dest, const void *src, size_t len) __attribute__((nonnull));

More recently, “nonnull” has grown the ability to be applied to
parameters, and one can use the companion attribute returns_nonnull to
state that a function returns a non-null pointer:

  extern void *my_memcpy (__attribute__((nonnull)) void *dest, __attribute__((nonnull)) const void *src, size_t len) __attribute__((returns_nonnull));

There are a number of problems here. First, there are different attributes
to express the same idea at different places in the grammar, and the fact
that the “nonnull” attribute *on the function* actually has an effect *on
the function parameters* can get very, very confusing. Quick, which pointers
are nullable vs. non-null in this example?

__attribute__((nonnull)) void *my_realloc (void *ptr, size_t size);

According to that declaration, ptr is nonnull and the function returns a
nullable pointer… but that’s the opposite of how it reads (and behaves, if
this is anything like a realloc that cannot fail). Moreover, because these
two attributes are *declaration* attributes, not type attributes, you
cannot express the nullability of the inner pointer in a multi-level
pointer or an array of pointers, which makes these attributes verbose,
confusing, and not sufficiently general. These attributes fail the first
of our goals.

These attributes aren’t as useful as they could be for tools support (the
second and third goals), because they only express the nonnull case,
leaving no way to distinguish between the unannotated case (nobody has
documented the nullability of some parameter) and the nullable case (we
know the pointer can be null). From a tooling perspective, this is a
killer: the static analyzer absolutely cannot warn that one has forgotten
to check for null for every unannotated pointer, because the false-positive
rate would be astronomical.

Finally, we’ve recently started considering violations of the
__attribute__((nonnull)) contract to be undefined behavior, which fails the
last of our goals. This is something we could debate further if it were the
only problem, but these declaration attributes fail all of our criteria, so
it’s not worth discussing.

*Nullability Qualifiers*
We propose the addition of a new set of type qualifiers, spelled
__nullable, __nonnull, and __null_unspecified, to Clang. These are
collectively known as *nullability qualifiers* and may be written
anywhere any other type qualifier may be written (such as const) on any
type subject to the following restrictions:

   - Two nullability qualifiers shall not appear in the same set of
   qualifiers.
   - A nullability qualifier shall qualify a pointer type; this includes
   pointers to objects, pointers to functions, C++ pointers to members, block
   pointers, and Objective-C object pointers.
   - A nullability qualifier in the declaration-specifiers applies to the
   innermost pointer type of each declarator (e.g., __nonnull int * is
   equivalent to int * __nonnull).

What happens if there's a mixture of different kinds of declarator? (Can I
have '__nonnull int (*p)[3]'? Can I have '__nonnull int *p[3];'?)

I think you're saying that this decision is made based on the syntax of the
declarator and not based on the underlying type, right? (So in

__nonnull T *

the __nonnull appertains to the *, even if T names a pointer type.) Given
that...

   - A nullability qualifier applied to a typedef of a
   nullability-qualified pointer type shall specify the same nullability as
   the underlying type of the typedef.

... I don't really see what this rule is for. I would expect "__nonnull T"
to be ill-formed because the innermost component of the declarator is not a
pointer, irrespective of whether T is a pointer type and whether it's
nullable. And I'd expect "__nonnull T *" to be valid whether or not T is a
typedef for a __nonnull pointer.

On the whole, I find it a little strange to allow a nullability qualifier
in the decl-specifier-seq / specifiers-and-qualifiers that applies to some
later pointer declarator; I would have expected this to be permitted in the
cv-qualifier-seq / type-qualifier-list after the pointer operator, and
nowhere else (or perhaps permitted in a decl-specifier-seq that also
contains a type-specifier for a pointer type). This kind of flexibility has
proven a disaster for the comprehensibility of GCC's type attributes.

The meanings of the three nullability qualifiers are as follows:

__nullable: the pointer may store a null value at runtime (as part of the
API contract)
__nonnull: the pointer should not store a null value at runtime (as part
of the API contract). It is possible that the value can be null, e.g., in
erroneous historic uses of an API, and it is up to the library implementor
to decide to what degree she will accommodate such clients.
__null_unspecified: it is unclear whether the pointer can be null or not.
Use of this type qualifier is extremely rare in practice, but it fills a
small but important niche when auditing a particular header to add
nullability qualifiers: sometimes the nullability contract for a few APIs
in the header is unclear *even when looking at the implementation* for
historical reasons, and establishing the contract requires more extensive
study. In such cases, it’s often best to mark that pointer as
__null_unspecified (which will help silence the warning about unannotated
pointers in a header) and move on, coming back to __null_unspecified
pointers when the appropriate graybeard has been summoned out of retirement
[5].

Have you considered adding C++11 attributes as synonyms for these?

*Assumes-nonnull Regions*

We’ve found that it's fairly common for the majority of pointers within a
particular header to be __nonnull. Therefore, we’ve introduced
assumes-nonnull regions that assume that certain unannotated pointers
implicitly get the __nonnull nullability qualifiers. Assumes-nonnull
regions are marked by pragmas:

#pragma clang assume_nonnull begin
__nullable char *strchr(const char *s, int c); // s is inferred to be __nonnull
void *my_realloc (__nullable void *ptr, size_t size); // my_realloc is inferred to return __nonnull
#pragma clang assume_nonnull end

These pragmas seem easy to miss when moving declarations around while
refactoring. Do you have enough experience with the feature to know if
that's an issue in practice?

We infer __nonnull within an assumes_nonnull region when:

   - The pointer is a non-typedef declaration, such as a function
   parameter, variable, or data member, or the result type of a function. It’s
   very rare for one to want typedefs to specify nullability information;
   rather, it’s usually the user of the typedef that needs to specify
   nullability.

How can they do this, given the earlier rules?

   - The pointer is a single-level pointer, e.g., int* but not int**,
   because we’ve found that programmers can get confused about the nullability
   of multi-level pointers (is it a __nullable pointer to __nonnull pointers,
   or the other way around?) and inferring nullability for any of the pointers
   in a multi-level pointer compounds the situation.

Note that no #include may occur within an assumes_nonnull region, and
assumes_nonnull regions cannot cross header boundaries.

That sounds like it would make the lives of library maintainers using this
feature painful -- they would need to textually duplicate these pragmas and
the surrounding #ifdefs in every system header that needs them, rather than
factoring them out into #includable begin/end files. But I suppose we can
encourage the use of a macro expanding to _Pragma for those cases.

*Type System Impact*
Nullability qualifiers are mapped to type attributes within the Clang type
system, but a nullability-qualified pointer type is not semantically
distinct from its unqualified pointer type. Therefore, one may freely
convert between nullability-qualified and non-nullability-qualified
pointers, or between nullability-qualified pointers with different
nullability qualifiers. One cannot overload on nullability qualifiers,
write C++ class template partial specializations that identify nullability
qualifiers, or inspect nullability via type traits in any way.

Said more strongly, removing nullability qualifiers from a well-formed
program will not change its behavior in any way, nor will the semantics of
a program change when any set of (well-formed) nullability qualifiers are
added to it. Operationally, this means that *nullability qualifiers are
not part of the canonical type* in Clang’s type system, and that any
warnings we produce based on nullability information will necessarily be
dependent on Clang’s ability to retain type sugar during semantic analysis.

While it’s somewhat exceptional for us to introduce new type qualifiers
that don’t produce semantically distinct types, we feel that this is the
only plausible design and implementation strategy for this feature: pushing
nullability qualifiers into the type system semantically would cause
significant changes to the language (e.g., overloading, partial
specialization) and break ABI (due to name mangling) that would drastically
reduce the number of potential users, and we feel that Clang’s support for
maintaining type sugar throughout semantic analysis is generally good
enough [6] to get the benefits of nullability annotations in our tools.

This seems reasonable to me, given the constraints. (I've had some offline
discussions with various people about template type sugar reconstruction,
which would help to diagnose issues here.)

Looking forward to our discussion.

Dean_Sutherland1 · March 2, 2015, 11:41pm

What a very good idea! Strong support for expressing and checking intent regarding potential nullability can clearly help us all write more reliable programs. I have a few questions, though.

Consider LoadHTMLString:BaseURL: from the UIWebView class. Providing a null pointer for the BaseURL is clearly legal. It even has a defined meaning, which is “don’t bother to enforce the same origin policy for this content.” So the BaseURL argument is nullable in terms of the analysis you propose above. So far, so good. Sadly, turning off the “same origin policy” is nearly always a serious security bug. Permitting a null pointer in this context is extremely likely to be wrong; it probably represents a bug 999 times out of 1000 uses. (I’d claim it was always wrong, but someone at some point decided it was a feature worth supporting and documenting. And I have seen a very few unit tests where passing a null pointer for the BaseURL was arguably OK). Do you intend to address this kind of “legal, but virtually certain to be wrong” usage, or is that out of scope?
Based on your discussion of type system impact, I’m sure you are all well aware that the compiler support for “maintaining type sugar through semantic analysis” needed to make this proposal work is almost enough to support a full “pluggable type system” similar to what’s available in recent versions of Java. Do you have any plans to address the missing pieces? Being able to support pluggable type information more broadly (while continuing to guarantee that adding such qualifiers/attributes/however-it’s-expressed would have zero impact on generated code) would enable all sorts of interesting analyses. A few examples that spring to mind include compile-time enforcement of physical unit compatibility, various protocol-level checks (type states, a la Bierhoff & Aldrich), and many more. When last I investigated this question (about a year ago), the limitation regarding templates (your [6]) was the only significant missing piece. Closing that gap would greatly reduce the difficulty (and project risk) of implementing such analyses.

Dean F. Sutherland

DougGregor · March 3, 2015, 12:20am

I agree that a sanitizer would be useful as a debugging aid. My primary concern here is that optimizing based on this information not be a part of any normal optimization flag (-O2, -Os, whatever), because it will hamper widespread adoption of this feature if adding the annotations to indicate the API contract suddenly starts breaking existing clients by, e.g., optimizing out existing, defensive null checks.
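A minimal sketch of the kind of existing client this concern is about: a function whose contract says "nonnull" but which still keeps a defensive check that legacy callers depend on. DEMO_NONNULL and demo_length are hypothetical names. The sketch assumes the _Nonnull spelling of later Clang releases (written __nonnull in this RFC), falls back to an empty macro elsewhere, and relies on the proposal's stated goal that the qualifier has no effect on code generation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: _Nonnull is the later Clang spelling of the qualifier; on
   other compilers the macro expands to nothing. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif
#if __has_feature(nullability)
#define DEMO_NONNULL _Nonnull
#else
#define DEMO_NONNULL
#endif

/* The contract says callers should not pass null, but the implementor has
   chosen to keep tolerating it. Under the zero-codegen-impact goal this
   check stays in the generated code. */
size_t demo_length(const char *DEMO_NONNULL s) {
    if (s == NULL) {
        return 0; /* defensive path kept for erroneous historic callers */
    }
    return strlen(s);
}

/* Usage: a well-behaved caller and a legacy caller that passes null. */
void demo_length_selfcheck(void) {
    const char *legacy = NULL;
    assert(demo_length("abc") == 3);
    assert(demo_length(legacy) == 0);
}
```

If the annotation licensed the optimizer to drop the check, the second call would stop working as soon as the header was annotated, which is the adoption risk described above.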

I’ve said it poorly. It is based on the type, so

__nonnull T *

applies __nonnull to T unless T is a known type that is not a pointer type. In the patch Type::canHaveNullability() computes the operation, and essentially we apply the __nonnull to T when:

T is of pointer, C++ member pointer, block pointer, or Objective-C object pointer
T is a dependent type that could instantiate to some kind of pointer type

It’s an attempt to make the simple cases read better, e.g., “__nonnull int *” reads a whole lot better than “int * __nonnull” for most

We haven’t seen a disaster, but we have seen some confusion with multi-level pointers, because we have seen people try

__nonnull int * __nullable *

and scratch their heads when we complain about duplicate specifiers.

That said, the developers that have been using this feature have become perhaps a little accustomed to writing type qualifiers in the decl-specifier-seq that actually apply to the pointer, because the Objective-C ARC qualifiers (__weak, __autoreleasing) do the same thing in a more limited manner (that doesn’t run into the multi-level pointer problem).

I think it’s totally reasonable to add [[nullable]], [[nonnull]], and [[null_unspecified]]. If we feel that this feature is working out as we hope, we can push for standardization.

I don’t feel like we have enough experience here to be certain that it’s not a trap for users. We’ve pushed a significant number of headers through the process of applying nullability annotations in a production environment, but the feature hasn’t been in production use long enough for us to see the kinds of mistakes that could get introduced through refactoring. Additionally, it’s possible that we have more safeguards in place than your average programmer would, so these mistakes might be happening and then getting fixed before we hear about them. We also try to reduce mistakes by convention: headers should have the begin pragma at the beginning of the header (after the #includes, of course) and the end pragma at the end of the header, rather than chopping up sections of the header that are audited vs. non-audited.

I hope that I’ve clarified my poor representation of the rules here.

We’re advocating for the use of a pair of macros to expand to the begin/end pragmas.
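A rough sketch of what such a macro pair could look like. The macro names (`AUDITED_BEGIN`, `AUDITED_END`, `NULLABLE`) and `find_char` are made up for illustration, and the pragma spelling `clang assume_nonnull begin/end` is an assumption about the compiler in use rather than something stated in this message. On compilers without nullability support the macros expand to nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro pair wrapping the begin/end pragmas. The pragma
   spelling is an assumption; without nullability support everything here
   expands to nothing. */
#if defined(__has_feature)
#  if __has_feature(nullability)
#    define AUDITED_BEGIN _Pragma("clang assume_nonnull begin")
#    define AUDITED_END   _Pragma("clang assume_nonnull end")
#    define NULLABLE      _Nullable
#  endif
#endif
#ifndef AUDITED_BEGIN
#  define AUDITED_BEGIN
#  define AUDITED_END
#  define NULLABLE
#endif

AUDITED_BEGIN

/* Inside the audited region, an unannotated pointer such as `s` is treated
   as nonnull; only the exception (the nullable result) is spelled out. */
const char * NULLABLE find_char(const char *s, int c) {
    return strchr(s, c);
}

AUDITED_END

/* Small self-check; the string is illustrative only. */
void demo_audited_region(void) {
    const char *text = "nullability";
    assert(find_char(text, 'b') == text + 5);
    assert(find_char(text, 'z') == NULL);
}
```

Keeping one begin/end pair per header, as suggested above, means every declaration in the header is either explicitly annotated or covered by the default.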

Template type sugar reconstruction would be a great QoI improvement in general, and would let the nullability qualifiers flow through template instantiations in the expected manner.

Doug

DougGregor · March 3, 2015, 12:40am

What a very good idea! Strong support for expressing and checking intent regarding potential nullability can clearly help us all write more reliable programs. I have a few questions, though.

Consider LoadHTMLString:BaseURL: from the UIWebView class. Providing a null pointer for the BaseURL is clearly legal. It even has a defined meaning, which is “don’t bother to enforce the same origin policy for this content.” So the BaseURL argument is nullable in terms of the analysis you propose above. So far, so good. Sadly, turning off the “same origin policy” is nearly always a serious security bug. Permitting a null pointer in this context is extremely likely to be wrong; it probably represents a bug 999 times out of 1000 uses. (I’d claim it was always wrong, but someone at some point decided it was a feature worth supporting and documenting. And I have seen a very few unit tests where passing a null pointer for the BaseURL was arguably OK). Do you intend to address this kind of “legal, but virtually certain to be wrong” usage, or is that out of scope?

This kind of decision is really up to the owner of that API. If she believes that her API shouldn’t make it easy to introduce this form of security problem, she can mark it as “__nonnull” and put in some defensive code to either fail softly or emulate prior behavior. Clients that pass a null value will start to see warnings that there is an issue.
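As a sketch of the "mark it nonnull and fail softly" option: the function below is a hypothetical C stand-in, not the real UIWebView method, and `NONNULL`, `load_html`, and the result codes are invented for illustration. The macro assumes a compiler that spells the qualifier `_Nonnull` (this thread spells it `__nonnull`) and expands to nothing elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper macro: expands to the nullability qualifier when
   the compiler reports support for it, and to nothing otherwise. */
#if defined(__has_feature)
#  if __has_feature(nullability)
#    define NONNULL _Nonnull
#  endif
#endif
#ifndef NONNULL
#  define NONNULL
#endif

enum load_result { LOAD_OK = 0, LOAD_REJECTED = -1 };

/* Hypothetical stand-in for an API whose owner decided that a null base
   URL should no longer be part of the contract. */
enum load_result load_html(const char * NONNULL html,
                           const char * NONNULL base_url) {
    /* The contract says nonnull, but unmigrated clients may still pass
       NULL, so the owner keeps a defensive check and fails softly. */
    if (html == NULL || base_url == NULL) {
        return LOAD_REJECTED;
    }
    return LOAD_OK;
}

/* Small self-check; the strings are illustrative only. */
void demo_fail_softly(void) {
    const char *legacy_base = NULL;  /* what an unmigrated client might pass */
    assert(load_html("<p>hi</p>", "https://example.com/") == LOAD_OK);
    assert(load_html("<p>hi</p>", legacy_base) == LOAD_REJECTED);
}
```

The defensive branch only makes sense because the qualifier is meant to have no effect on code generation; a client passing a literal null would be expected to see a warning at the call site.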

Based on your discussion of type system impact, I’m sure you are all well aware that the compiler support for “maintaining type sugar through semantic analysis” needed to make this proposal work is almost enough to support a full “plug-able type system” similar to what’s available in recent versions of Java. Do you have any plans to address the missing pieces? Being able to support pluggable type information more broadly (while continuing to guarantee that adding such qualifiers/attributes/however-it’s-expressed would have zero impact on generated code) would enable all sorts of interesting analyses. A few examples that spring to mind include compile-time enforcement of physical unit compatibility, various protocol-level checks (type states, a la Bierhoff & Aldrich), and many more. When last I investigated this question (about a year ago), the limitation regarding templates (your [6]) was the only significant missing piece. Closing that gap would greatly reduce the difficulty (and project risk) of implementing such analyses.

We’re almost there in Clang, although I doubt that I personally will have time any time soon to work on addressing the issue of type sugar flowing through template specializations. I don’t think it’s fundamentally that hard to do, and beyond that I expect that we’ll see a long tail of small issues where we’re not maintaining type sugar everywhere.

Doug

zygoloid · March 3, 2015, 1:11am

Hello all,

Null pointers are a significant source of problems in applications.
Whether it’s SIGSEGV taking down a process or a foolhardy attempt to
recover from NullPointerException breaking invariants everywhere, it’s a
problem that’s bad enough for Tony Hoare to call the invention of the null
reference his billion dollar mistake [1]. It’s not the ability to create a
null pointer that is a problem—having a common sentinel value meaning “no
value” is extremely useful—but that it’s very hard to determine whether,
for a particular pointer, one is expected to be able to use null. C doesn’t
distinguish between “nullable” and “nonnull” pointers, so we turn to
documentation and experimentation. Consider strchr from the C standard
library:

char *strchr(const char *s, int c);

It is “obvious” to a programmer who knows the semantics of strchr that
it’s important to check for a returned null, because null is used as the
sentinel for “not found”. Of course, your tools don’t know that, so they
cannot help when you completely forget to check for the null case. Bugs
ensue.

Can I pass a null string to strchr? The standard is unclear [2], and my
platform’s implementation happily accepts a null parameter and returns
null, so obviously I shouldn’t worry about it… until I port my code, or the
underlying implementation changes because my expectations and the library
implementor’s expectations differ. Given the age of strchr, I suspect that
every implementation out there has an explicit, defensive check for a null
string, because it’s easier to add yet more defensive (and generally
useless) null checks than it is to ask your clients to fix their code.
Scale this up, and code bloat ensues, as well as wasted programmer effort
that obscures the places where checking for null really does matter.

In a recent version of Xcode, Apple introduced an extension to
C/C++/Objective-C that expresses the nullability of pointers in the type
system via new nullability qualifiers. Nullability qualifiers express
nullability as part of the declaration of strchr [2]:

__nullable char *strchr(__nonnull const char *s, int c);

With this, programmers and tools alike can better reason about the use of
strchr with null pointers.

We’d like to contribute the implementation (and there is a patch attached
at the end [3]), but since this is a nontrivial extension to all of the C
family of languages that Clang supports, we believe that it needs to be
discussed here first.

*Goals*
We have several specific goals that informed the design of this feature.

   - *Allow the intended nullability to be expressed on all pointers*:
   Pointers are used throughout library interfaces, and the nullability of
   those pointers is an important part of the API contract with users. It’s
   too simplistic to only allow function parameters to have nullability, for
   example, because it’s also important information for data members,
   pointers-to-pointers (e.g., “a nonnull pointer to a nullable pointer to an
   integer”), arrays of pointers, etc.
   - *Enable better tools support for detecting nullability problems:* The
   nullability annotations should be useful for tools (especially the static
   analyzer) that can reason about the use of null, to give warnings about
   both missed null checks (the result of strchr could be null…) as well as
   for unnecessarily-defensive code.
   - *Support workflows where all interfaces provide nullability
   annotations:* In moving from a world where there are no nullability
   annotations to one where we hope to see many such annotations, we’ve found
   it helpful to move header-by-header, auditing a complete header to give it
   nullability qualifiers. Once one has done that, additions to the header
   need to be held to the same standard, so we need a design that allows us to
   warn about pointers that don’t provide nullability annotations for some
   declarations in a header that already has some nullability annotations.

   - *Zero effect on ABI or code generation:* There are a huge number of
   interfaces that could benefit from the use of nullability qualifiers, but
   we won’t get widespread adoption if introducing the nullability qualifiers
   means breaking existing code, either in the ABI (say, because nullability
   qualifiers are mangled into the type) or at execution time (e.g., because a
   non-null pointer ends up being null along some error path and causes
   undefined behavior).

A sanitizer for this feature would seem very useful, but this bullet point
suggests that such a sanitizer would violate the model. Likewise, I don't
see why we should rule out the option of optimizing on the basis of these
qualifiers (under a -fstrict-nonnull flag or similar).

I agree that a sanitizer would be useful as a debugging aid. My primary
concern here is that optimizing based on this information *not* be a part
of any normal optimization flag (-O2, -Os, whatever), because it will
hamper widespread adoption of this feature if adding the annotations to
indicate the API contract suddenly starts breaking existing clients by,
e.g., optimizing out existing, defensive null checks.

*Why not __attribute__((nonnull))?*
Clang already has an attribute to express nullability, “nonnull”, which
we inherited from GCC [4]. The “nonnull” attribute can be placed on
functions to indicate which parameters cannot be null: one either specifies
the indices of the arguments that cannot be null, e.g.,

  extern void *my_memcpy (void *dest, const void *src, size_t len) __attribute__((nonnull (1, 2)));

or omits the list of indices to state that all pointer arguments cannot
be null, e.g.,

  extern void *my_memcpy (void *dest, const void *src, size_t len) __attribute__((nonnull));

More recently, “nonnull” has grown the ability to be applied to
parameters, and one can use the companion attribute returns_nonnull to
state that a function returns a non-null pointer:

  extern void *my_memcpy (__attribute__((nonnull)) void *dest, __attribute__((nonnull)) const void *src, size_t len) __attribute__((returns_nonnull));

There are a number of problems here. First, there are different
attributes to express the same idea at different places in the grammar, and
the fact that the “nonnull” attribute *on the function* actually has an
effect *on the function parameters* can get very, very confusing. Quick,
which pointers are nullable vs. non-null in this example?

__attribute__((nonnull)) void *my_realloc (void *ptr, size_t size);

According to that declaration, ptr is nonnull and the function returns a
nullable pointer… but that’s the opposite of how it reads (and behaves, if
this is anything like a realloc that cannot fail). Moreover, because these
two attributes are *declaration* attributes, not type attributes, you
cannot express that nullability of the inner pointer in a multi-level
pointer or an array of pointers, which makes these attributes verbose,
confusing, and not sufficiently general. These attributes fail the first
of our goals.

These attributes aren’t as useful as they could be for tools support (the
second and third goals), because they only express the nonnull case,
leaving no way to distinguish between the unannotated case (nobody has
documented the nullability of some parameter) and the nullable case (we
know the pointer can be null). From a tooling perspective, this is a
killer: the static analyzer absolutely cannot warn that one has forgotten
to check for null for every unannotated pointer, because the false-positive
rate would be astronomical.

Finally, we’ve recently started considering violations of the
__attribute__((nonnull)) contract to be undefined behavior, which fails the
last of our goals. This is something we could debate further if it were the
only problem, but these declaration attributes fail all of our criteria, so
it’s not worth discussing.

*Nullability Qualifiers*
We propose the addition of a new set of type qualifiers, spelled
__nullable, __nonnull, and __null_unspecified, to Clang. These are
collectively known as *nullability qualifiers* and may be written
anywhere any other type qualifier may be written (such as const) on any
type subject to the following restrictions:

   - Two nullability qualifiers shall not appear in the same set of
   qualifiers.
   - A nullability qualifier shall qualify any pointer type, including
   pointers to objects, pointers to functions, C++ pointers to members, block
   pointers, and Objective-C object pointers.
   - A nullability qualifier in the declaration-specifiers applies to
   the innermost pointer type of each declarator (e.g., __nonnull int * is
   equivalent to int * __nonnull).

What happens if there's a mixture of different kinds of declarator? (Can
I have '__nonnull int (*p)[3]'? Can I have '__nonnull int *p[3];'?)

I think you're saying that this decision is made based on the syntax of
the declarator and not based on the underlying type, right? (So in

  __nonnull T *

the __nonnull appertains to the *, even if T names a pointer type.)

I’ve said it poorly. It is based on the type, so

__nonnull T *

applies __nonnull to T unless T is a known type that is not a pointer
type. In the patch Type::canHaveNullability() computes the operation, and
essentially we apply the __nonnull to T when:
- T is of pointer, C++ member pointer, block pointer, or Objective-C
object pointer
- T is a dependent type that could instantiate to some kind of pointer type

That seems like it could be very confusing:

void f(__nonnull int *p);

template<typename Integral>
void f(__nonnull Integral *p);

... would apply the __nonnull to different components of the type (and I
don't even want to think about what happens when T is a member of the
current instantiation). The outcome seems to be that people writing
templates need to know about both ways of writing this, and they need to
know the gotchas and the minutiae of the rules, and they need to be able to
reason with precision about which types are dependent.

This is a problem in C too. Consider:

#include <some_library.h> // vends an opaque_t typedef
void use_library(__nonnull opaque_t *handle);

We cannot know what this program means without depending on an
implementation detail of some_library.h. And this is not an exotic problem;
consider, for instance, some_library == stdio and opaque_t == FILE.

Given that...

   - A nullability qualifier applied to a typedef of a
   nullability-qualified pointer type shall specify the same nullability as
   the underlying type of the typedef.

... I don't really see what this rule is for. I would expect "__nonnull T"
to be ill-formed because the innermost component of the declarator is not a
pointer, irrespective of whether T is a pointer type and whether it's
nullable. And I'd expect "__nonnull T *" to be valid whether or not T is a
typedef for a __nonnull pointer.
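To illustrate the opaque-typedef concern, a minimal sketch in which the qualifier is always written after the `*` it describes, so the meaning does not depend on what the typedef hides. The two "libraries", their typedefs, the function names, and the `NONNULL`/`NULLABLE` macros are all hypothetical. The macros assume a compiler that spells the qualifiers `_Nonnull`/`_Nullable` and expand to nothing elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper macros: expand to nullability qualifiers when the
   compiler reports support for them, and to nothing otherwise. */
#if defined(__has_feature)
#  if __has_feature(nullability)
#    define NONNULL  _Nonnull
#    define NULLABLE _Nullable
#  endif
#endif
#ifndef NONNULL
#  define NONNULL
#  define NULLABLE
#endif

/* Two hypothetical libraries: one vends a struct typedef, the other a
   typedef that already hides a pointer. */
struct widget { int id; };
struct gadget { int id; };
typedef struct widget widget_t;     /* not a pointer type */
typedef struct gadget *gadget_ref;  /* a pointer type */

/* Qualifier after the `*`: it describes that pointer, whatever widget_t
   turns out to be. */
int widget_id(widget_t * NONNULL w) {
    return w->id;
}

/* Same placement with a pointer typedef: the slot is nonnull, while the
   gadget_ref stored in it may be null. */
int gadget_id_or(gadget_ref NULLABLE * NONNULL slot, int fallback) {
    gadget_ref g = *slot;
    return g != NULL ? g->id : fallback;
}

/* Small self-check; the values are illustrative only. */
void demo_typedefs(void) {
    struct widget w = { 4 };
    struct gadget g = { 9 };
    gadget_ref some = &g;
    gadget_ref none = NULL;
    assert(widget_id(&w) == 4);
    assert(gadget_id_or(&some, -1) == 9);
    assert(gadget_id_or(&none, -1) == -1);
}
```

With the prefix form, a reader would need to know whether the typedef is a pointer before knowing which pointer the qualifier lands on; the postfix form reads the same either way.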

On the whole, I find it a little strange to allow a nullability qualifier
in the decl-specifier-seq / specifiers-and-qualifiers that applies to some
later pointer declarator; I would have expected this to be permitted in the
cv-qualifier-seq / type-qualifier-list after the pointer operator, and
nowhere else (or perhaps permitted in a decl-specifier-seq that also
contains a type-specifier for a pointer type).

It’s an attempt to make the simple cases read better, e.g., “__nonnull int
*” reads a whole lot better than “int * __nonnull” for most

This kind of flexibility has proven a disaster for the comprehensibility
of GCC's type attributes.

We haven’t seen a disaster, but we have seen some confusion with
multi-level pointers, because we have seen people try

__nonnull int * __nullable *

and scratch their heads when we complain about duplicate specifiers.

That said, the developers that have been using this feature have become
perhaps a little accustomed to writing type qualifiers in the
decl-specifier-seq that actually apply to the pointer, because the
Objective-C ARC qualifiers (__weak, __autoreleasing) do the same thing in a
more limited manner (that doesn’t run into the multi-level pointer
problem).

Based on your experience, how painful would it be to remove the
'nullability qualifier on decl-specifier-seq gets implicitly rewritten to
be somewhere else sometimes' rule?

The meanings of the three nullability qualifiers are as follows:

   - __nullable: the pointer may store a null value at runtime (as part of
   the API contract).
   - __nonnull: the pointer should not store a null value at runtime (as
   part of the API contract). It is possible that the value can be null,
   e.g., in erroneous historic uses of an API, and it is up to the library
   implementor to decide to what degree she will accommodate such clients.
   - __null_unspecified: it is unclear whether the pointer can be null or
   not. Use of this type qualifier is extremely rare in practice, but it
   fills a small but important niche when auditing a particular header to
   add nullability qualifiers: sometimes the nullability contract for a few
   APIs in the header is unclear *even when looking at the implementation*
   for historical reasons, and establishing the contract requires more
   extensive study. In such cases, it’s often best to mark that pointer as
   __null_unspecified (which will help silence the warning about
   unannotated pointers in a header) and move on, coming back to
   __null_unspecified pointers when the appropriate graybeard has been
   summoned out of retirement [5].

Have you considered adding C++11 attributes as synonyms for these?

I think it’s totally reasonable to add [[nullable]], [[nonnull]], and
[[null_unspecified]]. If we feel that this feature is working out as we
hope, we can push for standardization.

These should be [[clang::blah]] until standardized, but otherwise that
works for me.

Finkel_Hal_J · March 4, 2015, 3:55am

From: "Richard Smith" <richard@metafoo.co.uk>
To: "Douglas Gregor" <dgregor@apple.com>
Cc: "cfe-dev Developers" <cfe-dev@cs.uiuc.edu>
Sent: Monday, March 2, 2015 5:34:18 PM
Subject: Re: [cfe-dev] RFC: Nullability qualifiers

> Hello all,

> Null pointers are a significant source of problems in applications.
> Whether it’s SIGSEGV taking down a process or a foolhardy attempt
> to
> recover from NullPointerException breaking invariants everywhere,
> it’s a problem that’s bad enough for Tony Hoare to call the
> invention of the null reference his billion dollar mistake [1].
> It’s
> not the ability to create a null pointer that is a problem—having a
> common sentinel value meaning “no value” is extremely useful—but
> that it’s very hard to determine whether, for a particular pointer,
> one is expected to be able to use null. C doesn’t distinguish
> between “nullable” and “nonnull” pointers, so we turn to
> documentation and experimentation. Consider strchr from the C
> standard library:

> char *strchr(const char *s, int c);

> It is “obvious” to a programmer who knows the semantics of strchr
> that it’s important to check for a returned null, because null is
> used as the sentinel for “not found”. Of course, your tools don’t
> know that, so they cannot help when you completely forget to check
> for the null case. Bugs ensue.

> Can I pass a null string to strchr? The standard is unclear [2],
> and
> my platform’s implementation happily accepts a null parameter and
> returns null, so obviously I shouldn’t worry about it… until I port
> my code, or the underlying implementation changes because my
> expectations and the library implementor’s expectations differ.
> Given the age of strchr, I suspect that every implementation out
> there has an explicit, defensive check for a null string, because
> it’s easier to add yet more defensive (and generally useless) null
> checks than it is to ask your clients to fix their code. Scale this
> up, and code bloat ensues, as well as wasted programmer effort that
> obscures the places where checking for null really does matter.

> In a recent version of Xcode, Apple introduced an extension to
> C/C++/Objective-C that expresses the nullability of pointers in the
> type system via new nullability qualifiers. Nullability qualifiers
> express nullability as part of the declaration of strchr [2]:

> __nullable char *strchr(__nonnull const char *s, int c);

> With this, programmers and tools alike can better reason about the
> use of strchr with null pointers.

> We’d like to contribute the implementation (and there is a patch
> attached at the end [3]), but since this is a nontrivial extension
> to all of the C family of languages that Clang supports, we believe
> that it needs to be discussed here first.

> Goals

> We have several specific goals that informed the design of this
> feature.

> * Allow the intended nullability to be expressed on all pointers :
> Pointers are used throughout library interfaces, and the
> nullability
> of those pointers is an important part of the API contract with
> users. It’s too simplistic to only allow function parameters to
> have
> nullability, for example, because it’s also important information
> for data members, pointers-to-pointers (e.g., “a nonnull pointer to
> a nullable pointer to an integer”), arrays of pointers, etc.

> * Enable better tools support for detecting nullability problems:
> The
> nullability annotations should be useful for tools (especially the
> static analyzer) that can reason about the use of null, to give
> warnings about both missed null checks (the result of strchr could
> be null…) as well as for unnecessarily-defensive code.

> * Support workflows where all interfaces provide nullability
> annotations: In moving from a world where there are no nullability
> annotations to one where we hope to see many such annotations,
> we’ve
> found it helpful to move header-by-header, auditing a complete
> header to give it nullability qualifiers. Once one has done that,
> additions to the header need to be held to the same standard, so we
> need a design that allows us to warn about pointers that don’t
> provide nullability annotations for some declarations in a header
> that already has some nullability annotations.

> * Zero effect on ABI or code generation: There are a huge number of
> interfaces that could benefit from the use of nullability
> qualifiers, but we won’t get widespread adoption if introducing the
> nullability qualifiers means breaking existing code, either in the
> ABI (say, because nullability qualifiers are mangled into the type)
> or at execution time (e.g., because a non-null pointer ends up
> being
> null along some error path and causes undefined behavior).

A sanitizer for this feature would seem very useful, but this bullet
point suggests that such a sanitizer would violate the model.
Likewise, I don't see why we should rule out the option of
optimizing on the basis of these qualifiers (under a
-fstrict-nonnull flag or similar).

> Why not __attribute__((nonnull))?

> Clang already has an attribute to express nullability, “nonnull”,
> which we inherited from GCC [4]. The “nonnull” attribute can be
> placed on functions to indicate which parameters cannot be null:
> one
> either specifies the indices of the arguments that cannot be null,
> e.g.,

> extern void *my_memcpy (void *dest, const void *src, size_t len)
> __attribute__((nonnull (1, 2)));

> or omits the list of indices to state that all pointer arguments
> cannot be null, e.g.,

> extern void *my_memcpy (void *dest, const void *src, size_t len)
> __attribute__((nonnull));

> More recently, “nonnull” has grown the ability to be applied to
> parameters, and one can use the companion attribute returns_nonnull
> to state that a function returns a non-null pointer:

> extern void *my_memcpy (__attribute__((nonnull)) void *dest,
> __attribute__((nonnull)) const void *src, size_t len)
> __attribute__((returns_nonnull));

> There are a number of problems here. First, there are different
> attributes to express the same idea at different places in the
> grammar, and the fact that the “nonnull” attribute on the function
> actually has an effect on the function parameters can get very,
> very confusing. Quick, which pointers are nullable vs. non-null in
> this example?

> __attribute__((nonnull)) void *my_realloc (void *ptr, size_t size);

> According to that declaration, ptr is nonnull and the function
> returns a nullable pointer… but that’s the opposite of how it reads
> (and behaves, if this is anything like a realloc that cannot fail).
> Moreover, because these two attributes are declaration attributes,
> not type attributes, you cannot express that nullability of the
> inner pointer in a multi-level pointer or an array of pointers,
> which makes these attributes verbose, confusing, and not
> sufficiently general. These attributes fail the first of our
> goals.

> These attributes aren’t as useful as they could be for tools
> support
> (the second and third goals), because they only express the nonnull
> case, leaving no way to distinguish between the unannotated case
> (nobody has documented the nullability of some parameter) and the
> nullable case (we know the pointer can be null). From a tooling
> perspective, this is a killer: the static analyzer absolutely
> cannot
> warn that one has forgotten to check for null for every unannotated
> pointer, because the false-positive rate would be astronomical.

> Finally, we’ve recently started considering violations of the
> __attribute__((nonnull)) contract to be undefined behavior, which
> fails the last of our goals. This is something we could debate
> further if it were the only problem, but these declaration
> attributes fail all of our criteria, so it’s not worth discussing.

On this last point, how do you want to define the interaction between these? Should we not consider the violation to be undefined behavior if these new qualifiers are present?

-Hal

DougGregor · March 4, 2015, 7:51pm

From: “Richard Smith” <richard@metafoo.co.uk>
To: “Douglas Gregor” <dgregor@apple.com>
Cc: “cfe-dev Developers” <cfe-dev@cs.uiuc.edu>
Sent: Monday, March 2, 2015 5:34:18 PM
Subject: Re: [cfe-dev] RFC: Nullability qualifiers

On this last point, how do you want to define the interaction between these? Should we not consider the violation to be undefined behavior if these new qualifiers are present?

I think we should continue to say that the existing nonnull attributes require nonnull values (UB if a null gets in there). The type qualifiers will not.

Moreover, we should have the existing nonnull/returns_nonnull attributes imply __nonnull, since we’ll get more utility out of existing annotations that way.
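A small side-by-side sketch of that split, under some assumptions: `length_attr`, `length_qual`, and the `NONNULL` macro are hypothetical names, the GNU-style attribute syntax assumes GCC or Clang, and the macro assumes a compiler that spells the qualifier `_Nonnull` (it expands to nothing elsewhere).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper macro: expands to the nullability qualifier when
   the compiler reports support for it, and to nothing otherwise. */
#if defined(__has_feature)
#  if __has_feature(nullability)
#    define NONNULL _Nonnull
#  endif
#endif
#ifndef NONNULL
#  define NONNULL
#endif

/* Existing attribute: passing null stays undefined behavior, so the body
   makes no attempt to check for it. */
__attribute__((nonnull(1)))
size_t length_attr(const char *s) {
    return strlen(s);
}

/* Type qualifier: documents the same contract, but a null argument is not
   undefined behavior because of the qualifier itself, so a defensive check
   for unmigrated callers remains meaningful. */
size_t length_qual(const char * NONNULL s) {
    if (s == NULL) {
        return 0;
    }
    return strlen(s);
}

/* Small self-check; the strings are illustrative only. */
void demo_attr_vs_qualifier(void) {
    const char *missing = NULL;  /* only ever passed to the qualifier form */
    assert(length_attr("abc") == 3);
    assert(length_qual("abcd") == 4);
    assert(length_qual(missing) == 0);
}
```

Only the qualifier version is called with a null pointer here, since doing so with the attribute version would be exactly the undefined behavior described above.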

Doug

DougGregor · March 4, 2015, 8:02pm

Yes, that’s a good point. The warning about missing nullability annotations in headers, as well as the fact that this feature has only been deployed on one platform, probably accounts for the lack of reports of actual confusion caused by this.

It will be painful, but we think we can do it. We’ll investigate further.

_sean_silva · March 5, 2015, 3:23am

Hello all,

Null pointers are a significant source of problems in applications.
Whether it’s SIGSEGV taking down a process or a foolhardy attempt to
recover from NullPointerException breaking invariants everywhere, it’s a
problem that’s bad enough for Tony Hoare to call the invention of the null
reference his billion dollar mistake [1]. It’s not the ability to create a
null pointer that is a problem—having a common sentinel value meaning “no
value” is extremely useful—but that it’s very hard to determine whether,
for a particular pointer, one is expected to be able to use null. C doesn’t
distinguish between “nullable” and “nonnull” pointers, so we turn to
documentation and experimentation. Consider strchr from the C standard
library:

char *strchr(const char *s, int c);

It is “obvious” to a programmer who knows the semantics of strchr that
it’s important to check for a returned null, because null is used as the
sentinel for “not found”. Of course, your tools don’t know that, so they
cannot help when you completely forget to check for the null case. Bugs
ensue.

Can I pass a null string to strchr? The standard is unclear [2], and my
platform’s implementation happily accepts a null parameter and returns
null, so obviously I shouldn’t worry about it… until I port my code, or the
underlying implementation changes because my expectations and the library
implementor’s expectations differ. Given the age of strchr, I suspect that
every implementation out there has an explicit, defensive check for a null
string, because it’s easier to add yet more defensive (and generally
useless) null checks than it is to ask your clients to fix their code.
Scale this up, and code bloat ensues, as well as wasted programmer effort
that obscures the places where checking for null really does matter.

In a recent version of Xcode, Apple introduced an extension to
C/C++/Objective-C that expresses the nullability of pointers in the type
system via new nullability qualifiers. Nullability qualifiers express
nullability as part of the declaration of strchr [2]:

__nullable char *strchr(__nonnull const char *s, int c);

With this, programmers and tools alike can better reason about the use
of strchr with null pointers.

We’d like to contribute the implementation (and there is a patch
attached at the end [3]), but since this is a nontrivial extension to all
of the C family of languages that Clang supports, we believe that it needs
to be discussed here first.

*Goals*
We have several specific goals that informed the design of this feature.

   - *Allow the intended nullability to be expressed on all pointers*:
   Pointers are used throughout library interfaces, and the nullability of
   those pointers is an important part of the API contract with users. It’s
   too simplistic to only allow function parameters to have nullability, for
   example, because it’s also important information for data members,
   pointers-to-pointers (e.g., “a nonnull pointer to a nullable pointer to an
   integer”), arrays of pointers, etc.
   - *Enable better tools support for detecting nullability problems:* The
   nullability annotations should be useful for tools (especially the static
   analyzer) that can reason about the use of null, to give warnings about
   both missed null checks (the result of strchr could be null…) as well as
   for unnecessarily-defensive code.
   - *Support workflows where all interfaces provide nullability
   annotations:* In moving from a world where there are no nullability
   annotations to one where we hope to see many such annotations, we’ve found
   it helpful to move header-by-header, auditing a complete header to give it
   nullability qualifiers. Once one has done that, additions to the header
   need to be held to the same standard, so we need a design that allows us to
   warn about pointers that don’t provide nullability annotations for some
   declarations in a header that already has some nullability annotations.

   - *Zero effect on ABI or code generation:* There are a huge number
   of interfaces that could benefit from the use of nullability qualifiers,
   but we won’t get widespread adoption if introducing the nullability
   qualifiers means breaking existing code, either in the ABI (say, because
   nullability qualifiers are mangled into the type) or at execution time
   (e.g., because a non-null pointer ends up being null along some error path
   and causes undefined behavior).

A sanitizer for this feature would seem very useful, but this bullet point
suggests that such a sanitizer would violate the model. Likewise, I don't
see why we should rule out the option of optimizing on the basis of these
qualifiers (under a -fstrict-nonnull flag or similar).

I agree that a sanitizer would be useful as a debugging aid. My primary
concern here is that optimizing based on this information *not* be a
part of any normal optimization flag (-O2, -Os, whatever), because it will
hamper widespread adoption of this feature if adding the annotations to
indicate the API contract suddenly starts breaking existing clients by,
e.g., optimizing out existing, defensive null checks.

*Why not __attribute__((nonnull))?*
Clang already has an attribute to express nullability, “nonnull”, which
we inherited from GCC [4]. The “nonnull” attribute can be placed on
functions to indicate which parameters cannot be null: one either specifies
the indices of the arguments that cannot be null, e.g.,

  extern void *my_memcpy (void *dest, const void *src, size_t len) __attribute__((nonnull (1, 2)));

or omits the list of indices to state that all pointer arguments cannot
be null, e.g.,

  extern void *my_memcpy (void *dest, const void *src, size_t len) __attribute__((nonnull));

More recently, “nonnull” has grown the ability to be applied to
parameters, and one can use the companion attribute returns_nonnull to
state that a function returns a non-null pointer:

  extern void *my_memcpy (__attribute__((nonnull)) void *dest, __attribute__((nonnull)) const void *src, size_t len) __attribute__((returns_nonnull));

There are a number of problems here. First, there are different
attributes to express the same idea at different places in the grammar, and
the fact that the “nonnull” attribute *on the function* actually has an
effect *on the function parameters* can get very, very confusing.
Quick, which pointers are nullable vs. non-null in this example?

__attribute__((nonnull)) void *my_realloc (void *ptr, size_t size);

According to that declaration, ptr is nonnull and the function returns a
nullable pointer… but that’s the opposite of how it reads (and behaves, if
this is anything like a realloc that cannot fail). Moreover, because these
two attributes are *declaration* attributes, not type attributes, you
cannot express the nullability of the inner pointer in a multi-level
pointer or an array of pointers, which makes these attributes verbose,
confusing, and not sufficiently general. These attributes fail the first
of our goals.
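
To make the placement problem concrete, here is a minimal sketch that compiles with both GCC and Clang. The my_realloc declaration is the one quoted above; ex_must_grow is a hypothetical helper (not part of any library) that shows where returns_nonnull has to go to say "the result is never null".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* From the post: the attribute sits next to the return type, yet it
   constrains the pointer parameter ptr, not the returned pointer. */
__attribute__((nonnull)) void *my_realloc(void *ptr, size_t size);

/* Hypothetical helper: a reallocation that aborts instead of returning
   null, which is the contract returns_nonnull is meant to describe.
   ptr is left unannotated, so (as with realloc) it may be null. */
static void *ex_must_grow(void *ptr, size_t size)
    __attribute__((returns_nonnull));

static void *ex_must_grow(void *ptr, size_t size)
{
    void *grown;

    assert(size > 0);           /* realloc with size 0 is not portable */
    grown = realloc(ptr, size);
    if (grown == NULL)
        abort();                /* never hand a null pointer back */
    return grown;
}
```

Note that neither attribute can say anything about an inner pointer, such as the elements of a char ** argument.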

These attributes aren’t as useful as they could be for tools support
(the second and third goals), because they only express the nonnull case,
leaving no way to distinguish between the unannotated case (nobody has
documented the nullability of some parameter) and the nullable case (we
know the pointer can be null). From a tooling perspective, this is a
killer: the static analyzer absolutely cannot warn that one has forgotten
to check for null for every unannotated pointer, because the false-positive
rate would be astronomical.

Finally, we’ve recently started considering violations of the
__attribute__((nonnull)) contract to be undefined behavior, which fails the
last of our goals. This is something we could debate further if it were the
only problem, but these declaration attributes fail all of our criteria, so
it’s not worth discussing.

*Nullability Qualifiers*
We propose the addition of a new set of type qualifiers, spelled
__nullable, __nonnull, and __null_unspecified, to Clang. These are
collectively known as *nullability qualifiers* and may be written
anywhere any other type qualifier may be written (such as const) on any
type subject to the following restrictions:

   - Two nullability qualifiers shall not appear in the same set of
   qualifiers.
   - A nullability qualifier shall qualify any pointer type, including
   pointers to objects, pointers to functions, C++ pointers to members, block
   pointers, and Objective-C object pointers.
   - A nullability qualifier in the declaration-specifiers applies to
   the innermost pointer type of each declarator (e.g., __nonnull int * is
   equivalent to int * __nonnull).
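
As a rough illustration of qualifying each pointer level separately, here is a small sketch. It assumes the _Nonnull/_Nullable spellings adopted later in this thread and writes the qualifiers in declarator position. EX_NONNULL, EX_NULLABLE, ex_count_present, and ex_placement_demo are hypothetical names introduced only for this example; the macros expand to nothing on compilers without the feature.

```c
#include <assert.h>
#include <stddef.h>

/* Compilers without __has_feature: treat every feature as absent. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Hypothetical wrapper macros so the sketch also builds without Clang. */
#if __has_feature(nullability)
#define EX_NONNULL _Nonnull
#define EX_NULLABLE _Nullable
#else
#define EX_NONNULL
#define EX_NULLABLE
#endif

/* The array pointer itself must not be null (outer qualifier), while each
   element is allowed to be null (inner qualifier). */
static size_t ex_count_present(const int * EX_NULLABLE * EX_NONNULL items,
                               size_t n)
{
    size_t present = 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        if (items[i] != NULL)   /* the inner pointers do need a check */
            ++present;
    }
    return present;
}

/* Usage: one of the three element pointers is null. */
static void ex_placement_demo(void)
{
    int a = 1, b = 2;
    const int *items[3] = { &a, NULL, &b };

    assert(ex_count_present(items, 3) == 2);
}
```

This is the distinction the declaration attributes cannot express: the inner and outer pointers carry different nullability.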

What happens if there's a mixture of different kinds of declarator? (Can
I have '__nonnull int (*p)[3]'? Can I have '__nonnull int *p[3];'?)

I think you're saying that this decision is made based on the syntax of
the declarator and not based on the underlying type, right? (So in

  __nonnull T *

the __nonnull appertains to the *, even if T names a pointer type.)

I’ve said it poorly. It is based on the type, so

__nonnull T *

applies __nonnull to T unless T is a known type that is not a pointer
type. In the patch Type::canHaveNullability() computes the operation, and
essentially we apply the __nonnull to T when:
- T is of pointer, C++ member pointer, block pointer, or Objective-C
object pointer
- T is a dependent type that could instantiate to some kind of pointer
type

That seems like it could be very confusing:

  void f(__nonnull int *p);

  template<typename Integral>
  void f(__nonnull Integral *p);

... would apply the __nonnull to different components of the type (and I
don't even want to think about what happens when T is a member of the
current instantiation). The outcome seems to be that people writing
templates need to know about both ways of writing this, and they need to
know the gotchas and the minutiae of the rules, and they need to be able to
reason with precision about which types are dependent.

This is a problem in C too. Consider:

  #include <some_library.h> // vends an opaque_t typedef
  void use_library(__nonnull opaque_t *handle);

We cannot know what this program means without depending on an
implementation detail of some_library.h. And this is not an exotic problem;
consider, for instance, some_library == stdio and opaque_t == FILE.

Random idea: could we bootstrap on programmer's knowledge of const and
volatile and have the rule be "__nonnull <something>" applies to the
pointer that points to the const thing in "const <something>"? Since cv are
eligible for template specialization and nullability-qualifiers aren't,
maybe that just trades-off for another inconsistency?

-- Sean Silva

DougGregor · March 5, 2015, 4:09am

Personally, I feel like we shouldn’t conflate constness with nullability, because they’re really orthogonal notions. Plus, Richard has convinced me that we shouldn’t move the nullability qualifiers away from the decl-specifiers.

Doug

Marshall_Clow1 · March 5, 2015, 2:51pm

I think that these would be great.
(along with a sanitizer)

— Marshall

Marshall_Clow1 · March 5, 2015, 3:05pm

I’m pretty sure that you know this, Doug, but:

gcc already does (some of) this kind of stuff, and they absolutely use it in their codegen decisions.

consider the following (simplified, fanciful) code:

#include <stdlib.h>
#include <string.h>

char global[1000];

char *foo (char *p, size_t sz)
{
    // stash a copy away
    memcpy(global, p, sz);

    if (p == NULL)
        p = (char *) malloc(100);
    return p;
}

char *p2 = NULL;
char *p3 = foo(p2, 0);

After executing this code, what will the value of ‘p3’ be?
Under gcc/glibc, it will be NULL, because memcpy is marked with “non-null” attributes, and so the “if (p==NULL)” branch is removed.
(Yes, even though memcpy(x, y, 0) will not dereference the pointer)

— Marshall

DougGregor · March 9, 2015, 6:08am

So, we’re going to remove the ability for a nullability qualifier written in the decl-specifiers to move to a pointer/block/member pointer declarator. To aid our own transition, I’ll commit it as a warning (in its own, unique warning group) set to DefaultError that has a proper Fix-It, and once the transition is complete, we’ll change it to an error.

Thanks for all the helpful feedback!

Doug

DougGregor · June 24, 2015, 8:39pm

Another addendum: due to the conflict with glibc’s __nonnull, we’ll be renaming the __double_underscored keywords to _Big_underscored keywords, e.g.,

__nonnull → _Nonnull
__nullable → _Nullable
__null_unspecified → _Null_unspecified

On Darwin, we’ll add predefines

#define __nonnull _Nonnull
#define __nullable _Nullable
#define __null_unspecified _Null_unspecified

to keep the existing headers working.

Doug

Joerg_Sonnenberger1 · June 24, 2015, 9:25pm

Thanks for that.

Joerg

davidchisnall · June 25, 2015, 9:01am

Has anyone proposed these to WG14? They seem like they’d be good additions to the C standard and, if nothing else, it would be good to make sure that the next C standard doesn’t use the same spelling for something subtly different.

David

Kal · June 26, 2015, 9:36pm

How can one detect if an Apple clang supports the new nullability attributes? I tried something like:

#if __has_attribute(_Nonnull)
#elif __has_attribute(__nonnull)
#define _Nonnull __nonnull
#else
#define _Nonnull
#endif

But this didn’t work. Why doesn’t _Nonnull/__nonnull work with __has_attribute?

AaronBallman · June 26, 2015, 9:40pm

How can one detect if an Apple clang supports the new nullability
attributes? I tried something like:

#if __has_attribute(_Nonnull)
#elif __has_attribute(__nonnull)
#define _Nonnull __nonnull
#else
#define _Nonnull
#endif

But this didn't work. Why doesn't _Nonnull/__nonnull work with
__has_attribute?

__has_attribute is used to test for GNU-style attribute support only.
To test for nullability, you should use: __has_feature(nullability)

~Aaron
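
A minimal sketch of the __has_feature(nullability) check suggested above, with a fallback for compilers that lack __has_feature. EX_HAVE_NULLABILITY, EX_NONNULL_ARG, ex_length, and ex_detect_demo are hypothetical names made up for this example. The sketch assumes the _Nonnull spelling and does not try to distinguish compilers that accept only the older __nonnull keyword.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compilers without __has_feature: treat every feature as absent. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Nullability qualifiers are keywords, not GNU-style attributes, so they
   are tested with __has_feature rather than __has_attribute. */
#if __has_feature(nullability)
#define EX_HAVE_NULLABILITY 1
#define EX_NONNULL_ARG _Nonnull
#else
#define EX_HAVE_NULLABILITY 0
#define EX_NONNULL_ARG          /* expands to nothing: plain pointer */
#endif

/* The same declaration parses with or without the feature. */
static size_t ex_length(const char * EX_NONNULL_ARG s)
{
    return strlen(s);
}

/* Usage: behavior is identical either way; only diagnostics differ. */
static void ex_detect_demo(void)
{
    assert(ex_length("null") == 4);
}
```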

Kal · June 26, 2015, 9:44pm

OK. What would be the best way to detect if Apple clang supports _Nonnull or only __nonnull, though?
