[RFC] _Optional: a type qualifier to indicate pointer nullability

Christopher_Bazley · February 1, 2023, 5:32pm

You’re aware free is defined to work on NULL pointers?..

Have you ever tried calling free with a pointer to const? With the exception of the address-of operator, _Optional behaves exactly like const. If it didn’t, I never would have proposed it.

jrtc27 · February 1, 2023, 5:35pm

So just cast it away if your standard library headers aren’t _Optional aware? Though I do wonder how viable code is going to be if your standard library isn’t sufficiently annotated. Adding the branch adds overhead that isn’t necessary.
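Here is a minimal sketch of the two options discussed above. It assumes a hypothetical HAVE_OPTIONAL_QUALIFIER build flag, set only when compiling with the _Optional prototype. On other compilers the qualifier is defined away as an empty macro, which the author mentions later in the thread as a fallback. The cast mirrors what one already writes to free a pointer to const. The checked version adds a branch that free() itself does not need, because free(NULL) is a no-op. release_cast and release_checked are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flag: only the _Optional prototype compiler would set it;
   elsewhere the qualifier is defined away. */
#ifndef HAVE_OPTIONAL_QUALIFIER
#define _Optional
#endif

/* Option 1: cast the qualifier away, exactly as for a pointer to const.
   No branch is needed because free(NULL) does nothing. */
void release_cast(_Optional char *p)
{
    free((char *)p);
}

/* Option 2: check first. After the check, &* yields a plain char *.
   Returns true if a non-null pointer was released. */
bool release_checked(_Optional char *p)
{
    if (p == NULL)
        return false;
    free(&*p);
    return true;
}
```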

Christopher_Bazley · February 1, 2023, 5:54pm

So I strongly suspect that this approach will need massive quality of life improvements which will ultimately make it much more in line with existing solutions, which were the root cause of your frustration in the first place.

That’s why I spent my Christmas holidays using the proposed feature on a large scale. I liked it before when it was only theoretical, and I still like it now.

It’s designed to be trivial to implement, and intuitive to use. If it isn’t nice to use, then fine, I wasted my time. What I do find frustrating is the number of people picking theoretical holes and proposing alternative syntax without ever having actually used the feature. It makes all the effort I put into producing a working prototype feel like a waste of time.

Aaron previously wrote:

having an implementation in hand to play with would go a long ways towards proving the concept is implementable and allowing us to see what the ergonomics of the feature are in practice.

However the availability of an (incredibly simple) implementation doesn’t seem to have made any difference in practice.

Christopher_Bazley · February 1, 2023, 6:08pm

Well, if it was a property of the pointed-to object, then it would also make sense to write the same with zero stars:
const int i; // i is an int that _is_ stored in read-only memory
volatile int j; // j is an int that _is_ stored in shared memory
_Optional int k; // k is an int for which no storage is allocated???

You misquoted my comments from the original article:

_Optional int *k; // *k is an int for which no storage may be allocated

Note that it says may be allocated, not is allocated. Originally, I intended to allow such declarations; D142738 (Warn if _Optional used at top-level of decl) disallows them. I put every part of my proposal in a different commit to allow people to mix-and-match.

If your declaration _Optional int k were allowed, then the address of k would have type int *. I actually like the symmetry of that: it tickles me, it keeps the language simple, and makes it easier to discover the rule for & that is applied in every other expression.

It’s currently disallowed because Aaron previously wrote:

I’d still recommend exploring the design space of making _Optional a qualifier that is grammatically only allowed on a pointer. This removes several problems with the feature, such as people trying to write _Optional int i

and I was willing to sacrifice some of the orthogonality of my proposal to try to save the rest of it. I’m beginning to regret it though, because a lot of people seem to be jumping to the wrong conclusion that just because _Optional int i isn’t allowed, that means optional objects cannot exist and therefore the qualifier should be removed implicitly by every dereference operator instead of by &.

Given that this doesn’t make sense for your _Optional qualifier,

Arguably it does make sense – see above.

it sounds to me as if you just introduced a different way to spell a pointer qualifier (“let’s put it before the *, not after the *”) without any substantial differences to semantics.

Sorry but I think you’ve completely misunderstood my intent. The reason for the pretentious ‘Philosophical underpinning’ section at the top of my proposal was to explain things like why I didn’t propose a keyword with new semantics that are arbitrarily divorced from the language’s syntax.

_Optional appears before the * specifically in order that it doesn’t have to be handled differently from const or volatile. The syntax is chosen in order to give the desired semantics for every existing type of statement.
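The placement argument can be shown with a minimal sketch that treats _Optional exactly like const. It assumes the same hypothetical HAVE_OPTIONAL_QUALIFIER flag, set only when building with the prototype; otherwise the qualifier is defined away. The names counter, find_counter, read_only_view and read_or are invented for the example. Adding either qualifier is implicit. Removing _Optional happens after a check, through & on the dereferenced pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build flag; other compilers see the qualifier defined away. */
#ifndef HAVE_OPTIONAL_QUALIFIER
#define _Optional
#endif

static int counter = 7;

/* const before the * describes the pointee: "*p is read-only". */
const int *read_only_view(void) { return &counter; }

/* _Optional in the same position: "*p may not be an object at all". */
_Optional int *find_counter(int present) { return present ? &counter : NULL; }

/* Passing _Optional int * to _Optional const int * adds const implicitly,
   just as int * converts to const int *. */
int read_or(_Optional const int *p, int fallback)
{
    if (p == NULL)
        return fallback;
    const int *q = &*p; /* under the prototype, & strips _Optional; const stays */
    return *q;
}
```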

tahonermann · February 1, 2023, 6:43pm

Have you looked into using Clang’s existing nullability attributes? I know you said you use gcc, but it might be the case that gcc maintainers are open to supporting them.

Christopher_Bazley · February 1, 2023, 11:05pm

First of all, I don’t think this specific problem even needs an in-code solution at all. Instead, the analysis that causes the warning to appear near strcmp can be made smart enough to recognize that s1 and s2 are never null at this point.

That is true, but one of the axioms at the start of my proposal was that it should be ‘(relatively) easy to create a compiler for [C]’. I later postulated that improved null safety does not require path-sensitive analysis, and mentioned some compilers which are not huge, complex and resource-hungry, but still do a useful job of compiling C programs. I believe it also simplifies analysis if only (syntactic) dereferences need to be checked.

You need such smarts in the analysis anyway, to cover another very important case:
int foo(_Optional const char *s1, _Optional const char *s2)
{
  if (...) {
    return 0;
  }
  assert(s1);
  assert(s2);
  return strcmp(s1, s2);
}

I can only guess that the if condition you have elided is something which allows the programmer to assume that neither s1 nor s2 is null, despite not explicitly checking for that. From my point of view, the assertions are irrelevant, since I’m assuming they are not checked in release builds.

Honestly, I would be OK with forcing the programmer to check for null values of s1 and s2 in that scenario. It seems a bit silly to create an interface which explicitly allows those values to be null but does not handle the consequences (not even by casting away the qualifier).
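Under the stricter reading in the reply above, the function handles null itself instead of relying on assert. Here is a minimal sketch using the same hypothetical HAVE_OPTIONAL_QUALIFIER fallback. Ordering null before any non-null string is an arbitrary policy chosen for illustration. &* serves as the verifiable unwrap the author describes further down the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifndef HAVE_OPTIONAL_QUALIFIER
#define _Optional
#endif

/* Hypothetical policy: null orders before any string; two nulls are equal. */
int compare_optional(_Optional const char *s1, _Optional const char *s2)
{
    if (s1 == NULL || s2 == NULL)
        return (s1 != NULL) - (s2 != NULL);
    /* Both checked: &* strips _Optional, so strcmp receives const char *. */
    return strcmp(&*s1, &*s2);
}
```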

Now, obviously, sometimes you really need a “force-unwrap” operator to indicate that you’re sure the pointer can’t be null here. In this case “easy to type” isn’t necessarily valuable; say, Rust chose the syntax .unwrap() which is designed to catch the eye, be easy to notice and audit.

Sorry but I don’t see why anything needs to be built-in. This looks self-explanatory to me:

_Optional int *x = ...;
assert(x);
int *y = (int *)x;

I don’t expect it to be needed very often, so I see your proposals as a ‘nice-to-have’. It would also be nice to have an optional_cast for use in C++ code, but that wasn’t the language I was mainly concerned with.

I don’t see it as fundamentally different from the following commonplace code:

int x = ...;
assert(x >= 0);
unsigned int y = (unsigned int)x;

So like I said in the other thread, I think this attribute doesn’t need to be checked by the static analyzer. The contract behind your attribute can be much simpler, fully resolved with either purely syntactic analysis or with very basic flow-sensitive analysis.

It was one of my design goals that purely syntactic analysis should be sufficient. I used that for all my early testing. However, if you think you can implement basic flow-sensitive analysis in the compiler without requiring use of the static analyzer, I’m very interested in that. I wouldn’t know where to start.

The static analyzer can take advantage of it. You can introduce a warning about any unchecked dereference of the _Optional pointer.

I already implemented that, and my new checks catch a lot of undefined behaviour that was previously ignored. I really like that.

The easiest way to introduce such warning is to perform a state split every time the pointer is encountered: in one state the pointer is null, in the other state it’s non-null. Then the null case simply becomes a path that the analyzer has to explore.

I wanted to do that, and even had an attempt, but it didn’t seem to be necessary to make my prototype useful and I didn’t want my wife to divorce me during my paternity leave. Again, if you think you can do this, then that would be wonderful.

I considered specifying as part of my paper how static analysis should work, but deliberately left the wording vague in the expectation that different implementations would diverge. If I’m honest, I think this is the biggest weakness of my paper, not the endless arguments over syntax. However, I take solace from the fact that different toolchains already generate different warnings (or none) for the same code.

Maybe there’s still room in the analyzer to warn about invalid force-unwraps, but most of such warnings would be about potential execution paths that the developer has just explicitly said aren’t there, aka false positives.

That sounds like a bad idea to me, and it seems you agree. I want to maintain a strong distinction between verifiable unwraps (&*s) and force-unwraps ((int *)s).
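The two kinds of unwrap can be compared side by side. This sketch uses the same hypothetical HAVE_OPTIONAL_QUALIFIER fallback. unwrap_checked and unwrap_forced are illustrative names, not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>

#ifndef HAVE_OPTIONAL_QUALIFIER
#define _Optional
#endif

/* Verifiable unwrap: the null test is visible, and &* turns the
   checked pointer into an unqualified int *. */
int *unwrap_checked(_Optional int *s, int *fallback)
{
    if (s == NULL)
        return fallback;
    return &*s;
}

/* Force-unwrap: the cast claims non-nullness without proof, which makes
   it easy to search for and audit. The assert only helps debug builds. */
int *unwrap_forced(_Optional int *s)
{
    assert(s != NULL);
    return (int *)s;
}
```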

Thank you for your thoughtful comments.

Christopher_Bazley · February 2, 2023, 4:23pm

NoQ:

Christopher_Bazley:

Isn’t the quality of being aliased a property of an object, rather than any single pointer to it?

Well, no, it’s a property of a given group of pointers to the object. Just because an object has many pointers pointing to it, i.e. generally participates in aliasing, doesn’t mean you can’t pass it as an argument to
void *memcpy(void *restrict dst, const void *restrict src, size_t n);
It only becomes a problem when both pointers passed to memcpy() point to the same object. So, just these specific pointers, regardless of every other pointer in the program.

The syntax I had in mind was something like

memcpy(exclusive void *dst, exclusive const void *src, size_t n);

I explained in my paper why that would give better semantics for compatibility of declarations, although the presumed semantics for assignment could be less safe.

You could also compare the simplicity of my patch to check that _Optional isn’t used at top-level (D142738, Warn if _Optional used at top-level of decl) with the existing code to check that restrict isn’t used at bottom-level. That code has to check whether the type is a pointer (of any variety) in BuildQualifiedType for every level of GetFullTypeForDeclarator (which walks the DeclTypeInfo backwards) as well as in GetDeclSpecTypeForDeclarator.

I’ve also seen it argued that restrict should be the default, and that aliasing pointers should instead be explicitly qualified. Just like pointers that can be null are in a minority, so are pointers that can alias, so I can see value in that alternative universe.

In that case, this would apply:

memmove(aliased void *dst, aliased const void *src, size_t n);

This would have resolved the conflict between the qualifier-enables-optimisation behaviour of restrict with the qualifier-disables-optimisation behaviour of volatile. It’s all ancient history though.
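NoQ's point, that restrict constrains only the pointers passed to a particular call, can be seen with the real standard functions. memcpy may be given two non-overlapping regions of one array. Overlapping regions need memmove, whose parameters are not restrict-qualified. copy_halves and shift_left are hypothetical helpers for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Both pointers point into the same object, but the regions [0,4) and
   [4,8) do not overlap, so this memcpy call is valid. */
void copy_halves(char buf[8])
{
    memcpy(buf, buf + 4, 4);
}

/* The source and destination overlap here, so memmove is required. */
void shift_left(char *buf, size_t len)
{
    memmove(buf, buf + 1, len - 1);
}
```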

Christopher_Bazley · February 2, 2023, 6:15pm

It differs from _Optional, whether documented or not. Do you think that _Nullable semantically differs from const? I assume so. The code to implement the nullability qualifiers in Clang is vastly more complex than the minor additions I made for _Optional (including the static analyzer), yet the user experience is worse in every way:

Dereferences of _Nullable pointers generate no warning.
Conversions from _Nullable to _Nonnull generate no warning unless the user specifies -Wnullable-to-nonnull-conversion (not sure how they are meant to know about that).
Conversions from _Nullable to unqualified generate no warning at all, ever.
Calls to _Nullable function pointer generate no warning.
Clang does not warn about mismatches between function declarations which do/don’t have _Nullable qualified arguments.
See Compiler Explorer

There’s another thread about those bugs: Nullability analyzer doesn't seem to work (and how to fix it)

In contrast, _Optional has exactly the same behaviour as const in every context except static analysis, where

dereferences of pointer-to-_Optional do generate a warning, even if it’s only a syntactic dereference. This catches a lot of undefined behaviour.
calls to _Optional functions via pointers do generate a warning.
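The function-pointer case from the list above can be sketched as an optional callback that is checked before the call. This uses the same hypothetical HAVE_OPTIONAL_QUALIFIER fallback. The names transform_fn, twice and apply are invented. Falling back to the identity when no callback is supplied is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stddef.h>

#ifndef HAVE_OPTIONAL_QUALIFIER
#define _Optional
#endif

typedef int transform_fn(int);

int twice(int x) { return 2 * x; }

/* The callback may be absent. An unchecked call through fn is what the
   prototype's analysis would flag; here the null case is handled first. */
int apply(_Optional transform_fn *fn, int x)
{
    if (fn == NULL)
        return x;
    return fn(x);
}
```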

Christopher_Bazley · February 2, 2023, 6:29pm

I’m not sure what your point is. The opinion exists, whether you agree with it or not.

I have (anecdotal) evidence that their operation, to the extent that they work at all, was so obscure to me and my colleagues that I didn’t understand it until I began browsing the source code of Clang.

That’s probably because _Nonnull is the usual usage of pointers in the C language (which is the whole point of my proposal to qualify only the opposite case) and _Nullable appears to be mostly broken/useless.

I can’t really help if you don’t see a problem with the public interface of a module diverging arbitrarily from its actual implementation in ways that the compiler cannot verify.

The difference is that the type checking and static analysis using _Optional would actually work, the implementation in the compiler would be orders of magnitude (10? 20 times?) simpler, and it would provide a high degree of null pointer safety even in a compiler that performs no path-sensitive analysis.

I find it easy to explain: the address of an object is never null. Every C programmer already knows that.

Not sure what you meant by that.

I’d hate that, but sure, go ahead

Christopher_Bazley · February 2, 2023, 9:56pm

I didn’t propose any change to the standard library headers because function signatures have to be backward-compatible. Existing code which uses the address of free() and expects it to have the signature void free(void *) would fail to compile if the signature were instead void free(_Optional void *). The most obvious example is when free is used as a callback function.
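Here is what the callback case looks like in existing code. destroy_fn and destroy_all are hypothetical names. Passing free where a void (*)(void *) is expected relies on free having exactly that signature. If its parameter became _Optional void *, the function types would no longer be compatible, just as they would not be with a const void * parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Matches free's current signature, void free(void *). */
typedef void destroy_fn(void *);

/* Hypothetical container helper: destroys each element with a
   caller-supplied destructor and returns how many it visited. */
size_t destroy_all(void **items, size_t n, destroy_fn *destroy)
{
    for (size_t i = 0; i < n; i++)
        destroy(items[i]);
    return n;
}
```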

As with every other aspect of my proposal, this could be worked around by defining _Optional as an empty macro when invoking the compiler (i.e. in the Makefile or equivalent), but a better idea would be to improve the rules for compatibility of function signatures. I don’t feel like submitting another paper to do that right now.

I’d like to see evidence that it makes any measurable difference. The efficiency of executing modern software is almost entirely bounded by memory access. Untaken branches could increase instruction cache usage, but I still doubt the difference would be measurable unless every other function call were free() and that function was also inlined.

Christopher_Bazley · February 2, 2023, 10:47pm

Obviously, I find that disappointing. It’s not clear to me whether there is a decision-making process, or you are a BDFL.

I guess that ties into point 4 in the list ‘Contributing Extensions to Clang’, which is…

the extension itself must have an active proposal and proponent within that committee and have a reasonable chance of acceptance. Clang should drive the standard, not diverge from it.

If I’d read that more carefully, then I might never have bothered prototyping my extension in Clang in the first place. Is there a list of committee members I could petition for a proponent?

But also, it’s not clear to me why my extension falls into that category rather than…

This criterion does not apply to all extensions, since some extensions fall outside of the realm of the standards bodies.

I don’t think “data” is needed to show that a feature which does not require static analysis will catch bugs that cannot be caught by features which do require static analysis – but only if you accept the premise that compilers that do not perform such analysis have any value.

The prototype I created works “out of the box” precisely because const is a proven solution. I can’t predict how fixable other features might be. I’m not interested in fixing something I don’t want to use.

This is a highly personal judgement. Some users might be happy to jump through hoops, as a trade-off for the proposed feature. Some might judge using C instead of C++ to be ‘jumping through hoops’. Ultimately, it’s a personal choice. My proposal currently has an 85% upvote rate on Reddit. Do those people’s opinions not count? Do you think they didn’t notice the section headed ‘Function pointers’?

I’d be happy to provide those if I thought my proposal had any chance of being accepted.

Christopher_Bazley · February 3, 2023, 12:10pm

* negligible, after a quarter century

* parameters

* unusable

The syntax is unusable for most cases, and the semantics are not even close to what I desire, partly because “limited to only function parameters” is not useful.

The very first sentence of K&R’s book “The C programming language” is

C is a general-purpose programming language which features economy of expression…

Neither static array extents nor any of the alternative methods of annotating function parameters resemble “economy of expression”.

Anyone advocating that C programmers write classes like this:

bool coord_stack_init(coord_stack stack[static 1], size_t limit);
void coord_stack_term(coord_stack stack[static 1]);
bool coord_stack_push(coord_stack stack[static 1], coord item);
coord coord_stack_pop(coord_stack stack[static 1]);
bool coord_stack_is_empty(coord_stack stack[static 1]);

Or like this:

bool coord_stack_init(coord_stack *_Nonnull stack, size_t limit);
void coord_stack_term(coord_stack *_Nonnull stack);
bool coord_stack_push(coord_stack *_Nonnull stack, coord item);
coord coord_stack_pop(coord_stack *_Nonnull stack);
bool coord_stack_is_empty(coord_stack *_Nonnull stack);

Or like this:

bool coord_stack_init(__attribute__((nonnull)) coord_stack *stack, size_t limit);
void coord_stack_term(__attribute__((nonnull)) coord_stack *stack);
bool coord_stack_push(__attribute__((nonnull)) coord_stack *stack, coord item);
coord coord_stack_pop(__attribute__((nonnull)) coord_stack *stack);
bool coord_stack_is_empty(__attribute__((nonnull)) coord_stack *stack);

Or like this:

bool coord_stack_init(coord_stack *stack, size_t limit) __attribute__((nonnull (1)));
void coord_stack_term(coord_stack *stack) __attribute__((nonnull (1)));
bool coord_stack_push(coord_stack *stack, coord item) __attribute__((nonnull (1)));
coord coord_stack_pop(coord_stack *stack) __attribute__((nonnull (1)));
bool coord_stack_is_empty(coord_stack *stack) __attribute__((nonnull (1)));

Instead of like this:

bool coord_stack_init(coord_stack *stack, size_t limit);
void coord_stack_term(coord_stack *stack);
bool coord_stack_push(coord_stack *stack, coord item);
coord coord_stack_pop(coord_stack *stack);
bool coord_stack_is_empty(coord_stack *stack);

seemingly has no interest in keeping C “pleasant, expressive, and versatile” (as K&R designed it to be), and might even have a hidden agenda to push C users towards C++ (“just use references”).
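For comparison, the plain declarations could be backed by a small implementation in which unqualified pointers are simply assumed to be valid. The coord and coord_stack definitions are assumptions, since the post only shows prototypes. coord_stack_peek is a hypothetical addition that shows where _Optional would appear: on the one result that can legitimately be null. The same hypothetical HAVE_OPTIONAL_QUALIFIER fallback stands in for the prototype compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#ifndef HAVE_OPTIONAL_QUALIFIER
#define _Optional
#endif

/* Hypothetical definitions: the post shows only the declarations. */
typedef struct { int x, y; } coord;
typedef struct {
    coord *items;
    size_t count, limit;
} coord_stack;

bool coord_stack_init(coord_stack *stack, size_t limit)
{
    stack->items = malloc(limit * sizeof *stack->items);
    stack->count = 0;
    stack->limit = limit;
    return stack->items != NULL;
}

void coord_stack_term(coord_stack *stack)
{
    free(stack->items);
    stack->count = 0;
}

bool coord_stack_push(coord_stack *stack, coord item)
{
    if (stack->count == stack->limit)
        return false; /* full */
    stack->items[stack->count++] = item;
    return true;
}

coord coord_stack_pop(coord_stack *stack)
{
    assert(stack->count > 0); /* callers check is_empty first */
    return stack->items[--stack->count];
}

bool coord_stack_is_empty(coord_stack *stack)
{
    return stack->count == 0;
}

/* Hypothetical extra: only the result that may be null is qualified. */
_Optional coord *coord_stack_peek(coord_stack *stack)
{
    return stack->count > 0 ? &stack->items[stack->count - 1] : NULL;
}
```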

To be honest, the only opinions I would fully trust on this question are those of people who code in C every day for enjoyment. Putting C++ programmers in charge of the future of C is like putting foxes in charge of a henhouse. Stroustrup’s book “The Design and Evolution of C++” (1994) makes that abundantly clear.

Christopher_Bazley · February 3, 2023, 12:45pm

Attributes are not a mandatory part of the type system, therefore they don’t meet my specified design criteria.

The number and variety of brackets and underscores contained in any extra word (or words) are a personal style concern; the fact that more words are needed is not. That’s just a fact.

I created a working prototype to show that regardless of what people might assume, the idea is not a non-starter.

You could describe a const qualifier on a pointee as an annotation which means “this pointer might be to a read-only object”. Yet all pointers can be treated as read-only. It turns out that const has been incrementally adopted just fine.

I’m not sure what your point is. Yes, there is a general rule. That doesn’t preclude it having specific and teachable effects on pointer types.

Excellent! That’s exactly what I wanted people to conclude.

AaronBallman:

e.g., I would not want to close the door on being able to write:

  _Optional int get_value(struct something *ptr) {
    if (ptr)
      return ptr->field;
    return _None;
  }

  int main() {
    _Optional int val = get_value(nullptr);
    if (val)
      return *val; // Steals C++ syntax, but the * could be replaced by another syntactic marker
    return EXIT_FAILURE;
  }

This is pure magic, and C does not do magic. Everything in a C program is exactly what it appears to be. What I have done instead is marry the intended behaviour in your example above with C’s syntax using some minor but very carefully chosen adjustments.

I haven’t closed the door on being able to write the code that you wrote, because I implemented your request to ban usage of _Optional at top level. I’m not convinced that’s a good idea though, partly because it might lead people in future to say “Can we reuse this keyword to mean something different?” instead of creating an alternative.
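Here is a sketch of how the quoted example might be written with _Optional as a pointee qualifier rather than as an optional value type. struct something mirrors Aaron's example. value_or_failure and the HAVE_OPTIONAL_QUALIFIER fallback are hypothetical. The optional result becomes a pointer that may be null, and & on the checked member access yields an ordinary int *.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifndef HAVE_OPTIONAL_QUALIFIER
#define _Optional
#endif

struct something { int field; }; /* stand-in for the quoted example */

/* Instead of an optional int, return a pointer to an int that may be null. */
_Optional int *get_value(_Optional struct something *ptr)
{
    if (ptr)
        return &ptr->field; /* & yields an ordinary int * */
    return NULL;
}

int value_or_failure(_Optional struct something *ptr)
{
    _Optional int *val = get_value(ptr);
    if (val)
        return *val;
    return EXIT_FAILURE;
}
```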

AaronBallman · February 3, 2023, 1:45pm

Our decision-making process is somewhat ad hoc in terms of coming to a conclusion. There are not BDFLs, but we do have code owners who help with decision-making (https://github.com/llvm/llvm-project/blob/main/clang/CodeOwners.rst). I’m the code owner for C conformance and the general code owner for Clang, but my opinions upthread were personal opinions about the proposal. The basic process is what you’ve been seeing – someone proposes something, there’s discussion on the proposal, and the proposal gets consensus (or not) based on the content of the thread.

By my reading of the thread, there are multiple code owners who are not convinced of this design, including me. I do not see the proposal having consensus to add to Clang in this form, but I do see plenty of interest in improving diagnostic functionality in this area (whether it’s a new qualifier, new attribute, improved analyses, etc). I believe this proposal in this form has been rejected at this point, but if we get new information on the topic, we would certainly revisit it.

I’ve been a regular member of WG14 for about six years now and am happy to help with any questions you have about the committee’s process. The committee has some documentation on our process at Contributing, but because it’s an ISO committee, there is unfortunately a lot of bureaucracy to navigate. A few other Clang and LLVM community members also attend WG14 meetings, though less regularly, and they could help too.

One thing the committee does for folks in your situation is allow you to attend a meeting as a guest so that you can present your own work and hear feedback directly. But ISO has rules about non-member participation, so you can only be a guest once or twice before they start asking you to join your country’s national body so you can get into the ISO global directory as a member (and this can cost money depending on which national body you’re joining). I saw you posted WG14 N3089 to the committee already and, if you’d like, I can put you in touch with the convener so he’s aware you’ll need an invite to the meeting at which we discuss it. If that’s something you’d like me to do, please send me your email address (privately if you prefer) so I can CC you on the conversation.

The alternative is that you can find a champion within the committee who will try to advocate for your work (or become a co-author on the paper, etc.). This tends to be a harder road, though, as nobody advocates for a paper as well as its author. I’d have a bit of a hard time being a champion for your paper given my position on it, but I’ve been in that situation before and can do my best to present your work neutrally and get you feedback. I can also help put you in touch with other folks on the committee who might be willing to champion it instead.

It’s a grey area, to be sure. What I think of for extensions outside of the realm of standards bodies are things like HLSL support where there is no official standard for it or attributes that are inappropriate for standardization (target-specific ones, etc). In this case, you’re proposing a new type qualifier for something that’s platform independent which is the sort of thing we want the standards committee to weigh in on.

To be clear, I’m giving you feedback on what would make your proposal more acceptable to me. So yes, it’s a personal judgement. You’re free to ignore my experience as a compiler engineer and member of standards committees, but I don’t recall a time when we’ve adopted something this irregular before.

That’s certainly fair. This exact proposal is not accepted, but if WG14 came back showing strong support for it, that would be new information for our community and would certainly be worth revisiting the discussion over. So it’s hard to say “no chance of acceptance”, but it is fair to say “unlikely to be accepted without modification based on feedback from the discussion”. Note, that “modification” can be a section on “here’s the community feedback and here’s my rebuttal” as well as material changes to the proposal.

The C committee is strongly considering standardizing lambdas. We added constexpr support for objects (not functions yet) and automatic type inference in C2x. We’ve shown significant interest in defer (enough that we may spin out a TS for it). What is considered “magic” is subjective and the committee has shown quite a bit of support for adding more modern facilities to the language in this release.

user1984 · February 7, 2023, 11:50am

Thank you. I had a good read of that just now.
I tried g++ -std=c++17 with _Nullable on godbolt.org but it didn’t work. Is that in G++?

user1984 · February 7, 2023, 11:54am

Those Clang nullability attributes all start with an underscore (_Nullable, _Nonnull). I know that’s because they are compiler internals, and it also avoids colliding with any existing parameter names or macros in code. However, it does look messy. If it’s going to be part of the C or C++ standards, it’s better to drop the underscore and use lowercase. “nullptr” doesn’t have an underscore, and it was added a decade ago, I recall.

AaronBallman · February 7, 2023, 1:13pm

The keywords are named so that they’re in the reserved namespace so they “won’t” conflict with user-defined identifiers (some users like stealing reserved identifiers but we don’t worry when we break those users). The C standard will often implement keywords with the same kind of spelling for the same reason, and then introduce a macro in a header file so you can opt into a different spelling. e.g., _Bool and bool (from <stdbool.h>) in C99 or _Static_assert and static_assert (from <assert.h>) in C11, etc.
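The pattern described above, a reserved keyword plus an opt-in header macro, can be seen with the standard spellings. This sketch assumes a C11 or C17 compiler. In C23, bool, true and static_assert are keywords in their own right, and the code still compiles. is_even is just an illustrative function.

```c
#include <assert.h>  /* C11: static_assert is a macro for _Static_assert */
#include <stdbool.h> /* C99: bool is a macro for _Bool */

/* Reserved spelling, usable without any header. */
_Static_assert(sizeof(_Bool) == sizeof(bool), "bool names the same type as _Bool");

/* Friendly spelling, opted into via <assert.h>. */
static_assert(true == 1, "true expands to 1");

bool is_even(int n)
{
    _Bool r = (n % 2 == 0);
    return r;
}
```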

user1984 · February 7, 2023, 1:35pm

Fair enough, that makes sense. I saw _Static_assert has been deprecated in C23, now that static_assert is widely adopted.

tahonermann · February 7, 2023, 5:56pm

nullptr was added to C++11, but was only just recently added to C for C23 via N3042.

EmilOhlsson · February 16, 2023, 12:17pm

First of all, I think this is a great proposal. I think it’s bold, and I think it’s the kind of bold that C (and C++) would need to step up the memory safety game.

As I read the proposal first I got the impression that the proposal implied that all non _Optional pointers are assumed to never be null, as mentioned as point 2 by @NoQ above. I agree with the assessment that the _Optional annotation would be less useful without the compiler having a stronger enforcement of it.

That’s why it wasn’t part of my proposition, although a surprising number of people have been telling me that (1.) is useless without (2.). I do think that if (1.) is adopted then eventually, someone will implement (2.), but I’d expect it to be opt-in.

This makes sense to me, though having it enforced early on would be very useful. I’ve been trying to think of some way of allowing part of the code to opt out, like when interfacing with external code.
