RFC: do not optimize on basis of __attribute__((nonnull)) in glibc headers

Via https://reviews.llvm.org/D27855, LLVM is likely to gain the ability to delete null checks in callers based on attribute((nonnull)) on the callee. This interacts badly with glibc’s choice to mark the pointer parameters of memcpy, memmove, etc. as attribute((nonnull)) – it is relatively common for programs to pass a null pointer and a zero size to these functions, and in practice libc implementations accept such usage. Indeed, LLVM’s lowering of @llvm.memcpy intrinsics relies on these calls working.

Deleting a null pointer check on p after a memcpy(p, q, 0) call seems extremely user-hostile, and very unlikely to result in a valuable improvement to the program, so I propose that we stop lowering attribute((nonnull)) on these builtin library functions to the llvm nonnull attribute.

(Chandler is working on a paper for the C++ committee proposing to give these functions defined behavior when given a null pointer and a zero size, but optimizing on the basis of these particular nonnull attributes seems like a bad idea regardless of the C or C++ committees’ decisions.)

Thoughts?

"memcpy, memmove, etc." seems to be hiding some details here; do you have a complete list of functions you want to special-case? There are a lot of functions which could potentially go on that list, and some of them are POSIX or GNU extensions.

Why do we care about keeping these null pointer checks in particular, as opposed to providing a flag which stops the optimizer from deleting null pointer checks in general (like gcc's -fno-delete-null-pointer-checks).

-Eli

“memcpy, memmove, etc.” seems to be hiding some details here; do you
have a complete list of functions you want to special-case? There are a
lot of functions which could potentially go on that list, and some of

them are POSIX or GNU extensions.

The only ones I’m immediately concerned with are the mem* and str* functions which accept an explicit size argument governing the pointers. I would be happy to restrict it further if that helps.

Why do we care about keeping these null pointer checks in particular, as
opposed to providing a flag which stops the optimizer from deleting null
pointer checks in general (like gcc’s -fno-delete-null-pointer-checks).

There are two reasons really.

First is pragmatic: all of the security vulnerabilities I happen to be aware of stem from deleting null pointer checks around the mem* functions, and the functions that LLVM’s own intrinsics specify null pointers as allowed for are the mem* functions. Thus they seem likely to be the most risky. Certainly, there are security issues stemming from other null checks being eliminated, but in a purely pragmatic sense, these seems like a useful first step.

Second is perhaps more interesting. I think there are a large number of null pointer checks that we can and should eliminate. I’ll give one example here of an entire class: references to objects with non-zero size. Taking the address of this cannot produce a null pointer, and code checking for null I think should be optimized away IMO.

There is something very special about the nature of the mem* intrinsics and that is that they can accept a zero size. It is that zero size case that I think makes a null pointer a reasonable argument to them. I would like to suggest that whenever we have a pointer and size where the size could reasonably be zero, the pointer could also reasonably be null.

While this will cover some number of the current uses of nonnull attribute, I think there are a reasonably large number of other uses that I’m not sure such sweeping statements can be made about. With a reference to a non-zero sized object, I don’t see any reason to eliminate the nonnull attribute or to stop optimizing away. Indeed, the way these occur in C++ as part of the type system can make this an incredibly important optimization in conjunction with proper inlining.

I’m not really opposed to having a mode that disables all non-null optimizations, but I think that is a very different thing and I’m personally much less interested in such a mode. I don’t think we could reasonably make that the default for example, whereas I think we could reasonably drop these particular nonnull attributes from glibc headers by default.

-Chandler

On the one hand, GCC has already started optimizing based on the nonnull attribute on memcpy, and so that ship has sailed. We already need to fix all of our code to work in the face of those optimizations, regardless of what Clang/LLVM does, because the vast majority of our code is expected to work correctly when compiled with GCC. Even if we fix this in C/C++ at the standards level now, it seems likely to be too late. Moreover, a conforming implementation of the underlying library function could still do problematic things. On the other hand, I don’t feel like we should be unnecessarily hostile to users. This particular “feature” of how memcpy is (not) specified is not well known and gives fairly-obvious-looking code undefined behavior. No pointer casting or other likely-to-cause-problems constructs required. I suspect that it is useful to preserve the fact that memcpy(p, q, n) implies that p and q are dereferenceable_or_null, although we can certainly do that in the optimizer regardless of what attributes are applied if we must, and strip the attribute in places that tend to be problematic if desired. We should have a flag that disables this stripping if we do it by default. We could also have a mode that inserts a null check, or a zero-size check, and branch around all calls to memcpy and friends. This has the advantage of protecting against odd, but conforming, library implementations that assume the pointers are not null (not that I know of any such implementation). In short, I don’t have a strong feeling on this either way. I’m fine with stripping the attributes. I expect that ubsan will (continue to) complain about the situation regardless. Thanks again, Hal

I know of a large number of libraries that only behave correctly with recent GCCs by using the flag Eli mentioned to disable all nonnull optimizations wholesale. =/

If the standards committees actually change their mind here, I think we could reasonably enable exactly this flag for Clang and LLVM when built with a GCC version not implementing the fix.

And even if not, even if we do actually have to fix all of our code to work in the face of those optimizations, I at least have a large body of users that aren’t sure they will ever finish fixing all of these issues. They don’t have any good way to test and discover all of them, only the ones hit in code paths today. So I’d still like to give them a toolchain that will protect the places in their software where they erroneously relied on a guarantee not provided and have no tests or ability to go and fix.

-Chandler

I agree with this. -fno-delete-null-pointer-checks is too heavy a hammer to ideally address this problem. We can and should provide a more-targeted solution (regardless of whether it is on by default). -Hal

Via ⚙ D27855 try to extend nonnull-ness of arguments from a callsite back to its parent function, LLVM is likely to gain the ability
to delete null checks in callers based on __attribute__((nonnull)) on the
callee. This interacts badly with glibc's choice to mark the pointer
parameters of memcpy, memmove, etc. as __attribute__((nonnull)) -- it is
relatively common for programs to pass a null pointer and a zero size to
these functions, and in practice libc implementations accept such usage.
Indeed, LLVM's lowering of @llvm.memcpy intrinsics relies on these calls
working.

Deleting a null pointer check on p after a memcpy(p, q, 0) call seems
extremely user-hostile, and very unlikely to result in a valuable
improvement to the program, so I propose that we stop lowering
__attribute__((nonnull)) on these builtin library functions to the llvm
nonnull attribute.

(Chandler is working on a paper for the C++ committee proposing to give
these functions defined behavior when given a null pointer and a zero size,
but optimizing on the basis of these particular nonnull attributes seems
like a bad idea regardless of the C or C++ committees' decisions.)

Thoughts?

"memcpy, memmove, etc." seems to be hiding some details here; do you have
a complete list of functions you want to special-case? There are a lot of
functions which could potentially go on that list, and some of them are
POSIX or GNU extensions.

The following glibc functions in <string.h> take both a pointer and a size
providing an upper limit on the number of bytes that may be accessed
through that pointer, and mark the pointer as __attribute__((nonnull)):

ISO C:
  memcpy, memmove, memset, memcmp, memchr, memrchr
  strncpy [param 1 only], strncat [param 1 only], strncmp

Extensions:
  strndup, memccpy, mempcpy, strnlen, bcopy, bzero, bcmp, memfrob,
  strncasecmp, strncasecmp_l [not parameter 4], stpncpy

I would be interested in covering at least the ISO C functions. I don't
have strong opinions about the extensions (any code built using those is
likely also compiled with GCC, and so presumably has to fix this
regardless).

Why do we care about keeping these null pointer checks in particular, as

opposed to providing a flag which stops the optimizer from deleting null
pointer checks in general (like gcc's -fno-delete-null-pointer-checks).

Deleting null checks when we can prove a pointer logically cannot possibly
be null is still useful -- for instance, deleting a null check after the
pointer was already passed to strlen, or when it's the address of a local
variable, both seem fine.

I would argue strongly to cover bcopy, bzero, and bcmp at the very least. These are BSD in origin and if anything I suspect there is more code at risk of failure using those routines.

Looking at the list, I don’t see any reason not to cover all of them.

I know of some too. I suspect that essentially everyone with a non-trivially-sized codebase is in this boat. This definitely seems useful. -Hal

IMO, ignoring nonnull on all of these would be a fine idea -- but if we go
down this path, please do open a bug against glibc asking for them to be
removed, pointing out that,
a. it is harmful to mark them nonnull because <evidence>.
b. clang is going to start ignoring the attributes on those functions even
if glibc doesn't remove them, because of a.
c. there will be a standards proposal to stop requiring nonnull, too.

It's worth at least an attempt to persuade them it's wrong to have those
attributes.

BTW, I just ran this test program (built with “gcc -ggdb -fno-builtin -Wall test.c”), to check if the library functions actually do all succeed when given null values, or if some might segfault. It would appear that in the current library implementation, on x86-64, everything can be null when given a zero length, other than the 1st arg to strncat.

#define _GNU_SOURCE
#include <string.h>
#include <locale.h>

char *a = 0, *b = 0;
char valid[10];

int main(int argc, char ** argv) {
locale_t ll = newlocale(LC_CTYPE, “”, NULL);
int result = 0;
size_t count = argc-1;
memcpy(a, b, count);
memmove(a, b, count);
memset(a, 4, count);
result += memcmp(a, b, count);
result += memchr(a, 4, count) != 0;
result += memrchr(a, 4, count) != 0;
strncpy(a, b, count);
strncat(valid, b, count); // crashes if 1st arg is not valid
result += strncmp(a, b, count);
strndup(a, count);
memccpy(a, b, 4, count);
mempcpy(a, b, count);
result += strnlen(a, count);
bcopy(a, b, count);
bzero(a, count);
result += bcmp(a, b, count);
memfrob(a, count);
result += strncasecmp(a, b, count);
result += strncasecmp_l(a, b, count, ll);
stpncpy(a, b, count);
return result;
}

BTW, I just ran this test program (built with "gcc -ggdb -fno-builtin
-Wall test.c"), to check if the library functions actually do all succeed
when given null values, or if some might segfault. It would appear that in
the current library implementation, on x86-64, everything can be null when
given a zero length, other than the 1st arg to strncat.

My bad -- the nonnull on parameter 1 of strncat is correct, it's the
nonnull on parameter 2 that is dubious.

On the one hand, I think it's reasonable because we don't want the
optimizer to delete those null checks. On the other hand, the
functions are properly attributed as far as the standard is concerned.
C11 7.24.1p2 is very clear that even with a 0 size, the pointer *must*
still be valid. So I would be opposed to ignoring those attributes in
Sema (I think we should still warn when users do nonportable things),
but in favor of not changing the optimizer to capitalize on this
"opportunity." We've seen real exploits in the wild from this sort of
thing in the past. See
https://www.securecoding.cert.org/confluence/display/c/EXP34-C.+Do+not+dereference+null+pointers
for an example where deleting a null point check based on the
presumption of a nonnull pointer caused the optimizer to introduce a
security exploit in the Linux kernel. While the example does not
involve memcpy() or friends, it does demonstrate that real exploits
happen from this kind of optimization.

Failing to lower these attributes to LLVM IR may impact security
researchers who are relying on that information from the source, but I
think they could either carry a local patch or request this behavior
to have a flag.

~Aaron

So I would be opposed to ignoring those attributes in

Sema (I think we should still warn when users do nonportable things),

but in favor of not changing the optimizer to capitalize on this
"opportunity."

I'd be opposed to ignoring the attributes only in some places and not in
others. It should be ignored totally, as if it wasn't present on those
functions. Doing anything else sends the wrong message -- that libc authors
should continue to use nonnull on these functions because they might be
helpful, and won't do anything bad.

But that should not be the message. The message to libc authors should be
straightforward: please remove nonnull from these functions, because it's
wrong.

E.g.
"Yes, the standard currently says you can't call e.g. memcpy(NULL, NULL,
0), but -- real user programs DO, and always have depended on being able to
do so. And your library implementation is even careful to support that in
its definitions of the functions. So, you should not tell the compiler that
NULL is forbidden, because it would use that information to *mis*optimize
people's code that is using the effectively-universal extension to the
standard of allowing NULL with a zero length. In order to avoid breaking
code before fixed headers are deployed everywhere, Clang has added a hack
to ignore the nonnull attribute on these functions, but we'd like to be
able to remove that hack in the future."

I think that we have a responsibility to our users to continue to warn (statically, in ubsan, etc.) on non-portable behavior, which this is and will continue to be in practice for at least a decade or two, regardless of the message we’d like to send libc authors. We cannot undo history here and this will be relevant to production systems for at least a decade. We can talk to libc developers directly – they’re a much smaller set – and we can pursue change at the standards level while still providing the most useful set of tools to our users in the mean time. -Hal

So I would be opposed to ignoring those attributes in

Sema (I think we should still warn when users do nonportable things),
but in favor of not changing the optimizer to capitalize on this
"opportunity."

I'd be opposed to ignoring the attributes only in some places and not in
others. It should be ignored totally, as if it wasn't present on those
functions. Doing anything else sends the wrong message -- that libc authors
should continue to use nonnull on these functions because they might be
helpful, and won't do anything bad.

But that should not be the message. The message to libc authors should be
straightforward: please remove nonnull from these functions, because it's
wrong.

I empathize with your message, but I disagree. Libc authors are not
wrong; they're conforming to the standard. Even if we're able to
convince WG14 that the standard is wrong, that won't impact practice
for years to come. Failing to warn users about non-portable behavior
is also a hostile way to treat users; I feel strongly that we should
continue to warn users when possible even if we ignore the attributes
due to dangerous optimizations.

~Aaron

But, this is an entirely different question.

- Should clang warn about non-portable usage of passing NULL to memcpy/etc?

Sounds like a fine warning to add.

- Should that warning be dependent on the libc headers having nonnull
annotations on these functions, which will be used only for warnings, and
ignored for semantics, on this given list of hardcoded functions?

No.

Firstly: I'd note that nearly all libc implementations don't use these
attributes today. In some cases, because they've simply not thought about
it, but in others because they explicitly decided to NOT break their users'
code by introducing this problem! Glibc is the outlier, here.

So: what portability do you want to warn for? Portability assuming the same
libc, but a different compiler which might fail to ignore the nonnull
attribute? Or portability to other libc? If the latter, depending on the
nonnull attribute being present doesn't and can't work.

Secondly: if we already have a hardcoded list of functions to special case,
that could just as well be used to generate a nonportable-stringfunc-null
warning, as well.

Agreed. We should use the list to generate warnings, etc. regardless of how the headers are annotated. -Hal

So I would be opposed to ignoring those attributes in

Sema (I think we should still warn when users do nonportable things),
but in favor of not changing the optimizer to capitalize on this
"opportunity."

I'd be opposed to ignoring the attributes only in some places and not in
others. It should be ignored totally, as if it wasn't present on those
functions. Doing anything else sends the wrong message -- that libc authors
should continue to use nonnull on these functions because they might be
helpful, and won't do anything bad.

I think that we have a responsibility to our users to continue to warn
(statically, in ubsan, etc.) on non-portable behavior, which this is and
will continue to be in practice for at least a decade or two, regardless of
the message we'd like to send libc authors. We cannot undo history here and
this will be relevant to production systems for at least a decade. We can
talk to libc developers directly -- they're a much smaller set -- and we can
pursue change at the standards level while still providing the most useful
set of tools to our users in the mean time.

But, this is an entirely different question.

- Should clang warn about non-portable usage of passing NULL to memcpy/etc?

Sounds like a fine warning to add.

- Should that warning be dependent on the libc headers having nonnull
annotations on these functions, which will be used only for warnings, and
ignored for semantics, on this given list of hardcoded functions?

No.

Firstly: I'd note that nearly all libc implementations don't use these
attributes today. In some cases, because they've simply not thought about
it, but in others because they explicitly decided to NOT break their users'
code by introducing this problem! Glibc is the outlier, here.

So: what portability do you want to warn for? Portability assuming the same
libc, but a different compiler which might fail to ignore the nonnull
attribute? Or portability to other libc? If the latter, depending on the
nonnull attribute being present doesn't and can't work.

My preference is for portability for both, but you bring up a good
point about other libc implementations not being annotated. So long as
we retain the ability to tell users their code is not portable *and*
we get rid of the dangerous optimizations, I'm happy. I just don't
want to lose the diagnostics because of the optimizations.

~Aaron

I’ve sent a patch to implement this here: https://reviews.llvm.org/D30806

It turns out to be both easier and I hope less controversial than I had imagined. Neither Clang nor LLVM ever actually got nonnull onto these declarations even when they were suitable annotated. We didn’t carry that annotation across to the declaration. Oops.

So I’ve fixed that, but also disabled it for the libc functions that we have library builtin recognition for. We can extend this to cover any set of library functions folks want, just let me know. The use of it will even be correctly controlled by -fno-builtin and friends.

The net result is that this should not regress optimizations for anyone, and actually enable a bunch of optimizations outside of libc functions, while preserving safety on those libc functions.

We still have the nullability attributes which can be used to annotate more interfaces in a way that will provide warnings and static analysis aid, but not optimize on and so not run the risk of changing behavior of existing code. Those would be good candidates (IMO) for libc++ to use on any relevant interfaces which match the criteria outlined in this thread: pointers paired with a size should allow null+zero.

Hopefully this makes everyone happy, please feel free to chime in on the review thread if more discussion is needed.