RFC: Adding attribute(nonnull) to things in libc++

This weekend, I got an email from Nuno Lopes informing me that UBSAN now paid attention to attribute(nonnull), and he was having some problems with it going off when using libc++.

I did some investigation, and found that he was exactly right - there were places (deep inside the vector code, for example) which called std::memcpy(null, null, 0) - which is definitely UB.

In an ideal world, our C library would define ::memcpy with the correct annotations, and libc++ would import that into namespace std, and we’d be golden.

But we don’t have a C library - we use whatever is provided by the system we’re running on, so that’s not really an option.

For my testing, I changed libc++'s <cstring> header:

-using ::memcpy;
+inline _LIBCPP_INLINE_VISIBILITY
+void* memcpy(void* __s1, const void* __s2, size_t __n) __attribute__((nonnull(1, 2)))
+{ return ::memcpy(__s1, __s2, __n); }

(and similarly for memmove and memcmp), and I found several cases of simple code that UBSAN now fires on:

such as: std::vector<int> v; v.push_back(1);
and : int *p = NULL; std::copy(p,p,p);
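For illustration only, here is a minimal sketch of how a call site could keep a (null, null, 0) request away from an annotated memcpy. The helper name copy_bytes_guarded is hypothetical and is not taken from libc++; it only shows the shape of a zero-length guard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of libc++): copies n bytes from src to dst,
// but never forwards to std::memcpy when n == 0, so a (null, null, 0)
// request never reaches the nonnull-annotated function.
inline void* copy_bytes_guarded(void* dst, const void* src, std::size_t n) {
    // A non-empty copy still needs two real pointers.
    assert(n == 0 || (dst != nullptr && src != nullptr));
    if (n != 0) {
        std::memcpy(dst, src, n);
    }
    return dst;
}
```

The cost is one extra branch per call, which is the trade-off this thread is implicitly weighing against letting the C library handle the empty case.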

This seems fairly useful to me.

I would like to hear other people’s opinions about:

  • Is adding this kind of UB detection something that people want in libc++?

  • What do people think about wrapping the C library functions to enable UBSAN to catch them? (This is a separate question from the first one, because I can see putting these kinds of parameter checks into functions that have no counterpart in the C library.) Sadly, this would NOT affect calls to ::memcpy (for example), just std::memcpy.

  • Is that the best way to annotate the declarations? Is there a more portable, standard way to do this? (Things that start with double underscores worry me.) In any case, I would probably wrap this in a macro to support compilers that don’t understand whatever mechanism we end up using.
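A rough sketch of the macro wrapping mentioned in the last question might look like the following. The macro name SKETCH_NONNULL and the namespace sketch are made up for this example (a real patch would presumably use a _LIBCPP_-prefixed name), and the feature test assumes a compiler that provides __has_attribute.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical macro: expands to the GNU-style attribute where the compiler
// reports support for it, and to nothing everywhere else.
#if defined(__has_attribute)
#  if __has_attribute(nonnull)
#    define SKETCH_NONNULL(...) __attribute__((nonnull(__VA_ARGS__)))
#  endif
#endif
#ifndef SKETCH_NONNULL
#  define SKETCH_NONNULL(...)
#endif

namespace sketch {

// Wrapper in the spirit of the diff above: parameters 1 and 2 are declared
// nonnull, then the call is forwarded to the C library's ::memcpy.
SKETCH_NONNULL(1, 2)
inline void* memcpy(void* s1, const void* s2, std::size_t n) {
    return ::memcpy(s1, s2, n);
}

// Usage with valid pointers; nothing here depends on the attribute.
inline void demo_memcpy() {
    const char src[] = "abc";
    char dst[sizeof src] = {};
    assert(sketch::memcpy(dst, src, sizeof src) == dst);
    assert(std::strcmp(dst, "abc") == 0);
}

}  // namespace sketch
```

Here the attribute is written before the declaration rather than after the declarator, on the assumption that this placement is accepted on a function definition by both GCC and Clang.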

Thanks

-- Marshall

Why should memset / memcpy be attribute nonnull? Is there standardese that supports that?

The generic entry text of the standard section. IMO this is a standard
bug and someone should *please* get it fixed. It is ridiculous that zero
sized operations are considered UB.

Joerg

That would require a change to the C standard, and, as far as I know, there
are no current plans to issue a revised C standard.

I *suppose* that we could change it in the C++ standard, but I doubt that
there would be any support on the committee for making std::memcpy
different from ::memcpy.

-- Marshall

P.S. Recent gcc (at least 4.8.x and later) makes optimizations based on
this UB (i.e., if you pass a pointer to memcpy, then it can't be NULL).
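To make the P.S. concrete, here is a small sketch of the kind of code that is affected. The function name copy_then_check is invented for this example, and whether a given compiler actually removes the branch depends on its version and optimization level.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The null test comes *after* the memcpy call. A compiler that treats
// memcpy's pointer parameters as nonnull may conclude that dst cannot be
// null at that point and drop the branch entirely.
inline int copy_then_check(char* dst, const char* src, std::size_t n) {
    std::memcpy(dst, src, n);
    if (dst == nullptr) {
        return -1;  // possibly removed by the optimizer
    }
    return 0;
}

// Well-defined usage: both pointers are valid, so the result does not
// depend on how the compiler treats the null test.
inline void demo_copy_then_check() {
    const char src[] = "hi";
    char dst[sizeof src] = {};
    assert(copy_then_check(dst, src, sizeof src) == 0);
    assert(std::strcmp(dst, "hi") == 0);
}
```

Calling copy_then_check(nullptr, nullptr, 0) is exactly the case under discussion: the caller might expect -1, but nothing guarantees it.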

Well, a good start would be to get a position on whether this
interpretation is intentional. Especially given GCC's aggressive
exploitation. It doesn't make sense to me as there doesn't seem to be a
way that "if len is 0, the pointer must be not-null" can improve an
implementation on any system where memory operations can trap.

Joerg

BTW, this seems to be more an issue with glibc adding the tagging and
not the behavior of GCC itself.

Joerg

Not necessarily. Other standards, such as POSIX, are free to define behaviour that is undefined or implementation defined in C. POSIX mandates that a char is exactly 8 bits, for example, which is IB in C. The goal of UB is to give freedom to implementors. Saying that NULL arguments to memcpy are UB does not mean that we are compelled to disallow them, it just means that:

- We don’t have to accept them.
- We don’t have to be consistent in whether we accept or reject them.
- We can choose to do whatever makes implementation easiest.

If the easiest thing is to permit them as long as the length is zero (it seems to be), then that’s a perfectly valid implementation of undefined behaviour.
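As a sketch of why permitting the zero-length case can be the easy option, consider a naive byte-by-byte copy. naive_memcpy is a made-up name and is not how any particular C library implements memcpy; the point is only that the loop body never runs when n is zero, so no pointer is dereferenced.

```cpp
#include <cassert>
#include <cstddef>

// Naive byte-wise copy. With n == 0 the loop is skipped, so dst and src
// are never dereferenced and null pointers pass through harmlessly.
inline void* naive_memcpy(void* dst, const void* src, std::size_t n) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i) {
        d[i] = s[i];
    }
    return dst;
}

// Usage: an ordinary copy, followed by the (null, null, 0) case.
inline void demo_naive_memcpy() {
    char src[3] = {'a', 'b', 'c'};
    char dst[3] = {};
    naive_memcpy(dst, src, sizeof src);
    assert(dst[0] == 'a' && dst[2] == 'c');
    assert(naive_memcpy(nullptr, nullptr, 0) == nullptr);
}
```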

It is also undefined behaviour whether pointer comparisons between different objects are stable, but for the most part they are (and a lot of code would break if they weren’t), because implementers have decided that this is the easiest implementation of this particular bit of UB.

David

GCC also adds nonnull attributes via its builtin functions mechanism. If we want to follow GCC here we should do the same (Builtins.def has a FIXME about nonnull), which is imo cleaner than wrapping memcpy in libc++ and has the advantage of also working with plain C code.

I don't think this has an impact on optimization right now; Clang lowers memcpy calls to the llvm.memcpy intrinsic and we traditionally have allowed llvm.memcpy with a nullptr and zero length (i.e. the optimizer has to prove that the length argument is non-zero to assume the pointer is dereferenceable).

- Ben

I don't see it on NetBSD with GCC 4.8.4, so a plain prototype doesn't
seem to trigger it.

Joerg

Hmm, odd. GCC has done so for a long time, maybe it's disabled on some platforms? I get nonnull warnings with gcc 4.7 and gcc 5.0.

$ gcc -Wall -x c - <<< "int main() { memcpy(0, 0, 0); }"
<stdin>: In function ‘main’:
<stdin>:1:1: warning: implicit declaration of function ‘memcpy’ [-Wimplicit-function-declaration]
<stdin>:1:14: warning: incompatible implicit declaration of built-in function ‘memcpy’ [enabled by default]
<stdin>:1:1: warning: null argument where non-null required (argument 1) [-Wnonnull]
<stdin>:1:1: warning: null argument where non-null required (argument 2) [-Wnonnull]

- Ben

Heh. There are no "invalid implementations" of undefined behavior.

The reason I proposed this change was to allow UBSAN to detect undefined
behavior, and report it to users - rather than having their code
(potentially) fail in hard-to-diagnose ways when they change *something*
(refactor code, change compilers, change optimization level, etc).

-- Marshall

FYI, I also looked into turning this on, but with libstdc++, and found that
they annotated basic_string<T>::assign(pointer, len) with attribute
nonnull. That's a problem, because it's valid to call
basic_string<T>::assign(nullptr, 0), but the reasoning why it's valid makes
me want to ask the committee whether this is what they intended.

The standard's text says that the pointer must point to an array of
length 'n' (the second argument), but earlier in the text it also states
that in the library, whenever it says "array", any pointer is acceptable
for which all address computations and accesses to objects (that would be
valid if the pointer did point to the first element of such an array) are
in fact valid. Thus, nullptr is valid if 'n' is zero.

This was changed in DR2235:
http://cplusplus.github.io/LWG/lwg-defects.html#2235
The text and discussion of DR2235 sound like they intend to make the
behaviour of assign match that of the constructor that takes the same
arguments. What they actually did was change the constructor to match the
behaviour of assign, and it doesn't look like removing the requirement of a
nonnull pointer was considered and intended.
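For reference, the call in question looks like the sketch below. assign_chars is a hypothetical wrapper written for this example, and the claim that (nullptr, 0) is acceptable rests on the reading of the standard described above, not on any particular library's annotations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces the contents of s with the n characters starting at p.
// Under the reading described above, (nullptr, 0) is an acceptable
// argument pair because an empty range requires no address computation
// or object access.
inline void assign_chars(std::string& s, const char* p, std::size_t n) {
    assert(p != nullptr || n == 0);  // only a non-empty range needs a real pointer
    s.assign(p, n);
}
```

A library that marks assign's pointer parameter nonnull would flag the second kind of call under UBSAN, which is the mismatch described here.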

At this point I made a note that somebody should ask the committee when
they get the chance, and never got back around to it.

Nick

I think that the reference in the discussion to [res.on.arguments] (the
relevant section there is paragraph 1.2) makes it clear that the change was
deliberate - to allow (null, 0) as a set of parameters.

I do agree that this is different behavior than memcpy, memmove, etc.
That should make Joerg happy :-)

-- Marshall

I just want to revive this thread.

I’m not really endorsing this particular use of non-null in API design (I agree with the committee that we should permit ‘nullptr, 0’ arguments). However, I think that we should add these attributes for memcpy and friends. Why? Because glibc has already shipped with these attributes which means that portable code will never be able to pass a null pointer here without running the risk of miscompile. Even if the standard were to retract the dubious wording, we would have a very large number of glibc installations in the world with the attributes and a large number of compilers that will miscompile code by optimizing based on them. We should, IMO, annotate it everywhere so that warnings and tools like UBSan can find all of these bugs waiting to happen.

To get an idea of how widespread this is on at least the Linux platform, they went into glibc over a decade ago:
https://sourceware.org/git/?p=glibc.git;a=commit;f=string/string.h;h=be27d08c05911a658949ba7b84f4321a65a2dbf4

I just spent a pile of time cleaning up LLVM and Clang. It would be great to get more help keeping us in this clean state.

I rather like optimizing on these things - but if you don't & you just want
to provide defense for those users writing portable code without punishing
them on our own compiler, what about using Doug's new nullability
attributes that are documented to not impact optimizations/codegen?
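A sketch of what that could look like, assuming Clang's _Nonnull type qualifier is the nullability attribute being referred to. The macro SKETCH_NONNULL_PTR and the function annotated_memcpy are hypothetical names for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical macro: uses Clang's nullability qualifier when available and
// expands to nothing elsewhere, so other compilers see a plain pointer.
#if defined(__clang__)
#  if __has_feature(nullability)
#    define SKETCH_NONNULL_PTR _Nonnull
#  endif
#endif
#ifndef SKETCH_NONNULL_PTR
#  define SKETCH_NONNULL_PTR
#endif

// The qualifier annotates the pointer types themselves rather than the
// function, and is intended for diagnostics rather than optimization.
inline void* annotated_memcpy(void* SKETCH_NONNULL_PTR dst,
                              const void* SKETCH_NONNULL_PTR src,
                              std::size_t n) {
    return std::memcpy(dst, src, n);
}

// Usage with valid pointers.
inline void demo_annotated_memcpy() {
    const char src[] = "ok";
    char dst[sizeof src] = {};
    assert(annotated_memcpy(dst, src, sizeof src) == dst);
    assert(std::strcmp(dst, "ok") == 0);
}
```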

I, too, like optimizing on these things.
Smaller code that runs faster is always nice.

-- Marshall

Re-send because the original went to the old lists.
-- Marshall