Which assumptions do llvm.memcpy/memmove/memset.* make when the count is 0?

Hi all,

when I call the llvm.memcpy/memmove/memset.* intrinsics, I typically
have to pass in valid (non-dangling, non-NULL) pointers of the given
alignment. However, to what extent do these rules apply when the count
is 0? Concretely (for any variant of the three aforementioned
intrinsics): Is it UB to call them on a dangling pointer when the count
is 0? On a pointer of less than the given alignment?

The actual operation will of course not do anything, but I am worried
about some analysis seeing a pointer being used as an argument to one of
these intrinsics, and then assuming the pointer is valid and aligned
without proving that the count is > 0.
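
Until the semantics are documented, one conservative option on the
frontend or library side is to skip the call entirely when the count is
zero, so that no analysis ever sees the pointer being used. A minimal
Rust sketch, assuming a hypothetical helper named `copy_bytes_guarded`
(it is not an existing API):

```rust
use std::ptr;

/// Hypothetical wrapper: copies `n` bytes, but never hands the pointers
/// to the copy routine when `n == 0`.
///
/// # Safety
/// If `n > 0`, `src` must be valid for reads of `n` bytes, `dst` must be
/// valid for writes of `n` bytes, and the two ranges must not overlap.
pub unsafe fn copy_bytes_guarded(dst: *mut u8, src: *const u8, n: usize) {
    if n == 0 {
        // Nothing to copy; the pointers are never passed on, so they may
        // be NULL or dangling without any assumption being made.
        return;
    }
    // SAFETY: the caller guarantees validity whenever `n > 0`.
    unsafe { ptr::copy_nonoverlapping(src, dst, n) };
}
```

The price is an extra branch on every call, which is why a documented
guarantee for the count-0 case would be preferable.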

E.g., Rust's HashMap indirectly calls memset(0x0, 0, 0, ..., false).
Vec calls memcpy(..., 0x1, 0, 4, false). Is that a problem?
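
To make the Vec case concrete, here is a small Rust sketch. The helper
name `append_empty` is made up for illustration, and whether this
particular call is lowered to llvm.memcpy depends on the compiler
version and optimization level, so treat that part as an assumption. An
empty Vec has not allocated, so its data pointer is non-null but
dangling, and appending its contents is a zero-length copy from that
pointer:

```rust
/// Hypothetical helper: appends the contents of an empty vector to `dst`
/// and returns the empty vector's data pointer for inspection.
pub fn append_empty(dst: &mut Vec<u32>) -> *const u32 {
    // `Vec::new()` does not allocate; its pointer is non-null but dangling.
    let src: Vec<u32> = Vec::new();
    let p = src.as_ptr();
    // Zero-length copy whose source is the dangling pointer.
    dst.extend_from_slice(&src);
    p
}
```

The exact address of the dangling pointer is an implementation detail
and has changed between Rust versions.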

Kind regards,
Ralf

PS: I'm not on the list, so please keep me in Cc.

From https://www.quora.com/Is-memcpy-0-0-0-undefined-behaviour-in-C

Yes, the C standard explicitly addresses this in §7.24.1/2, which applies to memcpy and all other functions from string.h:

Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call shall still have valid values, as described in 7.1.4. On such a call, a function that locates a character finds no occurrence, a function that compares two character sequences returns zero, and a function that copies characters copies zero characters.

The description of memcpy in §7.24.2.1 does not “explicitly state otherwise”, and §7.1.4 defines an invalid pointer as

a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified

So, the pointer arguments of memcpy shall (a violation of a shall clause is UB, per §4/2) have valid values, even though the function will copy zero characters.

This is true in C but the question was about LLVM intrinsics.

Since the LangRef does not mention any such restriction, I would assume that memcpy(0,0,0) is not UB in LLVM. If it is UB then we must update the LangRef to be clear on this point (actually we should update the LangRef either way since this is a question that'll come up again).

John

Also note that whereas GCC exploits the tricky definition of memcpy(), LLVM at present doesn't appear to:

   Compiler Explorer

In fact LLVM doesn't even assume the pointer is non-null in a case where I'd argue that it should:

   Compiler Explorer

John

As of earlier this year, we now explicitly ignore the nonnull
attributes that glibc puts on memcpy (and other stdlib functions). I
don't know how LLVM feels about dangling or underaligned pointers in
particular, but AFAICT, we do try hard to make sure that
memcpy(NULL, _, 0) works as the user probably intends.

Here's the thread I read about it:
http://lists.llvm.org/pipermail/cfe-dev/2017-January/052066.html . As
I recall, the tl;dr was "optimizing these assumptions to death doesn't
realistically buy us much of anything, and there's a nontrivial amount
of real-world code that depends on this behavior."

Yeah, I recall that thread. The issue is that the current question comes from Rust whereas the previous discussion was freely mixing C/C++ and middle-end issues. We need to separate these.

I propose documenting in the LangRef that memcpy and related intrinsics are defined even when src and dst don't refer to valid storage as long as the length argument is zero. Then we commit to implementing that behavior. Is that OK with everyone? If so I can update the doc.

John

Hi,

> So, the pointer arguments of memcpy shall (a violation of a shall clause is UB, per §4/2) have valid values, even though the function will copy zero characters.

So this puts a bound on what LLVM can do, right? However, (also judging
from the other answers) LLVM sometimes guarantees more than C does.

>> Here's the thread I read about it:
>> http://lists.llvm.org/pipermail/cfe-dev/2017-January/052066.html . As
>> I recall, the tl;dr was "optimizing these assumptions to death doesn't
>> realistically buy us much of anything, and there's a nontrivial amount
>> of real-world code that depends on this behavior."
>
> Yeah, I recall that thread. The issue is that the current question comes
> from Rust whereas the previous discussion was freely mixing C/C++ and
> middle-end issues. We need to separate these.

Ah, I wanted to link to that thread but couldn't find it; thanks.
Right, so this is specifically about the llvm intrinsics that Rust uses,
and *not* about the C/C++ frontend.

> I propose documenting in the LangRef

Documenting such issues in the LangRef would be great. :) That's always
the place I go to with corner cases like this, but often I don't find
the answer there either. (Btw, when I come up with such a corner case
-- is there a bugtracker where "please clarify LangRef"-kind of issues
can be submitted to, or is the mailing list the best venue?)

> that memcpy and related intrinsics
> are defined even when src and dst don't refer to valid storage as long
> as the length argument is zero. Then we commit to implementing that
> behavior. Is that OK with everyone? If so I can update the doc.

Please also clarify the behavior for NULL or unaligned pointers. (There
seems to be an entire lattice of "validity levels" for a pointer:
Completely broken, non-NULL and/or aligned, as well as aligned and
pointing to valid storage.)
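
The checkable part of that lattice can be sketched in Rust as follows.
The names `PtrLevel` and `classify` are hypothetical and only meant as
an illustration; the top level ("pointing to valid storage") is left
out because it cannot be decided by looking at the address alone.

```rust
use std::mem::align_of;

/// Hypothetical names for the "validity levels" that can be checked
/// from the address alone.
#[derive(Debug, PartialEq, Eq)]
pub enum PtrLevel {
    Null,
    NonNullUnaligned,
    NonNullAligned,
}

/// Classifies `p` by nullness and by alignment for `T`.
/// The pointer is only inspected, never dereferenced.
pub fn classify<T>(p: *const T) -> PtrLevel {
    if p.is_null() {
        PtrLevel::Null
    } else if (p as usize) % align_of::<T>() != 0 {
        PtrLevel::NonNullUnaligned
    } else {
        PtrLevel::NonNullAligned
    }
}
```

A documented rule for count 0 would then have to say which of these
levels each pointer argument must reach.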

Judging from "memcpy(NULL, _, 0) is okay", I suppose NULL is okay (both
for src and dest), which only leaves open the question of alignment.

Kind regards,
Ralf

I don't think that was the conclusion of the discussion? I mean the
result was that a NULL pointer should be explicitly valid if the length
argument is zero. That's a bit more restrictive.

Joerg

Yeah there's a design space here. I don't care about the result but am volunteering to document whatever people want.

John

Hi,

>> I propose documenting in the LangRef that memcpy and related intrinsics are
>> defined even when src and dst don't refer to valid storage as long as the
>> length argument is zero. Then we commit to implementing that behavior. Is
>> that OK with everyone? If so I can update the doc.
>
> I don't think that was the conclusion of the discussion? I mean the
> result was that a NULL pointer should be explicitly valid if the length
> argument is zero. That's a bit more restrictive.

What exactly does valid storage even mean here? When memset is called
to change 4 bytes, all those 4 bytes have to be in valid storage; but
when the count is 0 -- essentially all this needs is an allocation of
length 0, which could be pretty much any pointer? NULL is always
special and hence needs separate consideration, but for non-NULL, aren't
they all valid storage of size 0 (but potentially not aligned)?
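
For comparison, Rust's own library takes a similar view at the source
level: the documentation of `std::slice::from_raw_parts` requires the
data pointer to be non-null and aligned even for zero-length slices,
and points to `NonNull::dangling()` as a way to obtain such a pointer.
The sketch below (with a made-up function name) relies only on that
documented rule; it says nothing about what the LLVM intrinsics
guarantee.

```rust
use std::ptr::NonNull;
use std::slice;

/// Hypothetical demo: builds a zero-length slice from a dangling but
/// non-null, aligned pointer and returns its length.
pub fn empty_slice_from_dangling() -> usize {
    let p: *const u32 = NonNull::<u32>::dangling().as_ptr();
    // SAFETY: a zero-length slice only requires `p` to be non-null and
    // aligned, which `NonNull::dangling()` provides.
    let s: &[u32] = unsafe { slice::from_raw_parts(p, 0) };
    s.len()
}
```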

Kind regards,
Ralf

Hi,