Is address space 1 reserved?

This sounds similar to a problem we face in the HSAIL backend. NULL is not a constant in HSAIL, but an instruction that returns “a constant address that is guaranteed to be invalid for the given address space”. The instruction always returns the same constant, so it can be stored and used in a comparison. So in HSAIL, zero is a valid address that is not required to coincide with NULL. It would be incorrect to say that “null is dereferenceable” or that an object “resides at null” just because its address is zero.

LLVM IR has the symbolic constant “null”, which is great, but the trouble is in the SelectionDAG, where it gets replaced by a zero. If the SDNode were to retain a symbolic “null”, that would be sufficient for the HSAIL backend to emit the appropriate instruction.

Sameer.

For CHERI, we've had a similar issue, but we do treat null and inttoptr 0 as producing something that C regards as a null pointer. In our model, pointers are not integers, so null and inttoptr 0 in the address space that we use for memory capabilities yield values that are not integers but are guaranteed not to compare equal to any valid capability.

I'm happy to share more details about this. They'll be appearing in an ASPLOS paper in a couple of months and are in the latest revision of our ISA reference. Getting the C semantics right required some refinements to our ISA.

C requires that (void*)0 generates a pointer that does not compare equal to any valid pointer. It does not require that (void*)foo, where foo is an int of value 0 but not an integer constant expression, give the same value, but this is one issue where the abstract machine and real code do not always agree. I would strongly advise (based on spending much of the last year or so researching the exact requirements for a C abstract machine that can run real code) that any architecture that might want to be a C target reserve the pointer value that is numerically 0 as invalid, even if it means starting globals at address 1 (or 4 or 8, or whatever alignment forces them to be).

David

C requires that (void*)0 generates a pointer that does not compare equal to any valid pointer. It does not require that (void*)foo, where foo is an int of value 0 but not an integer constant expression, give the same value,

Does this mean constant propagation can change program semantics?

-- Sanjoy

Yes, that's one of the issues, if you do not enforce this guarantee for all pointers that are derived from integers that have a numerical value of 0. A strict reading of the C standard means that:

void *null = 1 - 1; // Null pointer: 1-1 is an ICE with value 0
int zero = 0;
void *c = (void *)zero; // Not guaranteed to be null, because zero is not an ICE. Will be null (almost?) everywhere, so programmers expect this to work.
_Bool d = zero == (int)c; // Not guaranteed to be true, but will be (almost?) everywhere, so programmers expect it to work.
_Bool e = 0 == (int)null; // Guaranteed to be true

Trivial constant propagation means that c will be a null pointer; without it, c may be a pointer to some valid object (although whether you're actually allowed to construct a pointer this way is implementation-defined).

Some of my colleagues are working on a parameterisable formal specification for C, covering what the standard says, what compilers implement, and what programmers expect. There's a distressingly large amount that isn't in the intersection of these three.

David

You have hit upon the heart of the matter. There is a huge base of C/C++ code out there that assumes that zero is null irrespective of how it gets declared.

A long time ago Stratus wrote a C compiler that had to be as compatible as possible with PL1 -- we were confounded by the fact that we used 1 for a null pointer in PL1 (so it would fault on the 68K with most accesses). We eventually came to the conclusion that there was no chance of using 1 for a null pointer value and still being able to port typical C code. And we try very, very hard to keep all the compilers on our system (our C, PL1, Cobol, Fortran, Pascal and GCC, now) compatible with each other. Among other things, we intermix C and PL1 in the kernel.

There is way too much casting of pointers to ints in typical C code. There are even a lot of standardized APIs and DKIs that liberally cast pointers to ints (the SVR4 DKI, for example). Of course, it's fine to use a different null value for specialized environments -- just expect porting issues if you are bringing in arbitrary C code.

Another thing to bear in mind:

There is also standardized code that assumes there are at least 4 distinct pointer values that can't point to a valid memory address: Look up SIG_DFL, SIG_ERR, SIG_HOLD and SIG_IGN in the POSIX standard. We actually leave the entire page zero unmapped to allow for things like this. Overkill, of course, but it's easy to drop an entire page and it's also useful for catching most null pointer mishaps.

I don't think this holds. It is entirely valid for all four of these to point to valid functions. I don't think there's anything in the spec (C or POSIX) that says that these can't be valid objects, only that they do not have to be.

There are some other things where it's useful to have a range of definitely-invalid pointers. A couple of examples come to mind:

- Apple leaves (used to leave?) the bottom 64KB unmapped so that isa pointers for Objective-C objects will never end up there and CoreFoundation can use this address range to signify CF types.

- Lots of JavaScript implementations exploit the fact that the memory hole in the middle of a 64-bit address space lines up with the bits used to designate NaN values in IEEE floating point, so that valid floating-point values and pointers hidden in NaN payloads can be distinguished.

David