Current status
Marking an address space as non-integral in the DataLayout currently has multiple implications that I believe should be separated, since together they are too restrictive for certain targets. In particular, marking a pointer address space as non-integral also marks it as having an unstable bitwise representation (for example, where pointers can be changed by a copying GC).
These semantics are too strict for some targets (e.g. AMDGPU/CHERI) where ptrtoint has a stable, well-defined numeric value, but the pointer cannot be recreated from the address alone because it carries additional metadata.
Proposal
I propose splitting the numeric instability from the “not just an integer” property in the DataLayout, which would be beneficial for CHERI, and I believe also for AMDGPU buffer pointers. Instead of marking an address space as non-integral using a separate ni:<N> DataLayout component, my suggestion is that we define these properties as part of the pointer specification: a u following the p marks the address space as unstable, and an n marks it as non-integral.
So the new pointer spec looks as follows: p[<flags>][<address space>]:<size>:<abi>[:<pref>][:<idx>]. The optional <flags> are used to specify properties of pointers in this address space. The character u marks pointers as having an unstable representation and n marks pointers as non-integral (i.e. having additional metadata).
Example
A "pn1:128:128:128:64-pu2:32:32:32:32-pnu3:64:64:64:32" DataLayout defines three pointer types:
- addrspace(1) is non-integral: 128-bit pointers, with a 64-bit index, that are not just addresses (e.g. something like a CHERI capability)
- addrspace(2) is unstable: 32-bit pointers, but the value can change (e.g. copying GC)
- addrspace(3) is unstable and non-integral: 64-bit pointers with a 32-bit index which can change at any time (e.g. fat pointers that are also used in copying GC)
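To make the proposed syntax concrete, here is a minimal Python sketch of how the pointer components of such a string could be decoded. It is not the parser from the pull request: the names PointerSpec, parse_pointer_spec and parse_pointers are hypothetical, and the defaults for omitted fields (address space 0, <pref> equal to <abi>, <idx> equal to <size>) are assumed to mirror the existing DataLayout behaviour. Flag order and repetition are accepted leniently here, which the real grammar may not allow.

```python
import re
from typing import NamedTuple


class PointerSpec(NamedTuple):
    """Decoded form of one p[<flags>][<address space>]:... component (hypothetical)."""
    addrspace: int
    unstable: bool       # 'u' flag: bitwise representation may change
    non_integral: bool   # 'n' flag: pointer carries additional metadata
    size: int
    abi: int
    pref: int
    idx: int


# p[<flags>][<address space>]:<size>:<abi>[:<pref>][:<idx>]
_POINTER_RE = re.compile(
    r"p(?P<flags>[nu]*)(?P<as>\d*)"
    r":(?P<size>\d+):(?P<abi>\d+)(?::(?P<pref>\d+))?(?::(?P<idx>\d+))?$"
)


def parse_pointer_spec(spec: str) -> PointerSpec:
    m = _POINTER_RE.match(spec)
    if m is None:
        raise ValueError(f"not a valid pointer spec: {spec!r}")
    flags, size, abi = m["flags"], int(m["size"]), int(m["abi"])
    return PointerSpec(
        addrspace=int(m["as"] or 0),                   # assumed default: addrspace 0
        unstable="u" in flags,
        non_integral="n" in flags,
        size=size,
        abi=abi,
        pref=int(m["pref"]) if m["pref"] else abi,     # assumed default: <abi>
        idx=int(m["idx"]) if m["idx"] else size,       # assumed default: <size>
    )


def parse_pointers(layout: str) -> dict[int, PointerSpec]:
    """Collect the pointer components of a '-'-separated DataLayout string."""
    specs = (parse_pointer_spec(c) for c in layout.split("-") if c.startswith("p"))
    return {s.addrspace: s for s in specs}
```

Calling parse_pointers on the example string above yields three entries, where address space 1 has only non_integral set, address space 2 has only unstable set, and address space 3 has both.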
The change and proposed LangRef wording can be seen in "[DataLayout][LangRef] Split non-integral and unstable pointer properties" by arichardson (Pull Request #105735, llvm/llvm-project).
Open question 1
This proposal defines the non-integral property in a way that such pointers could have out-of-band metadata in addition to/instead of in-band metadata.
There are other schemes that only have one of the two kinds of metadata. Some fat pointer schemes are entirely in-band (e.g. LowFat pointers, and I presume AMDGPU buffer pointers, though I am not familiar with those). The metadata could also be entirely out-of-band, as in (the discontinued) MPX. Finally, CHERI has both in-band and out-of-band metadata: wider pointers with bounds+permissions+type information, and an out-of-band validity bit that the hardware invalidates when performing non-monotonic (invalid) operations.
Defining non-integral pointers as potentially having out-of-band metadata implies that transforming a load/store of a non-integral pointer to a series of byte-copies may not be sound, since byte copies might not propagate the metadata.
For example, with CHERI the validity bit is only loaded from and stored to memory when the compiler generates the appropriate capability load and store instructions; a byte/word/float load or store instruction does not propagate it between the register file and memory.
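The effect can be illustrated with a toy model of tagged memory. This is only a sketch under stated assumptions, not a description of any real CHERI implementation: the TaggedMemory class, the 16-byte slot size, and both copy helpers are hypothetical, and the model simply keeps one out-of-band validity bit per capability-sized, capability-aligned slot and clears it on every plain data store.

```python
CAP_SIZE = 16  # bytes per capability-sized, aligned slot (assumption for this model)


class TaggedMemory:
    """Toy memory with one out-of-band validity bit per CAP_SIZE-aligned slot."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.tags = [False] * (size // CAP_SIZE)

    def store_cap(self, addr: int, bits: bytes, valid: bool) -> None:
        # Capability store: writes the in-band bits and the out-of-band tag.
        assert addr % CAP_SIZE == 0 and len(bits) == CAP_SIZE
        self.data[addr:addr + CAP_SIZE] = bits
        self.tags[addr // CAP_SIZE] = valid

    def load_cap(self, addr: int) -> tuple[bytes, bool]:
        # Capability load: returns the in-band bits together with the tag.
        assert addr % CAP_SIZE == 0
        return bytes(self.data[addr:addr + CAP_SIZE]), self.tags[addr // CAP_SIZE]

    def store_byte(self, addr: int, value: int) -> None:
        # Plain data store: never sets the tag, and clears it for the slot it hits.
        self.data[addr] = value
        self.tags[addr // CAP_SIZE] = False

    def load_byte(self, addr: int) -> int:
        # Plain data load: sees only the in-band bits.
        return self.data[addr]


def copy_bytewise(mem: TaggedMemory, dst: int, src: int) -> None:
    """What a load/store lowered to a series of byte copies would do."""
    for i in range(CAP_SIZE):
        mem.store_byte(dst + i, mem.load_byte(src + i))


def copy_as_cap(mem: TaggedMemory, dst: int, src: int) -> None:
    """What a capability-aware load+store pair would do."""
    bits, valid = mem.load_cap(src)
    mem.store_cap(dst, bits, valid)
```

In this model, copying a valid capability with copy_bytewise produces a destination with identical in-band bits but a cleared validity bit, while copy_as_cap preserves both. That is the sense in which the byte-copy transformation is unsound for such pointers.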
If this is a problem for AMDGPU, we could add an additional flag to the pointer spec to indicate inline vs out-of-band metadata.