[RFC] Finer-grained non-integral pointer properties

Current status

Currently, marking an address space as non-integral in DataLayout has multiple implications that I believe should be separate, as together they are too restrictive for certain targets. In particular, marking a pointer address space as non-integral also marks it as having an unstable bitwise representation (for example, in cases where pointers can be changed by a copying GC).

These semantics are too strict for some targets (e.g. AMDGPU/CHERI) where ptrtoint has a stable, well-defined value, but the pointer cannot be recreated from the address alone because it carries additional metadata.

Proposal

I propose splitting the numeric instability from the “not just an integer” property in the DataLayout, which would be beneficial for CHERI, and I believe also for AMDGPU buffer pointers. Instead of marking an address space as non-integral using a separate :ni-<N> DataLayout component, my suggestion is that we define these properties as part of the pointer specification: a u following the p marks the address space as unstable and a n marks it as non-integral.

So the new pointer spec looks as follows: p[<flags>][<address space>]:<size>:<abi>[:<pref>][:<idx>]. The optional <flags> are used to specify properties of pointers in this address space. The character u marks pointers as having an unstable representation and n marks pointers as non-integral (i.e. having additional metadata).

Example

A "pn1:128:128:128:64-pu2:32:32:32:32-pnu3:64:64:64:32" DataLayout defines three pointer types:

  • addrspace(1) is non-integral: 128-bit pointers with a 64-bit index; the pointers are not just addresses (e.g. something like a CHERI capability)

  • addrspace(2) is unstable: 32-bit pointers, but the value can change (e.g. copying GC)

  • addrspace(3) is unstable and non-integral: 64-bit pointers with a 32-bit index whose value can change at any time (e.g. fat pointers that are also used in copying GC)
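To make the proposed syntax concrete, here is a minimal sketch of how the pointer components of the example string could be parsed. It is not LLVM's actual parser: the names PointerSpec, parse_pointer_spec and parse_pointer_specs are hypothetical, only the n and u flags from this post are recognised, and the defaults for the omitted <pref> and <idx> fields are assumptions.

```python
import re
from dataclasses import dataclass

# Grammar from the proposal: p[<flags>][<address space>]:<size>:<abi>[:<pref>][:<idx>]
# Only the 'n' (non-integral) and 'u' (unstable) flags from this post are handled.
POINTER_SPEC = re.compile(
    r"^p(?P<flags>[nu]*)(?P<addrspace>\d*):(?P<fields>\d+(?::\d+){1,3})$"
)


@dataclass(frozen=True)
class PointerSpec:  # hypothetical helper type, not an LLVM class
    addrspace: int
    size: int
    abi: int
    pref: int
    index: int
    non_integral: bool
    unstable: bool


def parse_pointer_spec(component: str) -> PointerSpec:
    m = POINTER_SPEC.match(component)
    if m is None:
        raise ValueError(f"not a pointer spec: {component!r}")
    fields = [int(f) for f in m.group("fields").split(":")]
    size, abi = fields[0], fields[1]
    # Assumed defaults: preferred alignment falls back to ABI alignment,
    # index width falls back to the pointer size.
    pref = fields[2] if len(fields) > 2 else abi
    index = fields[3] if len(fields) > 3 else size
    flags = m.group("flags")
    return PointerSpec(
        addrspace=int(m.group("addrspace") or 0),  # no number means addrspace 0
        size=size,
        abi=abi,
        pref=pref,
        index=index,
        non_integral="n" in flags,
        unstable="u" in flags,
    )


def parse_pointer_specs(datalayout: str) -> dict[int, PointerSpec]:
    """Collect the pointer components of a '-'-separated DataLayout string."""
    specs = {}
    for component in datalayout.split("-"):
        if component.startswith("p"):
            spec = parse_pointer_spec(component)
            specs[spec.addrspace] = spec
    return specs
```

Calling parse_pointer_specs on the example string above yields three entries keyed by address space, with non_integral and unstable set according to the flags.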

The change and proposed LangRef wording can be seen in [DataLayout][LangRef] Split non-integral and unstable pointer properties by arichardson · Pull Request #105735 · llvm/llvm-project · GitHub

Open question 1

This proposal defines the non-integral property in a way that such pointers could have out-of-band metadata in addition to/instead of in-band metadata.

There are other schemes that only have one of the properties - some fat pointer schemes are entirely in-band (e.g. LowFat pointers, and I presume AMDGPU buffer pointers? Not familiar with those though.). The metadata could be entirely out-of-band such as in (the discontinued) MPX. And finally CHERI has both in-band and out-of-band metadata: wider pointers with bounds+permissions+type information, and an out-of-band validity bit that the hardware invalidates when performing non-monotonic (invalid) operations.

Defining non-integral pointers as potentially having out-of-band metadata implies that transforming a load/store of a non-integral pointer to a series of byte-copies may not be sound, since byte copies might not propagate the metadata.

For example, with CHERI the validity bit is only loaded from or stored to memory when the compiler generates the appropriate capability load/store instructions; a byte/word/float load/store instruction does not propagate it between the register file and memory.
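The difference between the two kinds of copy can be illustrated with a toy memory model. This is a sketch under stated assumptions, not a description of any real implementation: TaggedMemory and its methods are hypothetical names, capabilities are assumed to be 128 bits with one validity bit per aligned 16-byte granule, and every non-capability store is modelled as clearing that bit.

```python
CAP_BYTES = 16  # assumption: 128-bit capabilities, one tag per aligned granule


class TaggedMemory:  # hypothetical toy model
    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.tags = [False] * (size // CAP_BYTES)  # out-of-band validity bits

    def store_cap(self, addr: int, bits: bytes, valid: bool) -> None:
        """Capability store: writes the in-band bits and the validity bit."""
        assert addr % CAP_BYTES == 0 and len(bits) == CAP_BYTES
        self.data[addr:addr + CAP_BYTES] = bits
        self.tags[addr // CAP_BYTES] = valid

    def load_cap(self, addr: int) -> tuple[bytes, bool]:
        """Capability load: returns the in-band bits and the validity bit."""
        assert addr % CAP_BYTES == 0
        return bytes(self.data[addr:addr + CAP_BYTES]), self.tags[addr // CAP_BYTES]

    def store_byte(self, addr: int, value: int) -> None:
        """Plain data store: the validity bit of the granule is cleared."""
        self.data[addr] = value
        self.tags[addr // CAP_BYTES] = False

    def load_byte(self, addr: int) -> int:
        return self.data[addr]


def copy_as_capability(mem: TaggedMemory, dst: int, src: int) -> None:
    bits, valid = mem.load_cap(src)
    mem.store_cap(dst, bits, valid)


def copy_as_bytes(mem: TaggedMemory, dst: int, src: int) -> None:
    # What a "load/store rewritten as byte copies" would amount to.
    for i in range(CAP_BYTES):
        mem.store_byte(dst + i, mem.load_byte(src + i))
```

In this model both copies reproduce the same 16 in-band bytes, but only copy_as_capability leaves the destination with its validity bit set.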

If this is a problem for AMDGPU, we could add an additional flag to the pointer spec to indicate inline vs out-of-band metadata.

I can say that all the AMDGPU non-integral pointers (either currently existing or ones that are being thought about) don’t really carry out-of-band metadata like you’d be worried about. That is, if they’re a kind of pointer you’re allowed to load/store from, you can rewrite those loads/stores into byte copies soundly.

The reason I give that caveat is that ptr addrspace(8), the buffer resource descriptor, can’t actually be used with standard pointer operations. This is because these resources (which are, for our purposes, 72 bits of various kinds of flags on top of 48 bits of address) have a more complex indexing scheme than LLVM allows and don’t really work with getelementptr.

That is, if I have a ptr %p and want to load N bytes later, I should do

%q = getelementptr i8, ptr %p, i64 N
%r = load T, ptr %q

However, with a buffer resource, the offset is a separate object from the base pointer in the ptr addrspace(8) - for one thing, the offset can vary between lanes, the base address cannot.

So what you do is

%r = call T @llvm.amdgcn.raw.ptr.buffer.load.T(ptr addrspace(8) %p, i32 %q, ...)

This can get more complicated when you get into hardware swizzling and the fact that there can be both index and offset indices - those show up on the struct.ptr.buffer.* operations.

Since this is all a pain, ptr addrspace(7) holds the offset in the low 32 bits and then its load/store/… operations get rewritten into intrinsic calls on ptr addrspace(8)s. Address space 9 is a similar story, but it holds both a 32-bit index and a 32-bit offset - and is mainly used when coming from SPIR-V.

So yeah, in summary, our non-integral pointers have stable bit representations and are mainly ni because either they can’t be GEP’d in the first place or because we can’t have carryout of the low bits.

But re the proposal, I think the out-of-band metadata thing (which we don’t have and it sounds like CHERI does) could be encoded in a third flag - a pointer is structural if the types of the operations on the pointer are significant and can’t be mutated by the compiler.

That is, you can’t rewrite loads from or stores to structural pointers into ones of different types, optimize them to byte copies, etc.

On our side, I think (and @piotr @jayfoad for graphics context) this might be a useful invariant for ptr addrspace(9), which is a <{ptr addrspace(8), i32 index, i32 offset}> that has complex addressing semantics (the index and offset part might be swizzled) and, since we’re dealing in texture fetch operations, might need its loaded types kept unaltered.

I don’t know if that’s too strong an invariant for our use-cases, or if it’d be sufficient to preserve out of band metadata, but it seems worth adding as a third flag.

Thanks for the detailed response! It sounds like you would much prefer if we didn’t impose this restriction, so I will rework the patch to add a separate flag for out-of-band metadata that requires being careful about rewriting loads and stores of the pointer itself. If this flag is not present, any loads/stores of such pointer types can be rewritten as byte/integer copies of an equivalent size.

For ptr addrspace(7) it sounds like you have an address range of 48 bits with an index of 32 bits, which does not map cleanly to any of the current widths that we define. In your case I would imagine a ptrtoint of ptr addrspace(7) should return i48 (or i64 if we round up to the next power of two). For CHERI we always assume that the index width is the same as the usable address width, but it sounds like this is not true for you. The first version of D135158 [DataLayout] Introduce DataLayout::getPointerIntegralSize(AS) adds this as a new property to the DataLayout. This change makes it possible for the ptrtoint result type to no longer be exactly the same as the width of the pointer type.

Your points on complex addressing modes sound to me like they are orthogonal and could be added as an additional property (“don’t rewrite indexing operations/introduce new GEPs”) later.

Re ptr addrspace(7), its native ptrtoint return size is i160 - the value is the concatenation of a buffer resource (128 bits, the lower 48 of which is a base address) and then 32 bits of offset.

GEP mutates the offset and must not affect the upper 128 bits

From the perspective of GEP, those upper 128 bits are metadata
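The layout described here can be modelled as a 160-bit integer whose low 32 bits are the offset. The sketch below is only an illustration: make_fat_pointer, gep and split are hypothetical names, and wrapping the offset modulo 2^32 is a modelling choice made so that no carry can reach the resource bits, not a statement about what the hardware or the AMDGPU backend does on overflow.

```python
OFFSET_BITS = 32
OFFSET_MASK = (1 << OFFSET_BITS) - 1
RESOURCE_BITS = 128


def make_fat_pointer(resource: int, offset: int) -> int:
    """Concatenate a 128-bit buffer resource with a 32-bit offset (i160)."""
    assert 0 <= resource < (1 << RESOURCE_BITS)
    return (resource << OFFSET_BITS) | (offset & OFFSET_MASK)


def split(pointer: int) -> tuple[int, int]:
    """Return (resource, offset) for a 160-bit pointer value."""
    return pointer >> OFFSET_BITS, pointer & OFFSET_MASK


def gep(pointer: int, delta: int) -> int:
    """Adjust only the offset; the upper 128 bits are treated as metadata."""
    resource_part = pointer & ~OFFSET_MASK
    # Assumption: the offset wraps, so no carry ever leaves the low 32 bits.
    new_offset = ((pointer & OFFSET_MASK) + delta) & OFFSET_MASK
    return resource_part | new_offset
```

With this model, stepping past the largest offset leaves the resource part untouched, which is the "no carry-out of the low bits" property mentioned earlier in the thread.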

i160 makes sense if you are capturing all data for potential round-tripping via integers. For CHERI we do not want/need the metadata as part of the ptrtoint result since it cannot be used to recreate the original pointer. The only way to obtain the original pointer value again is to use either a GEP or an intrinsic that combines the original pointer (or a more permissive superset thereof) with the “address part” of the pointer: ptr addrspace(200) llvm.cheri.cap.address.set(ptr addrspace(200) %provenance, i64 %addr).
For us the only meaningful integer data is the “address” part of the pointer, i.e. i64 for 128-bit capabilities or i32 for 64-bit ones.
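As a rough illustration of why the address alone is not enough, the sketch below models a capability as an address plus bounds and a validity bit. Everything here is hypothetical: the Capability fields are a simplification, address_set only mirrors the shape of llvm.cheri.cap.address.set (bounds and representability checks are omitted), and modelling inttoptr as producing an invalid capability is an assumption of this sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Capability:  # hypothetical, heavily simplified model
    address: int
    base: int
    length: int
    valid: bool = True


def ptrtoint(cap: Capability) -> int:
    """Only the address part is exposed as an integer."""
    return cap.address


def inttoptr(addr: int) -> Capability:
    # Assumption: with no provenance there are no bounds and no validity bit.
    return Capability(address=addr, base=0, length=0, valid=False)


def address_set(provenance: Capability, addr: int) -> Capability:
    """Combine the metadata of an existing capability with a new address."""
    return replace(provenance, address=addr)
```

For example, taking ptrtoint of a capability, adding 8, and passing the result to inttoptr yields something that is not usable as a pointer in this model, whereas address_set with the original capability as provenance keeps its bounds and validity.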

Similar to your pointer type, GEP only modifies the low bits and never affects the upper ones.

Also, just so I understand: y’all mentioned the validity bit above - what transformations are legal in the absence of the validity bit that become illegal if you have to preserve it?

I finally got around to updating the PR, so [DataLayout][LangRef] Split non-integral and unstable pointer properties by arichardson · Pull Request #105735 · llvm/llvm-project · GitHub now includes a separate “external state” flag using e in the data layout.

Here is the new LangRef wording for pointers with external state (intended to support CHERI):

A special case of non-integral pointers is ones that include external state
(such as implicit bounds information or a type tag) with a target-defined size.
An example of such a type is a CHERI capability, where there is an additional
validity bit that is part of all pointer-typed registers, but is located in
memory at an implementation-defined address separate from the pointer itself.
Another example would be a fat-pointer scheme where pointers remain plain
integers, but the associated bounds are stored in an out-of-band table.

When a store ptr addrspace(N) %p, ptr @dst of such a non-integral pointer
is performed, the external metadata is also stored to another location.
Similarly, a %val = load ptr addrspace(N), ptr @dst will fetch the
external metadata and make it available for all uses of %val.
Notionally, these additional bits are part of the pointer, but since loads and
stores only operate on the “inline” bits of the pointer and the additional
bits are not explicitly exposed, they are not included in the size specified in
the :ref:datalayout string<langref_datalayout>.

When a pointer type has external state, all roundtrips via memory must
be performed as loads and stores of the correct type since stores of other
types may not propagate the external data.
Therefore it is not legal to convert a load/store of a non-integral pointer
type with external state to a load/store of an integer type with same
bitwidth, as that drops the additional state.
However, it can be assumed that appropriately-aligned llvm.memcpy and
llvm.memmove intrinsics will preserve the external metadata.
This is essential to allow frontends to efficiently emit copies of
structures containing such pointers, since expanding all these copies as
individual loads and stores would significantly inhibit optimizations.
To ensure that this results in valid code, affected backends must lower
these intrinsics in a way that propagates external state.
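One way a backend could satisfy the memcpy requirement is to copy in capability-sized units whenever alignment allows. The following is a toy sketch of that idea, not how any real backend lowers llvm.memcpy: tag_preserving_memcpy is a hypothetical name, memory is modelled as a bytearray plus a list of per-granule validity bits, the granule is assumed to be 16 bytes, and source and destination are assumed not to overlap.

```python
CAP_BYTES = 16  # assumption: one validity bit per aligned 16-byte granule


def tag_preserving_memcpy(data: bytearray, tags: list, dst: int, src: int, n: int) -> None:
    """Copy n bytes within `data`; `tags` holds one bool per granule.

    Assumes the source and destination ranges do not overlap.
    """
    i = 0
    if dst % CAP_BYTES == 0 and src % CAP_BYTES == 0:
        # Both sides are capability-aligned: copy whole granules and carry
        # the external validity bit along with the in-band bytes.
        while n - i >= CAP_BYTES:
            data[dst + i:dst + i + CAP_BYTES] = data[src + i:src + i + CAP_BYTES]
            tags[(dst + i) // CAP_BYTES] = tags[(src + i) // CAP_BYTES]
            i += CAP_BYTES
    # Remaining (or misaligned) bytes are plain data copies, which are
    # modelled as clearing the validity bit of every granule they touch.
    while i < n:
        data[dst + i] = data[src + i]
        tags[(dst + i) // CAP_BYTES] = False
        i += 1
```

In this model an aligned 16-byte copy carries the validity bit over, while the same copy to a misaligned destination moves identical bytes but leaves the affected granules invalid.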

@krzysz00 this hopefully also addresses your question above (sorry I missed your reply!)
To clarify the CHERI behaviour here: the general-purpose registers hold the pointer address, bounds, and other metadata (2x the integer GPR size) as well as a validity tag. However, when storing these GPRs to memory you have to use the appropriate store (sc instead of two sd instructions for both parts) so that the CPU stores the validity bit to the cache/memory subsystem. When loading from memory you also have to use lc instead of ld to fetch the full 65 or 129 bits - any other load will set the register validity bit to zero.
The validity bit could be stored in ECC bits, an external carveout or some other implementation-defined way.
The only real implication for LLVM optimizations is that we need to preserve all load/store ptr addrspace(200) instead of rewriting them as integer operations.

Thank you everyone for the detailed feedback both on this thread and the pull request!

This change has now landed and I am planning to submit follow-up changes to avoid the use of DL.isNonIntegralPointerType(). This should allow for better optimizations for e.g. AMDGPU and also allow upstreaming of all the CHERI-specific changes to the IR passes.
