Proposal: impose guarantees on introduced inttoptr/ptrtoint pairs when pointers have index type < pointer size

So, currently, in LLVM data layouts, if we’ve declared pN:S:[alignment]:[alignment]:O, where S is the pointer size and O is the offset (index) size, address space N can either be integral (where, as I understand it, the pointer is assumed to address the range [0, iS::umax]) or non-integral, which means that inttoptr and ptrtoint are non-deterministic.

For some cases, such as the buffer descriptors on AMD GPUs (which are, for these purposes, i80 metadata || i48 address = i128) or CHERI’s capability pointers (i64 tag || i64 address = i128), neither of these semantics is quite right. These types of pointers aren’t indexes into a flat area of memory, but they also aren’t the sort of wild, non-deterministic GC-managed things that non-integral pointers can be.

So, it seems to me (after some discussion with @jrtc27 ) that neither of these semantics is quite right for hardware fat pointers. Non-integral pointers go too far, and impose semantics that aren’t needed - like the fact that inttoptr may be non-deterministic (!). Those sorts of restrictions might make sense for things like garbage-collected pointers, but are too strong for a fat pointer.

Fat pointers do have an address component, and, as long as the compiler restricts itself to performing computations on the address component, fat pointers are just regular pointers.

So, I propose that, when an optimization pass inserts a ptrtoint/inttoptr pair, or otherwise starts modifying the bit value of a pointer, that transformation must not modify the high S - O bits of the integer value. That is, if you have p200:128:128:128:64, then given

%y = getelementptr i8, ptr addrspace(200) %x, i64 %idx

you could rewrite this to

%x.int = ptrtoint ptr addrspace(200) %x to i128
%metadata = and i128 %x.int, -18446744073709551616 ; 0xffffffff_ffffffff_00000000_00000000: mask off address
%address = trunc i128 %x.int to i64
%address.y.trunc = add i64 %address, %idx
%address.y = zext i64 %address.y.trunc to i128
%y.int = or i128 %metadata, %address.y
%y = inttoptr i128 %y.int to ptr addrspace(200)

but not to

%x.int = ptrtoint ptr addrspace(200) %x to i128
%idx.ext = zext i64 %idx to i128
%y.int = add i128 %x.int, %idx.ext
%y = inttoptr i128 %y.int to ptr addrspace(200)

because the latter could change the metadata bits: if the addition carries out of the low 64 address bits, the carry lands in the metadata.
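To make the difference between the two rewrites concrete, here is a rough Python model of them, treating the i128 value of a p200:128:128:128:64 pointer as a plain integer. The function names are made up for illustration and nothing here is LLVM API; it only mirrors the arithmetic in the two IR sequences.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1
METADATA_MASK = MASK128 ^ MASK64  # 0xffffffff_ffffffff_00000000_00000000


def gep_preserving_metadata(x_int, idx):
    """Model of the allowed rewrite: only the low 64 address bits change."""
    metadata = x_int & METADATA_MASK      # and with the metadata mask
    address = x_int & MASK64              # trunc i128 -> i64
    address_y = (address + idx) & MASK64  # add i64 wraps within 64 bits
    return metadata | address_y           # zext + or


def gep_naive_wide_add(x_int, idx):
    """Model of the disallowed rewrite: a single 128-bit add."""
    idx_ext = idx & MASK64                # zext i64 -> i128
    return (x_int + idx_ext) & MASK128    # carry can reach the metadata


# Address is at the top of its 64-bit range, so adding 1 wraps.
x = (0xABCD << 64) | MASK64
print(hex(gep_preserving_metadata(x, 1) >> 64))  # metadata unchanged
print(hex(gep_naive_wide_add(x, 1) >> 64))       # metadata incremented by the carry
```

When the address does not wrap, both functions return the same value, which is why the naive form only goes wrong on a carry.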

However, if the getelementptr were inbounds, the latter rewrite would be possible, because the inbounds tag (as far as I know) means that adding the offset to the pointer won’t produce a carry.

Note that, for typical pointers - where the offset size and the pointer size are the same - the and produces a 0, the truncations and extensions are noops, and so the or is also just the result of the addition, recovering the original transformation at no extra cost.
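A quick way to sanity-check that claim is to parameterize the masked rewrite over the pointer size S and offset size O and compare it against a plain S-bit add. This is only a sketch in Python with hypothetical helper names, not anything from LLVM:

```python
def masked_rewrite(x_int, idx, pointer_bits, index_bits):
    """Masked GEP rewrite for a pointer of S=pointer_bits, O=index_bits."""
    pointer_mask = (1 << pointer_bits) - 1
    index_mask = (1 << index_bits) - 1
    metadata = x_int & (pointer_mask ^ index_mask)  # high S - O bits
    address = x_int & index_mask                    # trunc to the index width
    address_y = (address + idx) & index_mask        # add at the index width
    return metadata | address_y


def plain_add(x_int, idx, pointer_bits):
    """Ordinary integer add at the full pointer width."""
    return (x_int + idx) & ((1 << pointer_bits) - 1)


# With S == O == 64 the metadata mask is zero, so the two always agree,
# including when the address wraps around.
for x_int, idx in [(0x1000, 0x10), ((1 << 64) - 1, 1), (0, -8)]:
    assert masked_rewrite(x_int, idx, 64, 64) == plain_add(x_int, idx, 64)
```

With S = 128 and O = 64 the same two functions disagree whenever the low 64 bits wrap, which is the case the proposal is meant to rule out.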

A downside of this approach is that it adds complexity for anyone doing integer arithmetic on pointer values: they’ll need to keep track of this rule, and they’re likely to trip over it if they’re not targeting a platform that has fat pointers.

One upside, though, is that many of the optimizations locked behind isNonIntegralAddressSpace() that are, with some care, applicable to fat pointers could be made applicable to such pointers, improving code generation and not saddling fat pointers with semantics that they don’t have.

On top of that, a quick skim of the isNonIntegralAddressSpace() calls lying around shows that most of them are used in contexts where the compiler wants to perform bitcasts and will not be doing any arithmetic on the pointer values, which is a case where fat pointers can be bitcast with no trouble. The more complicated “treat pointers as integers” sections, like the loop optimizer, could probably be gated behind getPointerSizeInBits(AS) != getIndexSizeInBits(AS) instead.
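As a toy illustration of that gate, the check boils down to comparing the two sizes from the address space's data layout entry. The Python below parses a single pN:size:abi[:pref[:idx]] entry; parse_pointer_spec and has_metadata_bits are hypothetical names, this is not LLVM's parser, and it ignores every other part of a data layout string. It assumes the index size defaults to the pointer size when omitted.

```python
def parse_pointer_spec(spec):
    """Parse 'pN:size:abi[:pref[:idx]]' into (addrspace, pointer_bits, index_bits)."""
    fields = spec.split(":")
    head = fields[0]
    if not head.startswith("p") or len(fields) < 3:
        raise ValueError(f"not a pointer layout entry: {spec!r}")
    addrspace = int(head[1:]) if len(head) > 1 else 0  # bare 'p' means address space 0
    pointer_bits = int(fields[1])
    # When the index size is omitted it is taken to equal the pointer size.
    index_bits = int(fields[4]) if len(fields) > 4 else pointer_bits
    return addrspace, pointer_bits, index_bits


def has_metadata_bits(spec):
    """Model of the proposed gate: pointer size != index size."""
    _, pointer_bits, index_bits = parse_pointer_spec(spec)
    return pointer_bits != index_bits


print(parse_pointer_spec("p200:128:128:128:64"))
print(has_metadata_bits("p200:128:128:128:64"), has_metadata_bits("p:64:64"))
```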

What do folks think? (also @arsenm since I’m rambling about AMD’s stuff and you might have thoughts)

i128 still can’t hold a capability for us. If you ptrtoint a CHERI capability you only get the address (possibly extended or truncated). Using an i128 with masking in your case is also kind of ugly, and you’d probably be better served by something that also works for CHERI. We have an @llvm.cheri.address.set (or similar) intrinsic that lets you glue an arbitrary address onto an existing valid capability, and one can also represent that as a GEP of the delta, but better generic support for this upstream would be good.

Ok, my bad on the width - how wide is a full CHERI pointer (capability + address)?

Also regarding carrying bits, the address and metadata are separate fields in the hardware, so wrapping the address space in your pointer arithmetic does not carry out into the metadata. For us it really is like struct { word metadata; word address; } (ignoring the tag bit).

On a 64-bit architecture it’s 128+1 bits (128 bits of data, 1 bit of tag). But even an i256 cannot hold it, because it is not an integer, and if you convert to an integer^, that means putting it in an integer register, which means you are asking to strip the metadata. If you want to keep the metadata it must remain a pointer at the IR level.

^ (u)intptr_t is complicated, so don’t think about that… suffice to say at the IR level it’s a pointer not an integer

Really stupid question: can’t you wrap fat pointers into the modern opaque types?

Then only AMD can read and modify them.

Ok, so y’all do have pointer values that don’t meaningfully round-trip into integers and so have a meaningfully different use case.

On our end, we’ve got instructions that take, broadly, a buffer descriptor

struct bufferDesc {
   addr : 48
   stride : 14
   swizzleFlags : 2
   extent : 32
   otherUsefulFlags : 32
}

and some combination of offsets.

(For extra fun, the buffer descriptor has to be in scalar registers, the offsets do not)

Currently, in the IR, we represent these structures as <4 x i32> and pass them to intrinsics that indicate a general memory read or write, which is not good for any sort of optimization.
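For anyone who wants to see how those fields line up with the <4 x i32> form, here is a small Python sketch that packs the descriptor into a 128-bit integer and splits it into four 32-bit lanes. The bit ordering (first listed field in the lowest bits) and the lane ordering (lane 0 holds the low 32 bits) are assumptions made for illustration, and the helper names are invented.

```python
# Field widths in bits, lowest field first (the ordering is an assumption).
FIELDS = [
    ("addr", 48),
    ("stride", 14),
    ("swizzleFlags", 2),
    ("extent", 32),
    ("otherUsefulFlags", 32),
]


def pack_descriptor(values):
    """Pack a dict of field values into one 128-bit integer."""
    result = 0
    shift = 0
    for name, width in FIELDS:
        value = values[name]
        if value < 0 or value >= (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        result |= value << shift
        shift += width
    return result


def unpack_descriptor(desc_int):
    """Split a 128-bit integer back into its named fields."""
    values = {}
    shift = 0
    for name, width in FIELDS:
        values[name] = (desc_int >> shift) & ((1 << width) - 1)
        shift += width
    return values


def to_lanes(desc_int):
    """View the descriptor as four 32-bit words, low word first."""
    return [(desc_int >> (32 * i)) & 0xFFFFFFFF for i in range(4)]


def from_lanes(lanes):
    """Rebuild the 128-bit integer from four 32-bit words."""
    return sum(lane << (32 * i) for i, lane in enumerate(lanes))


example = {"addr": 0x123456789ABC, "stride": 16, "swizzleFlags": 1,
           "extent": 4096, "otherUsefulFlags": 0xDEADBEEF}
desc = pack_descriptor(example)
assert unpack_descriptor(from_lanes(to_lanes(desc))) == example
```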

I’m working on trying to get these things represented as pointers, which led me onto this here tangent. Since you very much can round trip a buffer descriptor through integer registers (but please don’t do arithmetic directly on that value without being careful), it seemed like some form of intermediately-restrictive pointer semantics would be useful in general.

But my proposal suffered somewhat from not knowing quite how CHERI worked at a hardware level, since y’all don’t have that “this round trips through an i129 (for us, i128) deterministically but please for the love of all that is holy don’t touch the value” semantic we do.

The problem is that we don’t want a fully opaque type - this type is a pointer and should be treated as such: it should be usable as an argument to loads/stores/…, participate in alias analysis, and so on.

It does sound like something that works for CHERI would work for your use case though?

That is, is there a reason why you would want to introduce round-tripping through integers, beyond it being a potentially less invasive way to make your target work?

There isn’t really a strong reason to allow those round-trips (or at least, to allow round-trips the user didn’t introduce, since “I have a regular pointer, add metadata to it and use it as a buffer descriptor” is a perfectly sensible thing for code to do).

This is more growing out of last night’s discussions about the non-integral pointer guarantee appearing to be too strong for us and CHERI.

But now, with more details, it’s quite possible that non-integral pointers, despite the funky non-determinism introduced, aren’t too far off from “don’t touch the in-memory value of this pointer using non-pointer operations, that might not do what you think it does”.

Yeah, it’s not far off, but it is definitely overly strict, and would prevent constant-folding (ptraddr_t)p == (ptraddr_t)p to true (which can come up in highly macro-ised/templated/generated code), for example. I don’t want us to be using things where there are trivial valid optimisations being explicitly disabled, hence why we’re not using it and I would be against us adopting it in its current form.

(ptraddr_t is a new type we introduce that is an integer that can hold the integer address part of a pointer, ie the same as uintptr_t for non-CHERI, and in practice the same as size_t for CHERI)

Yep, makes sense.

So I’m not sure what better would look like either, but non-integral pointers are almost there.

Thinking out loud: Allowing pointers to have a metadata/capability/fat field that needs to be preserved when deriving pointers from other pointers might be the way to go - and then you’d want an operation that creates a pointer with a given address and metadata. And then instead of ptrtoint on these things, you’d have a separate “extract address” operation … but that’d also involve tweaking a lot of the internal IR.

The way you can do it today is:

  1. pointer → address is a ptrtoint (though something non-capturing would be better for us; we have an intrinsic to do this)
  2. address + pointer-with-metadata → new pointer is a GEP of pointer-with-metadata and (address - ptrtoint pointer-with-metadata) (though this is a bit convoluted and we have an intrinsic to do this too)
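The two steps above can be sketched in Python as follows. This only models the arithmetic: a real capability is not a struct and cannot be split and rejoined, so the FatPointer class and the function names are illustrative assumptions, with a 64-bit address assumed.

```python
from dataclasses import dataclass, replace

ADDRESS_MASK = (1 << 64) - 1


@dataclass(frozen=True)
class FatPointer:
    metadata: int  # opaque to arithmetic; never modified below
    address: int


def ptrtoint(ptr):
    """Step 1: pointer -> address. Only the address comes out."""
    return ptr.address


def gep(ptr, delta):
    """Pointer arithmetic: adjusts the address, wrapping within 64 bits."""
    return replace(ptr, address=(ptr.address + delta) & ADDRESS_MASK)


def set_address(ptr, new_address):
    """Step 2: glue a new address onto an existing pointer via a GEP of the delta."""
    delta = (new_address - ptrtoint(ptr)) & ADDRESS_MASK
    return gep(ptr, delta)


p = FatPointer(metadata=7, address=0x1000)
print(set_address(p, 0x20))
```

The point of spelling it as a GEP of the delta is that the metadata is derived from the original pointer rather than rebuilt from integers.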

Whereas the high-level operation I’m thinking of looks like ptrannotate, which will take (using our form) (ptr addrspace(0) %flat, i80 %flags) -> ptr addrspace(7) %desc, which I could write as

%addr = ptrtoint ptr addrspace(0) %flat to i64
%addr.trunc = trunc i64 %addr to i48
%addr.ext = zext i48 %addr.trunc to i128
%flags.ext = zext i80 %flags to i128
%flags.shift = shl i128 %flags.ext, 48
%desc.int = or i128 %addr.ext, %flags.shift
%desc = inttoptr i128 %desc.int to ptr addrspace(7)

and then ptrtoint %desc to i128 == %desc.int
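The same computation, written as a small Python model so the bit positions are easy to check. ptrannotate here is the hypothetical operation from this post, not an existing LLVM intrinsic, and the function just mirrors the IR sequence above on plain integers.

```python
ADDR_BITS = 48
FLAG_BITS = 80


def ptrannotate(flat_address, flags):
    """Model of (ptr addrspace(0), i80) -> i128 bits of a ptr addrspace(7)."""
    addr = flat_address & ((1 << ADDR_BITS) - 1)  # trunc i64 -> i48, then zext
    flags_ext = flags & ((1 << FLAG_BITS) - 1)    # zext i80 -> i128
    return addr | (flags_ext << ADDR_BITS)        # shl by 48, then or


def address_of(desc_int):
    """Low 48 bits: the address component of the descriptor."""
    return desc_int & ((1 << ADDR_BITS) - 1)


def flags_of(desc_int):
    """High 80 bits: the metadata carried along with the address."""
    return desc_int >> ADDR_BITS


desc = ptrannotate(0xFFFF_1234_5678_9ABC, 0x1)
assert address_of(desc) == 0x1234_5678_9ABC  # top 16 bits of the flat address are dropped
assert flags_of(desc) == 0x1
```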

Some of the code I feed to LLVM does basically this, but the end types are <4 x i32> and not ptr addrspace(7), which is the problem I’m solving.

And so the difference between CHERI’s model and what I’m planning is that I keep the metadata along for the ride in ptrtoint conversions.

Which isn’t something that we’d strictly need at the IR level - after all, if a buffer descriptor was, for IR purposes, a (i80, i48) - or a CHERI tagged pointer were a (i64, i1, i64) - it’d work fine, so long as both those representations could be written into the right instructions.

That is, my ptrannotate could be spelled with ptrtoint on the address space 0 pointer (+ truncation) followed by some flavor of structtoptr.

But allowing struct-typed pointers would be a whole thing that I don’t want to take on right now.

Well they’re not actually structs, and if you try to split them up you’ll lose the tag (but merely querying properties is totally fine, and normal). They’re a single entity that must be preserved as is if you want them to remain valid; the struct analogy is only to explain what comprises the raw bits that they’re made out of. From an IR perspective it’s a ptr addrspace(200), never anything else. So (i64, i1, i64) would in fact not work fine as you can’t split them up and later join them back up, nor can you arbitrarily manipulate anything other than the address. We don’t want some kind of pointers-as-structs thing.