However, if your pointer can effectively address N bits of memory using
N bits of data, then we make the entirely reasonable assumption that
|ptrtoint == ptrtoaddr|.
I originally had more quibbles about more of your reply, but on
reflection, I think this one bit really cuts to the heart of the matter.
To my eyes, we have the following key tenets:
ptrtoint is “just” a bitcast with provenance-esque side effects (but
I’m ignoring provenance here).
ptrtoaddr is the inverse of a gep, in the sense that ptrtoaddr(gep p, X) - ptrtoaddr(gep p, Y) == X - Y. Hypothetically, we could
introduce a notion of ptrbase(p) that returns the ‘0’ element, and if
we did so, we would have p == gep ptrbase(p), ptrtoaddr(p). (Note also
that we can define ptrbase(p) := gep p, -ptrtoaddr(p) as a result).
If you want to convert a pointer to an integer solely for the purpose
of index calculations, then you should use ptrtoaddr and not ptrtoint. For example, computing a pointer diff should be ptrtoaddr(p) - ptrtoaddr(q), and aligning a pointer should be gep p, -(ptrtoaddr p & align).
Hardware pointers seem to be universally built out of “base address
properties” + “offset” parts, and the index width of a pointer is going
to (usually) be the size of the offset part. Huge pointers are the only
exception to this rule, I think.
Given the above, the correct size for a ptrtoaddr actually should
be the index width of a pointer: we don’t want InstCombine to create any
intermediate extension/truncates in these kinds of calculations. If that
is true, the only utility of the effective address size/offset size is
about expressing when ptrtoint == ptrtoaddr holds.
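The tenets above can be sketched with a toy model of a flat address space, where a pointer is just an unsigned integer. This is only an illustration: `gep`, `ptrtoaddr`, `ptr_diff` and `align_down` are hypothetical Python helpers, not LLVM APIs, and the `align` in the alignment formula is read here as a low-bit mask (`alignment - 1`), which is an assumption.

```python
# Toy model: in a flat address space a pointer is an unsigned integer
# of INDEX_BITS bits. All names are illustrative, not LLVM APIs.
INDEX_BITS = 64
MASK = (1 << INDEX_BITS) - 1


def gep(p, offset):
    """Byte-wise getelementptr: add a (possibly negative) offset, wrapping."""
    return (p + offset) & MASK


def ptrtoaddr(p):
    """In a flat address space the address is the pointer's bit pattern."""
    return p & MASK


def ptr_diff(p, q):
    """Pointer difference computed purely on addresses."""
    return (ptrtoaddr(p) - ptrtoaddr(q)) & MASK


def align_down(p, alignment):
    """Round p down to a power-of-two alignment: gep p, -(addr & mask)."""
    low_bits_mask = alignment - 1  # assumption: the mask is alignment - 1
    return gep(p, -(ptrtoaddr(p) & low_bits_mask))


p = 0x1000
# ptrtoaddr(gep p, X) - ptrtoaddr(gep p, Y) == X - Y
difference = ptr_diff(gep(p, 24), gep(p, 8))
aligned = align_down(0x1007, 8)
```

Both operations stay at index width throughout, so no extension or truncation is needed between the `ptrtoaddr` result and the `gep` offset.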
I think we’re in agreement on the intended semantics where index size ==
offset size == address size, it’s the case where index size > offset
size that we differ. And I think the only example brought up with this
scenario is the huge pointer case?
I … hadn’t considered “ptrtoaddr is the inverse of GEP” as a perspective.
I was thinking of it as the get effective address function - given a pointer, where does it go, really? That’s not necessarily an invertible operation for annotated pointers - for example, if I’ve got a base + offset kind of pointer, ptrtoaddr will combine the base and the offset in a way that you can’t reliably pull apart. That’s also why I was thinking that garbage-collected pointers have ptrtoaddr undefined - you’re not allowed to ask what their effective address is, you can only look at the offset they’re carrying along.
Or at least that’s all true the way I’ve been thinking of it.
I think your notion of ptrtoaddr - tell me the offset of this pointer from whatever it considers its 0 point, which doesn’t have to be null - is probably fine, and I figure we should go with it.
I think your alignment logic won’t quite work on an AMD ptr addrspace(7) in its full generality though.
That is, suppose I have %p = ptr addrspace(7) {base address = i48 / [global pointer] 33, num_records = ..., offset = i32 2}. The actual effective address of this pointer is 33 + zext(i32 2 to i48) = 35. With my ptrtoaddr, ptrtoaddr(%p) == i48 35. With yours, ptrtoaddr(%p) == i32 2. With your math, aligning %p to 4 bytes would clear the offset field - which is all a GEP can touch - leaving it still, ultimately, unaligned.
But … then again, there’re no hardware alignment requirements running around in the back for these guys - or at least I can’t remember any - so it’s probably fine.
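The base 33 / offset 2 example can be replayed in a small model. This is a sketch only: `BufferPtr` is a hypothetical stand-in for a ptr addrspace(7) value, num_records is left out, and the alignment mask is assumed to be `alignment - 1`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferPtr:
    """Toy buffer fat pointer (illustrative, not the real layout)."""
    base: int    # 48-bit base address held in the resource
    offset: int  # 32-bit offset, the only field a GEP can change


def gep(p, delta):
    """GEP touches only the offset field, wrapping at 32 bits."""
    return BufferPtr(p.base, (p.offset + delta) & 0xFFFFFFFF)


def effective_address(p):
    """The 'where does it really point' view: base + zext(offset)."""
    return (p.base + p.offset) & ((1 << 48) - 1)


def offset_of(p):
    """The 'inverse of GEP' view: just the offset field."""
    return p.offset


def align_down_by_offset(p, alignment):
    """gep p, -(ptrtoaddr p & mask) with ptrtoaddr meaning the offset."""
    return gep(p, -(offset_of(p) & (alignment - 1)))


p = BufferPtr(base=33, offset=2)
aligned = align_down_by_offset(p, 4)
```

After the call, `aligned.offset` is 0 but the effective address is 33, which is still not a multiple of 4.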
In short, I think I’m happy with your ptrtoaddr - that is:
All pointers have a notional “base pointer” such that %p = getelementptr i8, ptrbase(%p), i[Index width] %x for some %x
ptrtoaddr returns the %x in that expression
Pointer arithmetic on pointers that have a common ptrbase is arithmetic on the ptrtoaddrs. If the ptrbases don’t match, the backend can fill in the blanks in whichever way makes sense.
If the index width is the same as the pointer width, ptrbase(%p) == nullptr for all %p.
Sorry about the long silence here and thanks so much for the writeup @krzysz00.
My view of ptrtoaddr is that it should return a linear address relative to some defined start of the address space (e.g. relative to address zero for traditional AS0). This works both for current systems, CHERI and also the AMDGPU 32-bit stack address (which is a linear address relative to some well-defined base and happens to be at some fixed offset to the full 64-bit address space).
While it works in the CHERI case, I am not particularly comfortable with the idea that ptrtoaddr width must be the same as the index width. This would mean that in the AMDGPU case of a 48-bit address with a 32-bit index we can only have ptrtoaddr return 32-bit addresses, which are not able to represent all things, or we effectively turn it into ptrtooffset (which I think would be the useful operation for GC pointers). In the CHERI world we actually added an intrinsic @llvm.cheri.cap.offset.get that gives you this offset relative to the base and we experimented with a compiler mode that always returned the offset relative to the base whenever you cast a C pointer to an integer, but it turned out this caused a lot of compatibility issues with C code.
In summary, I hope the following part of @krzysz00’s proposal should be uncontroversial:
ptrtoint is a bitcast with escaping side-effects (and inttoptr is the inverse, but may not be supported if you have out-of-band state)
ptrtoaddr gives an integer address in the current address space that may be smaller than the raw bitwise representation. Provenance does not escape but the address does.
if address width == index width == pointer width then the only difference between ptrtoint and ptrtoaddr is the provenance escape.
For GC pointers, I believe having ptrtoaddr return an offset relative to the object base is confusing and probably not that useful compared to not allowing it. We already have i64 @llvm.experimental.gc.get.pointer.offset to obtain the offset of GC pointers.
More to follow, but I was basically thinking of ptrtoaddr as what I think you’re calling ptrtooffset - that is, this is an operation that returns an unsigned distance to some “leftmost” pointer (allowing for unsigned pointer add).
So for normal address spaces, this “leftmost” pointer is the one with bit pattern 0. The way I imagined this is that AMD address space 7 is somewhat GC-like in that the “leftmost” value for a particular {resource, offset} pair is offset = 0. This preserves the fact that no sequence of GEP/ptradd operations will get you from one buffer resource to another (though you might point at the same memory if they overlap).
In the CHERI sense, if I understand correctly, the “leftmost” value of a pointer with some provenance tags P is the 0 address tagged with provenance P. No sequence of GEPs will get you to a pointer from one with a different set of tags.
To poke at your proposal a bit, what relations do you have in mind for ptrtoaddr equality and pointer equality? (My proposal is that things that don’t share a “leftmost” pointer have target-dependent comparison semantics, so that both the current AMDGPU “p and q are equal if their resource and offset match” and CHERI’s “p and q are equal if their addresses match, ignoring provenance” are legal, but neither is required.)
In my view CHERI addresses behave exactly the same as addresses in the “normal” address spaces: bit pattern zero is NULL, all other addresses are linear from the start.
The “leftmost” valid value of a pointer has address == base. However, you could decrement to go out-of-bounds to the left (potentially all the way to zero). In those cases ptrtoaddr will return a value outside of the bounds of the pointer, but you can never dereference it. The hypothetical ptrtooffset would return a negative value. GEPs can only affect the address part of the pointer; the “provenance” part is unaffected by GEPs[1]. Therefore you can never use GEP to traverse across distinct objects.
I believe this behaviour is similar/identical to the AMDGPU one where you have a base+size with an offset?
We encountered quite a few problems with using “offset from object base” vs “offset from start of address space” for pointer->integer addr conversions with the main one being alignment checks: if you return the offset, all pointers to the start of an object appear to have infinite alignment since you always get 0 back, but realistically that is not true. We initially worked around this by manually fixing alignment checks, but that was not feasible for wider deployment.
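The alignment problem is easy to reproduce in a few lines. The object addresses below are made up for illustration, and `offset_from_base` and `linear_address` are hypothetical names for the two candidate pointer-to-integer conversions.

```python
def is_aligned(value, alignment):
    """Usual integer alignment check on a converted pointer."""
    return value % alignment == 0


def offset_from_base(base, addr):
    """Conversion that returns the offset relative to the object base."""
    return addr - base


def linear_address(base, addr):
    """Conversion that returns the address relative to address zero."""
    return addr


# Hypothetical pointers to the start of three objects: (base, address).
object_starts = [(0x1001, 0x1001), (0x2003, 0x2003), (0x3000, 0x3000)]

by_offset = [is_aligned(offset_from_base(b, a), 16) for b, a in object_starts]
by_address = [is_aligned(linear_address(b, a), 16) for b, a in object_starts]
```

With the offset-based conversion every object start yields 0, so the check passes for all three pointers, while only the third is actually 16-byte aligned.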
I believe for LLVM semantics the alignment of the ptrtoaddr result should match that of the actual hardware location[2] since otherwise the constraints such as gep p, -(ptrtoaddr p & align) for alignment would break.
In terms of equality for CHERI, ptrtoaddr(a) == ptrtoaddr(b) implies that both point to the same underlying address and either have overlapping/equal bounds or one of them is a pointer one past the end and the other is a pointer to the start of another object.
Performing a full bitwise equality by default for icmp eq ptr probably makes the most sense, the only reason we don’t do that right now is due to existing C code breaking and LLVM optimizations working better if you use icmp eq ptr. I think the solution for this would be to simply have the frontend emit icmp eq i64 ptrtoaddr(a), ptrtoaddr(b) to obtain those semantics and add support for any missing optimizations for this pattern.
Full bitwise comparisons can be awkward for compatibility once you use sub-object bounds: e.g. for struct { int a; int b; } foo, what should we do for (void*)&foo == (void*)&foo.a? With subobject bounds, taking a pointer to foo.a will have bounds of 4 bytes with the address being the same as &foo (which has 8-byte bounds). Since the metadata is now different, should these pointers be considered equal or different even though they both point to the same address and are dereferenceable (not past-the-end pointers)?
[1] Strictly speaking this is not 100% true: if you go far enough out of bounds the hidden validity bit will be cleared and the pointer becomes non-dereferenceable. But the visible 128-bit representation cannot be changed by GEPs.
[2] At least for the lower bits: once you have page-based virtual memory, the underlying physical address could have different alignment above the page size.
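The struct foo question can be made concrete with a toy capability. This is a sketch only: `Cap` is a hypothetical record with just a base, a length and an address, far simpler than a real CHERI capability, and the address 0x1000 is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cap:
    """Toy capability: bounds metadata plus an address (illustrative)."""
    base: int
    length: int
    addr: int


def full_equal(a, b):
    """Compare every component, metadata included."""
    return a == b


def address_equal(a, b):
    """Compare only the address, i.e. ptrtoaddr(a) == ptrtoaddr(b)."""
    return a.addr == b.addr


# (void*)&foo has 8-byte bounds; (void*)&foo.a has 4-byte sub-object bounds.
foo = Cap(base=0x1000, length=8, addr=0x1000)
foo_a = Cap(base=0x1000, length=4, addr=0x1000)
```

Here `full_equal(foo, foo_a)` is false while `address_equal(foo, foo_a)` is true, which is the compatibility tension described above.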
To be clear, when I say “leftmost” pointer, I don’t mean leftmost valid pointer. For all the CHERI pointers - representing them as ptr %p = {tag %t, addr %a}, the “leftmost” pointer for %p is {%t, 0}.
And re bitwise comparisons, my claim is that the equality of (void*)&foo == {t1, addr} and (void*)&foo == {t2, addr} is platform-defined. Because you can’t use GEPs to get from {t2, addr} to {t1, addr} or vice-versa, you’re allowed to fill in the definition of == in any way that makes sense for your target’s pointers. For example, with AMD addrspace(7) pointers, we’re currently defining two pointers as definitionally unequal if they come from different resources, even if they point to the same location (though I don’t think anyone’s relying on that behavior and we could change it - even though it’d be more expensive), but that approach doesn’t work for CHERI.
In other words, ptrtoaddr returns just the bits of the pointer that’re affected by a GEP - that is, index-width worth of bits. If the bits in a pointer that’re outside that range are equal, then pointer equality is ptrtoaddr equality. However, if they’re not equal, the behavior of pointer equality is whatever your target’s manual says it is.
Do these seem like problematically loose semantics? It’s possible I might be way off base here with what I think ptrtoaddr should be.
Thanks, I wasn’t quite sure if you meant leftmost relative to the object itself or the address space.
CHERI pointers are {out-of-band bool %valid, iN %metadata, iN %addr} (with N=32/64), so I agree that the leftmost pointer in the address space has address 0 and is %leftmost = {%valid, %metadata, 0} with any value for metadata+valid.
I think for LLVM IR semantics it would be cleanest if we could say that equality is only true if all components of the pointer are the same. The difficult part is pointers with overlapping bounds where you can GEP from one address to another even though one may grant access to a larger region. IMO the C frontend can lower things differently to use address comparison so that bounded pointers that have overlapping bounds compare equal, but at the IR level we should always do a full comparison.
So two pointers with the same address but different bounds or permissions (for CHERI) or resource type (AMDGPU) should compare different.
Just returning the index bits for ptrtoaddr seems slightly problematic to me, but it’s possible that I don’t understand the AMDGPU fat pointers enough.
Do I understand correctly that for a given resource you have some metadata identifying the type of resource (which I imagine implies the valid memory access size) plus a 48 bit address plus a 32-bit offset?
Does every pointer to the start of a resource have offset 0? In that case it sounds like you can’t use ptrtoaddr to perform checks for alignment.
Yeah, every pointer to the start of a resource has offset 0, so I see that your concern about not being able to use ptrtoaddr for alignment checks is correct if we go with my notion of ptrtoaddr.
So if we want ptrtoaddr to basically be the same as “the integer you’d get if you got a metadataless pointer to the same address and ptrtoint’d it” … yeah, we’d need 48 bits for the amdgpu buffer fat pointers.
My overall concern with introducing a separate “address width” is the combination of:
Based on our experience with pointer index widths, I believe introducing one more pointer width is going to place a large additional burden on correct handling of pointers. The current situation where we have two widths already requires significant care and additional testing effort, and we have only recently (years after the introduction of separate index widths) reached a mostly functional state. This is a burden every LLVM developer will have to bear.
At the same time, reading this thread, I feel like the separate address width is mostly a solution in search of a problem. Yes, we have AMDGPU fat pointers, where we could imagine specifying a 48 bit address width with a 32 bit index width, but I didn’t get the impression that doing this is addressing a problem someone has right now, it’s more of a hypothetical design improvement.
An additional complication is that, in the cases where having an address width != index width could make sense, the “effective address” has to be derived from the pointer through a complex calculation; it’s not just taking the low bits, as is the case with the index width. Allowing a non-trivial operation here will complicate analysis (e.g. we can’t transfer known bits across ptrtoaddr).
To add one more consideration into the mix, I’m interested in introducing a ptrsub instruction to compute the difference between two pointers, and of course the pointer vs address vs index size question comes up there as well. My assumption was that ptrsub is going to return the index size (either by truncating or making top bit mismatch poison), which means that it will complement nicely with ptradd (aka getelementptr), as they both operate on the index width. ptrsub(ptradd(p, x), p) == x and ptradd(p, ptrsub(p2, p)) == p2 (assuming p and p2 have same provenance) will hold.
You could of course make ptrsub return the address width instead. For the AMDGPU fat pointer case, this would enable ptrsub to return a sensible result if you subtract two pointers that have two different base addresses but still refer to the same object. But now you have to deal with address vs index mismatch everywhere. This seems like something that just should not be supported for these pointer types, because it results in an incoherent overall model (as subtracting and then adding back a pointer will not get you back to the original value anymore).
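The coherence argument can be checked on a toy buffer fat pointer. This is a sketch under stated assumptions: `FatPtr`, `ptrsub_index` and `ptrsub_address` are hypothetical names, the base addresses are made up, and poison is not modelled at all (the index-width variant simply truncates).

```python
from collections import namedtuple

# Toy buffer fat pointer: a base address plus a 32-bit offset.
FatPtr = namedtuple("FatPtr", ["base", "offset"])

INDEX_MASK = (1 << 32) - 1
ADDRESS_MASK = (1 << 48) - 1


def ptradd(p, x):
    """GEP-like add: only the offset (index-width) field changes."""
    return FatPtr(p.base, (p.offset + x) & INDEX_MASK)


def effective_address(p):
    return (p.base + p.offset) & ADDRESS_MASK


def ptrsub_index(a, b):
    """Difference of the index-width parts, truncated to 32 bits."""
    return (a.offset - b.offset) & INDEX_MASK


def ptrsub_address(a, b):
    """Difference of the 48-bit effective addresses."""
    return (effective_address(a) - effective_address(b)) & ADDRESS_MASK


p = FatPtr(base=100, offset=4)
same_buffer = FatPtr(base=100, offset=40)
other_buffer = FatPtr(base=200, offset=4)

# Index-width ptrsub round-trips with ptradd inside one buffer.
round_trip = ptradd(p, ptrsub_index(same_buffer, p))

# Address-width ptrsub across buffers reaches the same address,
# but not the original pointer value.
cross = ptradd(p, ptrsub_address(other_buffer, p))
```

`round_trip` equals `same_buffer`, whereas `cross` has the same effective address as `other_buffer` but still carries the base of `p`, so subtracting and adding back does not recover the original pointer.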
I also agree that we really don’t need “address width”, especially since, as you said, the effective address of a ptr addrspace(7) can actually be non-trivial work. (And anyone wanting it should just addrspacecast back to ptr addrspace(0) or ptr addrspace(1) … once I implement that)
You have a very good point about ptrsub - it doesn’t really make a lot of sense between two distinct buffers, especially given the out of bounds behavior of buffer resources.
~~My gut instinct, actually, is that if you have p7 %p = {p8 %r1, i32 %o1} and p7 %q = {p8 %r2, i32 %o2}, then the definition of ptrsub(%p, %q) is (%r1 == %r2) ? %o1 - %o2 : i32 poison
(In practice that poison would get resolved to %o1 - %o2 anyway, but, for conceptual purposes, subtracting disjoint pointers is poison-y. If actual poison is a bit strong here, then “arbitrary consistent behavior” works)~~
Edit: A note on “top bits of ptrsub arguments mismatch is poison” … CHERI wouldn’t like that because, for example, the pointers to an object and its first subobject have different bounds bits but can be ptrsub’d just fine. So … on further reading, the truncation definition of ptrsub is what we’d want to go with.
So given a ptrsub that truncates, the definition of ptrtoaddr is ptrtoaddr p ::= ptrsub p, (inttoptr 0)
I’m not 100% on defining ptrtoaddr %p to be trunc i[index] (ptrtoint %p) because I don’t want to categorically forbid some sort of exotic pointer format where the low bits are metadata or something, but that may be a theoretical concern. I figure we at the very least have the laws (for non-GC pointers)
ptrtoaddr (ptradd p, x) = add (ptrtoaddr p), x
p == q => ptrtoaddr p == ptrtoaddr q
sub(ptrtoaddr p, ptrtoaddr q) == ptrsub(p, q)
and I’m probably missing some. This might also be a search for a problem and so we could just go with “the index bits of the pointer”
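These laws can be checked on a toy pointer whose high bits are metadata and whose low 64 bits are the address. This is a sketch only: the layout is an assumption loosely inspired by the CHERI description in this thread, and `make_ptr`, `ptradd`, `ptrsub` and `ptrtoaddr` are hypothetical Python helpers.

```python
INDEX_BITS = 64
INDEX_MASK = (1 << INDEX_BITS) - 1
NULL = 0


def make_ptr(metadata, addr):
    """Pack metadata into the high bits and the address into the low bits."""
    return (metadata << INDEX_BITS) | (addr & INDEX_MASK)


def ptradd(p, x):
    """Change only the index-width bits; the metadata bits are preserved."""
    high = p & ~INDEX_MASK
    return high | ((p + x) & INDEX_MASK)


def ptrsub(p, q):
    """Truncating ptrsub: the difference of the index-width bits."""
    return (p - q) & INDEX_MASK


def ptrtoaddr(p):
    """ptrtoaddr p ::= ptrsub p, (inttoptr 0)"""
    return ptrsub(p, NULL)


p = make_ptr(0xABCD, 0x1000)
q = make_ptr(0x1234, 0x0FF0)
moved = ptradd(p, -0x1001)  # wraps below zero within the index width
```

Because `ptrsub` truncates, `ptrsub(p, q)` is well defined here even though `p` and `q` carry different metadata.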
Hi @nikic, thanks for weighing in on this discussion.
I agree that having another different value could be confusing – and for the CHERI case the index width is exactly what we want, so just using that sounds good to me. The only reason I considered a new component is that previously concerns were raised that we can’t assume index size is the same as the address size (e.g. amdgpu). However, the discussion so far seems to indicate the AMDGPU use cases don’t really need the “48bit base+offset” in most cases anyway since the base is generally sufficiently aligned to perform alignment checks just on the offset.
For ptrsub I am not sure if we should return poison for mismatched provenance, but I think that is a separate discussion we should have in another thread to avoid extending this one.
So in conclusion, are we happy with the following resolution for ptrtoint/ptrtoaddr? If so I will go ahead and update PRs and start using the address width/index width for e.g. knownbits of pointers instead of the full representation width.
ptrtoint behaves as a bitcast of the full representation width and has capturing semantics.
We clarify that non-integral pointers can have an address component that is smaller than the size of the pointer. This component must be the same as the index width and LLVM can assume that it is the low bits of the pointer. For AMDGPU fat pointers we don’t actually return the underlying address since that would require arithmetic but instead the offset relative to the base.
We introduce a new ptrtoaddr instruction that returns the address component of the pointer:
This behaves similarly to ptrtoint but does not expose/capture provenance.
It always returns the low index-width bits of the pointer, i.e. ptrtoaddr %x == trunc iIndexWidth (ptrtoint %x) but without the ptrtoint side-effects
(later/in parallel) We introduce a new ptrsub instruction that subtracts the address bits (i.e. ignoring anything beyond index width). Exact semantics TBD.
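The proposed relationship between ptrtoint and ptrtoaddr can be sketched as plain bit manipulation. This is a toy model, not LLVM code: the `LAYOUTS` table and its widths are assumptions for illustration (in particular the 160-bit fat pointer assumes a 128-bit resource plus a 32-bit offset), and the capturing/provenance side effects are not modelled.

```python
# (pointer width, index width) per hypothetical address-space kind.
LAYOUTS = {
    "classic": (64, 64),
    "cheri": (128, 64),
    "amdgpu_buffer_fat": (160, 32),  # assumed: 128-bit resource + i32 offset
}


def ptrtoint(bits, kind):
    """Bitcast-like: the full representation width of the pointer."""
    pointer_width, _ = LAYOUTS[kind]
    return bits & ((1 << pointer_width) - 1)


def ptrtoaddr(bits, kind):
    """trunc iIndexWidth (ptrtoint %x): the low index-width bits."""
    _, index_width = LAYOUTS[kind]
    return ptrtoint(bits, kind) & ((1 << index_width) - 1)


flat = 0x7FFF0000
cap = (0xABCD << 64) | 0x1000     # metadata above a 64-bit address
fat = (0xDEADBEEF << 32) | 2      # resource bits above a 32-bit offset
```

For the classic layout the two conversions agree bit for bit; for the other two, `ptrtoaddr` drops everything above the index width, so the fat pointer yields its offset (2) rather than an effective address.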
Agreed that this is a good resolution and addresses all the concerns that came up. I’d be willing to call that consensus if we don’t get anyone charging in to complain at the last minute.
I’m not sure we should really introduce getPointerAddressSize() though – my thinking here is that this may be more confusing than helpful in practice, especially in contexts where you’re working both with address size (ptrtoaddr) and index size (GEP index) values and trying to combine them. Having separate APIs comes with the implication that they may be returning different values and you need to write code to accommodate the difference (which we wouldn’t want for now).
I briefly considered whether this instruction should be called ptrtoindex instead of ptrtoaddr to have fully consistent terminology, but I think this name would be quite confusing for anyone not very familiar with our pointer modelling, and being more pedantically correct is not worth it here.
Thanks, that sounds great - I’ll try to work on initial patches for ptrtoaddr soon.
The reason I would like to add it is that the name index suggests that this is only correct for indexing operations (i.e. GEP) and not related to the pointer size. For folks not familiar with non-integral pointers who want to convert a pointer to an address, having an explicit API might make it more obvious that this one should be used. Otherwise it would be natural to assume that getPointerSize() is the correct API to get the address portion of a pointer (since in the “classic pointer” case those are identical).
If you feel strongly that adding another API is confusing, I could make it very clear in the doc comments on getPointerSize() and getIndexSize() that the “integer range” of a pointer is only the index (in addition to adding wording to LangRef).
If you are curious for the semantics we picked here in Rust, check out these docs. I hope the LLVM intrinsic can be defined in a way that we can use it from Rust.