Clarifying the semantics of ptrtoint

Hi @nikic, thanks for weighing in on this discussion.

I agree that having yet another value could be confusing, and for the CHERI case the index width is exactly what we want, so just using that sounds good to me. The only reason I considered a new component is that concerns were previously raised that we can’t assume the index size is the same as the address size (e.g. AMDGPU). However, the discussion so far seems to indicate that the AMDGPU use cases don’t really need the “48-bit base+offset” in most cases anyway, since the base is generally sufficiently aligned to perform alignment checks on the offset alone.
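To make the alignment point concrete, here is a rough sketch in plain C++ (not LLVM code). The base value and the helper name isAligned are made up for illustration; the only assumption is that the base is aligned to at least the alignment being checked.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if `value` is a multiple of `align`.
// `align` must be a power of two.
constexpr bool isAligned(uint64_t value, uint64_t align) {
  return (value & (align - 1)) == 0;
}

// Made-up 48-bit base that happens to be 4 KiB aligned.
constexpr uint64_t Base = 0x123456789000ULL;
static_assert(isAligned(Base, 4096), "base is assumed to be well aligned");

// If the base is aligned to at least `align`, the alignment of base + offset
// is decided by the offset alone, so the full address is not needed.
inline void checkOffsetOnlyAlignment() {
  assert(isAligned(Base + 24, 8) == isAligned(24, 8)); // both aligned
  assert(isAligned(Base + 20, 8) == isAligned(20, 8)); // both misaligned
}
```

This only holds when the requested alignment does not exceed the alignment of the base, which is the "sufficiently aligned" condition above.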

For ptrsub I am not sure if we should return poison for mismatched provenance, but I think that is a separate discussion we should have in another thread to avoid extending this one.

So in conclusion, are we happy with the following resolution for ptrtoint/ptrtoaddr? If so I will go ahead and update PRs and start using the address width/index width for e.g. knownbits of pointers instead of the full representation width.

  1. ptrtoint behaves as a bitcast of the full representation width and has capturing semantics.
  2. We clarify that non-integral pointers can have an address component that is smaller than the size of the pointer. This component must be the same as the index width, and LLVM can assume that it is the low bits of the pointer. For AMDGPU fat pointers we don’t actually return the underlying address, since that would require arithmetic, but instead the offset relative to the base.
  3. I update llvm/llvm-project Pull Request #137412 (“[DataLayout] Introduce DataLayout::getPointerAddressSize(AS)”) to return the index width for DL.getPointerAddressSize() instead of adding a new component. I believe having these new accessor functions is still useful even if they return the same value, since it’s not obvious that index size == address size. This would also make it easier to allow address width != index width in the future if we ever decide this is needed.
  4. We introduce a new ptrtoaddr instruction that returns the address component of the pointer:
     • This behaves similarly to ptrtoint but does not expose/capture provenance.
     • It always returns the low index-width bits of the pointer, i.e. ptrtoaddr %x == trunc iIndexWidth (ptrtoint %x), but without the ptrtoint side effects.
  5. (later/in parallel) We introduce a new ptrsub instruction that subtracts the address bits (i.e. ignoring anything beyond index width). Exact semantics TBD.
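As a sanity check of the proposed relationship between ptrtoint and ptrtoaddr, here is a small C++ model (a sketch only, not LLVM code). It assumes a hypothetical target with a 64-bit pointer representation and a 32-bit index width; the names PtrRepr, Addr, IndexBits and the functions are invented for illustration. Provenance and capture effects cannot be expressed in this model, so it only covers the bit-level behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical target: 64-bit pointer representation, 32-bit index width.
using PtrRepr = uint64_t; // full representation bits (metadata + address)
using Addr = uint32_t;    // address component, same width as the index
constexpr unsigned IndexBits = 32;

// Model of ptrtoint: a bitcast of the full representation width.
// (In real IR this also captures/exposes provenance; not modelled here.)
constexpr PtrRepr ptrtoint(PtrRepr p) { return p; }

// Model of ptrtoaddr: the low index-width bits of the pointer.
constexpr Addr ptrtoaddr(PtrRepr p) {
  return static_cast<Addr>(p & ((PtrRepr{1} << IndexBits) - 1));
}

// The identity from item 4: ptrtoaddr %x == trunc iIndexWidth (ptrtoint %x).
inline void checkPtrToAddrIdentity() {
  const PtrRepr X = 0xABCD000012345678ULL; // made-up pointer bits
  assert(ptrtoaddr(X) == static_cast<Addr>(ptrtoint(X)));
  assert(ptrtoaddr(X) == 0x12345678u); // high (non-address) bits are dropped
}
```

With this model, anything that only needs the address (e.g. known bits) would look at the 32 low bits rather than the full 64-bit representation.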