Question on equivalence of pointer types

Is copy.0 semantically equivalent to copy.1 in the following example?

define void @copy.0(i8 addrspace(1)* addrspace(1)* %src,
                    i8 addrspace(1)* addrspace(1)* %dst) {
entry:
  %val = load i8 addrspace(1)* addrspace(1)* %src
  store i8 addrspace(1)* %val, i8 addrspace(1)* addrspace(1)* %dst
  ret void
}

define void @copy.1(i8 addrspace(1)* addrspace(1)* %src,
                    i8 addrspace(1)* addrspace(1)* %dst) {
entry:
  %src.cast = bitcast i8 addrspace(1)* addrspace(1)* %src to i8* addrspace(1)*
  %dst.cast = bitcast i8 addrspace(1)* addrspace(1)* %dst to i8* addrspace(1)*

  %val = load i8* addrspace(1)* %src.cast
  store i8* %val, i8* addrspace(1)* %dst.cast
  ret void
}

-- Sanjoy

Partially answering my own question, in general these are not
equivalent because LLVM allows for pointers in different address
spaces to have different sizes. However, are they equivalent if
pointers in addrspace(1) have the same size as pointers in
addrspace(0)?

In other words, assuming pointers have the same size irrespective of
address space, is storing / loading an addrspace(1)* value (as opposed
to storing into / loading from an addrspace(1)*) allowed to do
something semantically different than storing / loading an
addrspace(0)* value?

Thanks,
-- Sanjoy
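
A minimal C++ sketch of the size argument above, assuming a hypothetical `PointerLayout` table that maps each address space to a pointer width. The struct, its fields, and the widths a caller fills in are illustrative and not taken from any real target. It only captures the necessary condition: retyping the loaded or stored pointer is out of the question unless both address spaces use the same pointer width.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for a target's data layout:
// address space -> pointer width in bits. Values are illustrative only.
struct PointerLayout {
    std::map<unsigned, unsigned> bitsByAddrSpace;
    unsigned defaultBits = 64;

    // Width of a pointer in the given address space, falling back to the default.
    unsigned pointerBits(unsigned addrSpace) const {
        auto it = bitsByAddrSpace.find(addrSpace);
        return it == bitsByAddrSpace.end() ? defaultBits : it->second;
    }
};

// A load/store of a pointer in address space `from` can only be retyped as a
// load/store of a pointer in address space `to` if both move the same number
// of bits. Equal width is a necessary condition, not a sufficient one.
inline bool sameWidth(const PointerLayout &layout, unsigned from, unsigned to) {
    return layout.pointerBits(from) == layout.pointerBits(to);
}
```

With a layout where address space 1 is 32 bits wide and address space 0 keeps the 64-bit default, `sameWidth(layout, 0, 1)` is false, so copy.0 and copy.1 would move different amounts of data.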

Yes, it is. For example, the InstCombine code below assumes that loading from null is undefined, but only for addrspace(0). Whether other address spaces trap or give undefined behaviour on loading from null is target dependent.

Thanks,
Pete

  // load(gep null, ...) -> unreachable
  if (GetElementPtrInst *GEPI = dyn_cast<GetElementPtrInst>(Op)) {
    const Value *GEPI0 = GEPI->getOperand(0);
    // TODO: Consider a target hook for valid address spaces for this xform.
    if (isa<ConstantPointerNull>(GEPI0) && GEPI->getPointerAddressSpace() == 0) {
      // Insert a new store to null instruction before the load to indicate
      // that this code is not reachable. We do this instead of inserting
      // an unreachable instruction directly because we cannot modify the
      // CFG.
      new StoreInst(UndefValue::get(LI.getType()),
                    Constant::getNullValue(Op->getType()), &LI);
      return ReplaceInstUsesWith(LI, UndefValue::get(LI.getType()));
    }
  }
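
The TODO in that excerpt mentions a target hook for deciding which address spaces the transform is valid in. A standalone sketch of what such a hook could look like follows. `NullIsInvalidHook`, `defaultNullIsInvalid`, and `canFoldLoadOfNull` are hypothetical names made up for illustration and are not existing LLVM interfaces; the default simply mirrors the hard-coded `== 0` check.

```cpp
#include <cassert>
#include <functional>

// Hypothetical target hook: returns true if null is never a valid address
// in the given address space. This is NOT an existing LLVM interface.
using NullIsInvalidHook = std::function<bool(unsigned addrSpace)>;

// The rule the excerpt hard-codes: only address space 0 is known to
// treat a load from null as undefined.
inline bool defaultNullIsInvalid(unsigned addrSpace) { return addrSpace == 0; }

// Decide whether a load whose base pointer is null may be treated as
// unreachable. A target could pass its own hook to widen or narrow the rule.
inline bool canFoldLoadOfNull(bool baseIsNull, unsigned addrSpace,
                              const NullIsInvalidHook &hook = defaultNullIsInvalid) {
    return baseIsNull && hook(addrSpace);
}
```

With the default hook, `canFoldLoadOfNull(true, 1)` is false, which matches the point that a load from null in addrspace(1) may be well-defined on some targets.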

I think so. My understanding is that address spaces can model things
like segmented memory or address regions that need special
instructions.

In the example that you gave, the difference is between

  load i32 addrspace(0)* null
and
  load i32 addrspace(1)* null

(that the first one is UB, while the second one may be well-defined).

But my question is different: are the following two loads different?

  load i32 addrspace(0)* addrspace(0)* %p0
  load i32 addrspace(1)* addrspace(0)* %p1

if %p0 == %p1 (because they're bitcasts of each other or something --
note that you don't need an addrspacecast to go from "i32
addrspace(1)* addrspace(0)*" to "i32 addrspace(0)* addrspace(0)*").

They are different in the type system because one of them produces an
"i32 addrspace(0)*" while the other produces an "i32 addrspace(1)*",
so in general they cannot be substituted by each other, but there are
edge cases as in the example I started this thread with.

In other words: the semantics of a load or store depend on the address
space of the pointer operand. But when the *value* we're storing
is also a pointer, do the semantics depend on the value's address
space too?

I suspect the answer is no, but I may be missing something here and
wish to confirm.

-- Sanjoy

To give a concrete example, on our architecture we're using one address space for 64-bit pointers that are relative to a global capability register[1] and another address space for 256-bit capabilities. You can bitcast a pointer to a pointer into a pointer to a capability (or vice versa), and the value that you load may or may not be meaningful (it probably won't be). The instructions emitted will be different, because one is loading a 64-bit integer into an integer value, the other is loading a 256-bit capability into a capability register (and will trap if the value isn't 256-bit aligned).

In your original examples, on our architecture, one would copy 64 bits, the other would copy 256 bits (and preserve the tag bit). Replacing one with the other would be a very bad idea.

We've had to fix a few things in mid-level optimisers to deal with this, but not many. Mostly we've had to make changes to SCEV and the vectoriser (and lots to SelectionDAG) to understand that pointers are not always integers, and a few to CodeGen to understand that you can't replace the LLVM memcpy intrinsic that's copying from one address space to another with a call to the memcpy library routine (actually, we fudge this with a custom lowering in the IR).

David

[1] You can think of memory capabilities as being something like segments and something like fat pointers, depending on how you use them. If you really want to know more: http://chericpu.org
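
A toy C++ model of the 64-bit versus 256-bit copy described in this message, for illustration only. `ToyMemory`, its 128-byte size, the one-tag-per-32-bytes layout, and the rule that a 64-bit store clears the destination tag are all assumptions made for this sketch, not a description of the actual architecture. The "trap" on a misaligned capability access is modelled as a C++ exception.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Toy memory: 128 bytes plus one validity tag per 32-byte (256-bit) granule.
// The layout and the tag-clearing rule are assumptions for illustration.
struct ToyMemory {
    static constexpr std::size_t kCapBytes = 32;
    std::array<std::uint8_t, 128> bytes{};
    std::array<bool, 128 / kCapBytes> tags{};

    // Copy as a 64-bit pointer: moves 8 bytes; the destination granule
    // loses its tag because it no longer holds a whole capability.
    void copyAsPtr64(std::size_t src, std::size_t dst) {
        std::memmove(&bytes[dst], &bytes[src], 8);
        tags[dst / kCapBytes] = false;
    }

    // Copy as a 256-bit capability: moves 32 bytes and carries the tag along.
    // A misaligned access "traps", modelled here as an exception.
    void copyAsCap256(std::size_t src, std::size_t dst) {
        if (src % kCapBytes != 0 || dst % kCapBytes != 0)
            throw std::runtime_error("capability access must be 256-bit aligned");
        std::memmove(&bytes[dst], &bytes[src], kCapBytes);
        tags[dst / kCapBytes] = tags[src / kCapBytes];
    }
};
```

Under these assumptions, copying a tagged capability with `copyAsPtr64` transfers only its first 8 bytes and leaves the destination untagged, while `copyAsCap256` transfers all 32 bytes and the tag, so the two copies are not interchangeable.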

Hi David,

It seems that your use-case is similar to mine (though
http://chericpu.org does not resolve for me :( ).

In your original examples, on our architecture, one would copy 64
bits, the other would copy 256 bits (and preserve the tag bit).
Replacing one with the other would be a very bad idea.

Mostly we've had to make changes to SCEV and the
vectoriser (and lots to SelectionDAG) to understand that pointers are
not always integers

I take it this means that casting a capability to and from an i256
(via ptrtoint and inttoptr) is not semantically a no-op?

In our use case pointers in addrspace(0) and addrspace(1) are both 64
bits wide; but like in your case, storing (loading) an addrspace(0)
pointer is semantically a different operation than storing (loading) an
addrspace(1) pointer. I wonder if withholding pointer sizes from
target-independent optimizations will give us the strong distinction
we need, because LLVM has to assume that pointers of different address
spaces are of different sizes in general.

-- Sanjoy