I’ve recently encountered an issue where the `instcombine` pass replaces an `llvm.memcpy` between two distinct address spaces with an `addrspacecast` instruction.
As an example, see the trivial OpenCL kernel attached. I’m compiling like this:
clang -cc1 -triple spir64-unknown-unknown -x cl -O0 -emit-llvm array_init.cl -o before.ll
This yields an `llvm.memcpy` to copy the array initialiser data from the global variable (in `addrspace(2)`) to the `alloca` result (in `addrspace(0)`).
I then apply the `instcombine` pass via:
opt -S -instcombine before.ll -o after.ll
This removes the memcpy entirely; the `addrspace(2)` data is now accessed directly through an `addrspacecast` to `addrspace(0)`.
It seems to me that this sort of optimisation is only valid if the two address spaces are guaranteed to alias for the given target triple, which is not the case for SPIR.
This particular optimisation comes from around lines 290-300 of `InstCombineLoadStoreAlloca.cpp`, although I suspect this isn’t the only place where this can happen.
Adding a check to only perform this replacement when the two address spaces are equal fixes the issue for me, but this is probably too conservative, since targets with flat address spaces are likely to benefit from this optimisation. It feels like passes should ask the target whether two address spaces alias before introducing an `addrspacecast`, but I’m not familiar enough with LLVM internals to know whether this information is already available or would be easy to expose.
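To make the idea concrete, here is a rough, self-contained C++ sketch of the kind of query I have in mind. It does not use any real LLVM API: `AddrSpaceAliasInfo`, `addAliasingPair`, `mayAlias` and `canReplaceCopyWithCast` are hypothetical names invented for illustration, and the real hook would presumably live somewhere in the target description.

```cpp
#include <algorithm>
#include <set>
#include <utility>

// Hypothetical model of what a target knows about its address spaces.
// This is NOT an existing LLVM class; it only illustrates the query.
class AddrSpaceAliasInfo {
public:
  // Record that pointers in address spaces A and B may refer to the
  // same memory (e.g. on a target with a flat address space).
  void addAliasingPair(unsigned A, unsigned B) { Pairs.insert(key(A, B)); }

  // An address space always aliases itself; any other pair aliases
  // only if the target has said so explicitly.
  bool mayAlias(unsigned A, unsigned B) const {
    return A == B || Pairs.count(key(A, B)) != 0;
  }

private:
  // Order the pair so that (A, B) and (B, A) map to the same entry.
  static std::pair<unsigned, unsigned> key(unsigned A, unsigned B) {
    return {std::min(A, B), std::max(A, B)};
  }

  std::set<std::pair<unsigned, unsigned>> Pairs;
};

// Hypothetical guard for the transform: only replace a copy from SrcAS
// to DstAS with an addrspacecast of the source pointer when the target
// reports that the two address spaces alias.
bool canReplaceCopyWithCast(unsigned SrcAS, unsigned DstAS,
                            const AddrSpaceAliasInfo &Info) {
  return Info.mayAlias(SrcAS, DstAS);
}
```

With a default-constructed `AddrSpaceAliasInfo` (modelling a segmented target such as SPIR), `canReplaceCopyWithCast(2, 0, Info)` returns `false`, so the memcpy would be kept. A flat target would call `addAliasingPair(0, 2)` and keep the optimisation. The equality-only check I tried corresponds to never registering any pairs.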
Is there something we can do here to avoid this sort of optimisation causing problems for targets with segmented address spaces?
array_init.cl (76 Bytes)
before.ll (1.99 KB)
after.ll (1.61 KB)