Pointer-typed globals in larger-than-pointer integer containers fail

I’ve run into what may or may not be a bug in code generation for the wasm32 target.

I have a Java-to-native-image project with an in-development backend that targets wasm. In the JDK, several classes use the Java type long to hold pointer values. In Java, a long is defined as a signed 64-bit two’s-complement integer, so we map it to i64 as one might expect. Since part of the image is the initial heap, this means that some pointers to locations in the initial image are encoded into 64-bit integers.

However, the wasm32 backend seems to fail on this. Consider the following source code:

; this is the thing we're taking the address of
@foo = global i32 0

; works:
@bar = global i32 ptrtoint (ptr @foo to i32)
; fails:
; @baz = global i64 zext (i32 ptrtoint (ptr @foo to i32) to i64)

; works:
@zap = global i64 zext (i32 123 to i64)

If you uncomment @baz, compilation fails, even though the @bar and @zap expressions seem to work fine. The error looks like this:

LLVM ERROR: Unsupported expression in static initializer: zext (i32 ptrtoint (ptr @foo to i32) to i64)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /opt/compiler-explorer/clang-trunk/bin/llc -o /app/output.s -x86-asm-syntax=intel --mtriple=wasm32-wasi-unknown <source>
 #0 0x0000561e544b714f llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/compiler-explorer/clang-trunk/bin/llc+0x335314f)
 #1 0x0000561e544b4bc4 SignalHandler(int) Signals.cpp:0:0
 #2 0x00007f6e1fc81420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
 #3 0x00007f6e1f74e00b raise (/lib/x86_64-linux-gnu/libc.so.6+0x4300b)
 #4 0x00007f6e1f72d859 abort (/lib/x86_64-linux-gnu/libc.so.6+0x22859)
 #5 0x0000561e51a85272 llvm::UniqueStringSaver::save(llvm::StringRef) (.cold) StringSaver.cpp:0:0
 #6 0x0000561e534dd465 llvm::AsmPrinter::lowerConstant(llvm::Constant const*) (/opt/compiler-explorer/clang-trunk/bin/llc+0x2379465)
 #7 0x0000561e534e2587 emitGlobalConstantImpl(llvm::DataLayout const&, llvm::Constant const*, llvm::AsmPrinter&, llvm::Constant const*, unsigned long, llvm::DenseMap<unsigned long, llvm::SmallVector<llvm::GlobalAlias const*, 1u>, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::SmallVector<llvm::GlobalAlias const*, 1u>>>*) AsmPrinter.cpp:0:0
 #8 0x0000561e534e2d86 llvm::AsmPrinter::emitGlobalConstant(llvm::DataLayout const&, llvm::Constant const*, llvm::DenseMap<unsigned long, llvm::SmallVector<llvm::GlobalAlias const*, 1u>, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::SmallVector<llvm::GlobalAlias const*, 1u>>>*) (.constprop.0) AsmPrinter.cpp:0:0
 #9 0x0000561e534e3a0b llvm::AsmPrinter::emitGlobalVariable(llvm::GlobalVariable const*) (/opt/compiler-explorer/clang-trunk/bin/llc+0x237fa0b)
#10 0x0000561e52bafb8b llvm::WebAssemblyAsmPrinter::emitGlobalVariable(llvm::GlobalVariable const*) (/opt/compiler-explorer/clang-trunk/bin/llc+0x1a4bb8b)
#11 0x0000561e534dfdf9 llvm::AsmPrinter::doFinalization(llvm::Module&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x237bdf9)
#12 0x0000561e53c236c5 llvm::FPPassManager::doFinalization(llvm::Module&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x2abf6c5)
#13 0x0000561e53c2feb8 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x2acbeb8)
#14 0x0000561e51b562a3 compileModule(char**, llvm::LLVMContext&) llc.cpp:0:0
#15 0x0000561e51a91a2a main (/opt/compiler-explorer/clang-trunk/bin/llc+0x92da2a)
#16 0x00007f6e1f72f083 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24083)
#17 0x0000561e51b4defe _start (/opt/compiler-explorer/clang-trunk/bin/llc+0x9e9efe)
Compiler returned: 139

This error is somewhat more dramatic than a normal compilation error, which leads me to believe that the restriction might not be intentional. So my question is: is this an intentional failure imposed by some kind of spec restriction, or is it a bug?

I’ve made a godbolt.org example which can be tinkered with for more information.

Arbitrary expressions involving ptrtoint are impossible to lower. This particular case is probably feasible, but not really a priority to implement.

You should be able to emit something like @split = global {i32, i32} { i32 ptrtoint (ptr @foo to i32), i32 0}
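
A minimal sketch of that suggestion, assuming the module is compiled for wasm32 as in the original example (little-endian, so the low half comes first). The explicit `align 8` and the `@read_split` function are assumptions added for illustration, to show the pair being read back as a single i64:

```llvm
; The thing we're taking the address of.
@foo = global i32 0

; Low 32 bits: address of @foo. High 32 bits: zero.
; On a big-endian target the two fields would be swapped.
@split = global { i32, i32 } { i32 ptrtoint (ptr @foo to i32), i32 0 }, align 8

; Hypothetical reader: loads the pair as one 64-bit integer,
; which gives the zero-extended address of @foo.
define i64 @read_split() {
  %v = load i64, ptr @split, align 8
  ret i64 %v
}
```

Only the initializer changes shape here; code that loads the field can still treat it as an i64.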

I would have to rewrite my compiler’s heap serializer so that it changes the serialized type of each object. Any 64-bit integer field that happens to contain a pointer in that particular instance would have to become a properly 64-bit-aligned pair of 32-bit fields, one holding zero and the other holding the pointer type and value, in an order that depends on the endianness of the target platform. (But only when the target is 32-bit…)
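
To make that cost concrete, here is a rough sketch of the per-instance type change. The type names `%Obj` and `%ObjWithPtr` and the field layout are hypothetical, and the sketch assumes the wasm32 data layout (little-endian, i64 aligned to 8 bytes):

```llvm
@foo = global i32 0

; Natural layout: i32 header, 4 bytes of implicit padding, i64 field at offset 8.
%Obj = type { i32, i64 }
@plain = global %Obj { i32 1, i64 123 }

; Same 16-byte layout for an instance whose 64-bit field holds a pointer:
; the padding becomes explicit and the i64 becomes a low/high pair of i32.
%ObjWithPtr = type { i32, [4 x i8], i32, i32 }
@withptr = global %ObjWithPtr {
  i32 1,
  [4 x i8] zeroinitializer,
  i32 ptrtoint (ptr @foo to i32),  ; low half: the pointer
  i32 0                            ; high half: zero
}, align 8
```

Two instances of the same Java class end up with different LLVM types, which is what makes the serializer change invasive.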

I’m not looking for arbitrary expression support; I’m just looking for the ability to store a pointer into a larger-than-pointer integer type, which would make life much easier for me. Is this in any way feasible or am I just stuck rewriting our heap serializer?

C doesn’t let you write something that would produce that, and no backend I know of supports it.

I’ve opened issue #60453 to request this as an enhancement to 32-bit backends (because despite C’s inability to do this, it is really pretty important for Java IMO).

@JohnReagan don’t you have mixed 32-bit and 64-bit pointers for OpenVMS? Does this problem sound at all familiar?

Architecturally, I think this is the right thing to do. What global initializers really are is a “bag of bytes with relocations”, and always representing them as [N x i8] as the base type with interleaved relocatable ptr expressions will save you a lot of headaches. This is what rustc does.
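
A sketch of what such an initializer might look like, assuming wasm32 (4-byte pointers, little-endian) and a hypothetical 16-byte object. The packed struct keeps LLVM from inserting padding of its own, so the serializer controls every byte:

```llvm
@foo = global i32 0

; 16-byte object image:
;   bytes 0..7   raw bytes (an i32 header of 1 followed by 4 padding bytes)
;   bytes 8..11  relocatable pointer to @foo (low half of the 64-bit field)
;   bytes 12..15 zero bytes (high half of the 64-bit field)
@obj = global <{ [8 x i8], ptr, [4 x i8] }> <{
  [8 x i8] c"\01\00\00\00\00\00\00\00",
  ptr @foo,
  [4 x i8] zeroinitializer
}>, align 8
```

With this shape no ptrtoint or zext constant expression is needed; the pointer is simply a relocation at a known offset in the byte image.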