Hi
For an .ll file like:
define void @Func()
{
%var1 = alloca double
store double 0x40bb580000000000, double* %var1
ret void
}
ppc32 output is:
...
lis 3, 16571
ori 3, 3, 22528
li 4, 0
stw 3, 8(1)
stw 4, 12(1)
...
I'm using the PPC backend's output as the "bytecode" for an interpreter
that I would like to be able to run on both little- and big-endian
platforms. The split stw's mean that the two i32 halves of the f64 end
up swapped in memory on little-endian (thus foiling native-code interop).
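To make the swap concrete, here is a minimal Python sketch (not from the original post) that simulates the two stw instructions on an interpreter that writes each 32-bit word in the host's byte order. The helper names `stw`, `read_f64`, and `run_split_store` are hypothetical, the stack offsets are rebased to 0, and the byte order is passed explicitly so the result does not depend on the machine running the sketch.

```python
import struct

HI = (16571 << 16) | 22528   # lis 3, 16571 ; ori 3, 3, 22528  -> 0x40BB5800
LO = 0                       # li 4, 0

def stw(mem, addr, word, order):
    """Store a 32-bit word at addr in the given byte order ('big' or 'little')."""
    mem[addr:addr + 4] = word.to_bytes(4, order)

def read_f64(mem, addr, order):
    """Read 8 bytes at addr as a double, as native code with that byte order would."""
    fmt = ">d" if order == "big" else "<d"
    return struct.unpack(fmt, bytes(mem[addr:addr + 8]))[0]

def run_split_store(order):
    """Replay the split store, then read the slot back as a native double."""
    mem = bytearray(16)
    stw(mem, 0, HI, order)   # stw 3, 8(1): high word at the lower address
    stw(mem, 4, LO, order)   # stw 4, 12(1): low word at the higher address
    return read_f64(mem, 0, order)

big = run_split_store("big")        # halves are in the order a BE double expects
little = run_split_store("little")  # halves are in the wrong order for an LE double
```

The constant 0x40bb580000000000 encodes 7000.0. With big-endian words the read-back matches; with little-endian words the high half lands where a native little-endian double keeps its low half, so native code sees a different value.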
Can anyone suggest where I should look to lower this store differently?
(ideally by libcall-ing a function that takes 2 i32s).
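As a rough illustration of what such a libcall could do on the interpreter side, the sketch below reassembles the double from its two halves and writes all 8 bytes in one go, in the host's byte order. The function name `store_f64_from_words` and its signature are assumptions made for this example; nothing like it exists in LLVM or the PPC backend.

```python
import struct
import sys

def store_f64_from_words(mem, addr, hi, lo, order=sys.byteorder):
    """Hypothetical libcall: store the double whose high/low 32-bit halves are hi/lo.

    The 64-bit pattern is rebuilt first and then written as one 8-byte value,
    so the halves always end up where the chosen byte order expects them.
    """
    bits = ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)
    mem[addr:addr + 8] = bits.to_bytes(8, order)

# The byte order is passed explicitly here so both layouts can be inspected.
mem_le = bytearray(8)
store_f64_from_words(mem_le, 0, 0x40BB5800, 0, "little")

mem_be = bytearray(8)
store_f64_from_words(mem_be, 0, 0x40BB5800, 0, "big")
```

Either buffer, read back as a double in its own byte order, gives the same value, which is the property the split stw's lose.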
thanks,
scott
It's fundamentally impossible to correctly interpret using the wrong
endianness, at least for general C code. Have you considered making
your interpreter map memory backwards on opposite-endian platforms?
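One possible reading of "map memory backwards", sketched in Python: keep guest address a at host index size - 1 - a, so that every big-endian guest access of any width becomes a plain little-endian host access. The class `ReversedMemory` and its methods are hypothetical names for this illustration, not anything from an existing interpreter.

```python
import struct

class ReversedMemory:
    """Guest (big-endian) address a is kept at host index size - 1 - a."""

    def __init__(self, size):
        self.size = size
        self.host = bytearray(size)

    def _span(self, addr, n):
        # An n-byte guest access at addr covers this host slice.
        if addr < 0 or addr + n > self.size:
            raise IndexError("guest access out of range")
        start = self.size - addr - n
        return start, start + n

    def store(self, addr, value, n):
        """Big-endian guest store of n bytes, done as a little-endian host store."""
        s, e = self._span(addr, n)
        self.host[s:e] = value.to_bytes(n, "little")

    def load(self, addr, n):
        """Big-endian guest load of n bytes, done as a little-endian host load."""
        s, e = self._span(addr, n)
        return int.from_bytes(self.host[s:e], "little")

    def host_f64(self, addr):
        """What little-endian native code would read from the 8-byte slot at addr."""
        s, e = self._span(addr, 8)
        return struct.unpack("<d", bytes(self.host[s:e]))[0]

mem = ReversedMemory(32)
mem.store(8, 0x40BB5800, 4)   # stw 3, 8(1): high word
mem.store(12, 0, 4)           # stw 4, 12(1): low word
value = mem.host_f64(8)       # the split store now reads back as one double
```

Under this mapping the split store yields a well-formed little-endian double. The catch is that guest pointers have to be translated before native code sees them, and consecutive elements (arrays, struct fields) still sit in reverse order in host memory.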
-Eli
Hi
The front-end is not C code, and doesn't do bit-level operations (if
that's what you mean). My first attempt was indeed to just leave
memory in big endian and swap during loads/stores. But, I'd like to
support an FFI to native code (which of course doesn't have any
knowledge of having to swap memory), so that isn't too practical.
thanks,
scott
> The front-end is not C code, and doesn't do bit-level operations (if
> that's what you mean).
Sort of, yeah... constructs like union {double a; unsigned b[2];} in C
depend on the endianness.
> But, I'd like to
> support an FFI to native code (which of course doesn't have any
> knowledge of having to swap memory), so that isn't too practical.
You can't do FFI with incompatible ABIs in the general case without
copying the data into structures appropriate for the target. Structs
are laid out differently on different architectures, and sometimes the
incompatibility is non-obvious. But I'll assume you're doing some
simple case, like the arguments being only scalars and pointers.
Back to your issue, the simplest workaround for the particular issue of
splitting stores is to mark all your stores volatile; that will force
the store to be done in a single instruction (the price is that it
will reduce the effectiveness of some optimizations, but it shouldn't
have a huge impact if you restrict it to the arrays you're going to
pass via FFI). If you prefer to hack your version of LLVM, you could
alternatively disable the relevant optimization... it's probably in
DAGCombine or PPCISelLowering, although I haven't checked.
-Eli
Hi
Thanks for your help. After putzing around in DAGCombine and the PPC
lowering code for a while, I decided it wasn't going to work too well.
For anyone interested, I ended up creating an additional PPC CPU
target that's little-endian. This doesn't seem to require many
changes, just little tweaks in the PPC code which (reasonably) assumes
its memory is big-endian (fctiwz, for example).
Of course, compilation now takes twice as long because I need to
compile twice (yes, ignoring VAX-endian!), and the size of the
"bytecode" doubles because I need to include 2x the code/data, as I
don't know which target the bytecode will be run on. Not optimal, but
workable in my situation for now.
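The load-time side of shipping both images might look like the sketch below. The bundle layout (a dict keyed by byte order), the placeholder payloads, and the function name `select_image` are all assumptions for illustration; the post does not describe the actual container format.

```python
import sys

def select_image(images, host_order=sys.byteorder):
    """Pick the bytecode image compiled for the host's byte order.

    `images` is a hypothetical bundle mapping 'big' / 'little' to the code
    and data produced by the corresponding PPC target.
    """
    try:
        return images[host_order]
    except KeyError:
        raise ValueError(f"no bytecode image for {host_order!r}-endian hosts") from None

# Placeholder payloads standing in for the two compiled images.
bundle = {
    "big": b"<big-endian PPC code/data>",
    "little": b"<little-endian PPC code/data>",
}
chosen = select_image(bundle)
```

Only one image is ever executed on a given host; the other is dead weight in the bundle, which is the doubling in size described above.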
thanks,
scott