I'm trying to return an aggregate structure (a complex number, really) containing two 32-bit floats or two 64-bit doubles, by value. I'm using LLVM 2.6, built as 64-bit, on OS X 10.6 (x86-64).
The IR I generate for a simple test case looks like this (float case):
%0 = type { float, float }
define %0 @test600() {
entry:
  ret %0 { float 4.200000e+01, float 3.500000e+01 }
}
Running that through llc, the x86-64 assembly looks like this (abbreviated):
movss LCPI72_0(%rip), %xmm0
movss LCPI72_1(%rip), %xmm1
ret
Now, if I write a C function that does the same thing:
struct complex_float {
    float real;
    float imag;
};
static struct complex_float foo()
{
    struct complex_float x = {42.0, 35.0};
    return x;
}
The assembly code looks like this (compiled with GCC and disassembled by GDB):
0x00000001000010ac <foo+0>: push %rbp
0x00000001000010ad <foo+1>: mov %rsp,%rbp
0x00000001000010b0 <foo+4>: mov $0x42280000,%eax
0x00000001000010b5 <foo+9>: mov %eax,-0x10(%rbp)
0x00000001000010b8 <foo+12>: mov $0x420c0000,%eax
0x00000001000010bd <foo+17>: mov %eax,-0xc(%rbp)
0x00000001000010c0 <foo+20>: mov -0x10(%rbp),%rax
0x00000001000010c4 <foo+24>: movd %rax,%xmm0
0x00000001000010c9 <foo+29>: leaveq
0x00000001000010ca <foo+30>: retq
Here the two values are returned 'packed' together in xmm0, while LLVM returns them separately in xmm0 and xmm1. This causes problems when calling the LLVM-generated function from C (or, in my real use case, via libffi). The 'real' field of the returned struct comes back with the right value, but the 'imag' field is wrong: it always comes back as zero.
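To make the packing concrete, here is a minimal C sketch (not part of the test case above) that copies the two-float struct into one 64-bit integer, which is the bit pattern the GCC code loads into %rax before the movd. The helper name pack_complex_float is made up for illustration, and the layout described assumes a little-endian target such as x86-64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct complex_float {
    float real;
    float imag;
};

/* Hypothetical helper: view the 8-byte struct as a single 64-bit value. */
static uint64_t pack_complex_float(struct complex_float z)
{
    uint64_t bits;
    assert(sizeof z == sizeof bits);  /* two 4-byte floats, no padding */
    memcpy(&bits, &z, sizeof bits);   /* copy the raw bytes, no conversion */
    return bits;
}
```

On a little-endian machine, packing {42.0f, 35.0f} this way puts 0x42280000 (real) in the low 32 bits and 0x420c0000 (imag) in the high 32 bits, matching the two constants in the disassembly.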
What makes this more confusing is that if I switch the IR and C examples to use double instead of float, everything works: the C disassembly appears to use xmm1 for the second value, just like LLVM. I can also run the same experiment with 32- and 64-bit ints as the struct element types, and the behavior mirrors that of float and double: the 32-bit case does not work, while the 64-bit case does.
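A likely explanation, offered as an assumption based on the System V x86-64 calling convention and not verified against LLVM 2.6, is that C compilers classify aggregates in 8-byte chunks ("eightbytes"). Two floats or two 32-bit ints fit in one chunk and travel in a single register, while two doubles or two 64-bit ints span two chunks and use two registers. The sketch below only counts chunks; the helper name eightbytes and the pair_i32/pair_i64 structs are made up for illustration, and this is not the full classification algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct complex_float  { float real;  float imag; };
struct complex_double { double real; double imag; };
struct pair_i32 { int32_t a; int32_t b; };
struct pair_i64 { int64_t a; int64_t b; };

/* Hypothetical helper: how many 8-byte chunks an object of `size` bytes spans. */
static size_t eightbytes(size_t size)
{
    assert(size > 0);        /* empty objects are not meaningful here */
    return (size + 7) / 8;   /* round up to whole 8-byte chunks */
}
```

Under that assumption, the float and 32-bit int structs each count as one chunk and the double and 64-bit int structs as two, which lines up with the cases that fail and the cases that work.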
Am I doing something wrong here, or is this an issue in LLVM? Is there something I can do to work around this issue and get the proper behavior?
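One workaround that may be worth trying (an assumption, not something confirmed for LLVM 2.6) is to have the generated function return a single 64-bit value, such as a double, whose bits carry both floats, so that the result comes back in xmm0 the same way GCC returns the struct. The C sketch below stands in for both sides: packed_test600 is a hypothetical substitute for the LLVM-generated function, and unpack_complex_float is a made-up helper for the caller.

```c
#include <assert.h>
#include <string.h>

struct complex_float {
    float real;
    float imag;
};

/* Hypothetical stand-in for the generated function: both floats
 * travel in the bits of one double. */
static double packed_test600(void)
{
    struct complex_float z = {42.0f, 35.0f};
    double carrier;
    assert(sizeof carrier == sizeof z);    /* both are 8 bytes */
    memcpy(&carrier, &z, sizeof carrier);  /* raw byte copy, no conversion */
    return carrier;
}

/* Hypothetical caller-side helper: recover the struct from the carrier. */
static struct complex_float unpack_complex_float(double carrier)
{
    struct complex_float z;
    memcpy(&z, &carrier, sizeof z);
    return z;
}
```

The double here is only a container for the bits; its numeric value is never used.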
Thanks,
Andrew