quad precision floating point emulation

Hi,

I have been developing a Modula-3 IR to LLVM IR translator, which is going pretty well. During testing
of one of the extended float types I am getting wrong results. Consider this code snippet:

entry:
%x = alloca fp128, align 16
%y = alloca fp128, align 16
%z = alloca fp128, align 16
%l = alloca double, align 16
store fp128 0xL00000000000000004001800000000000, fp128* %y, align 16
store fp128 0xL00000000000000004000000000000000, fp128* %x, align 16
store fp128 0xL00000000000000004001000000000000, fp128* %z, align 16
%v.29 = load fp128* %z, align 16
%v.27 = load fp128* %x, align 16
%fadd = fadd fp128 %v.29, %v.27
store fp128 %fadd, fp128* %y, align 16
%v.28 = load fp128* %y, align 16
%ftrunc = fptrunc fp128 %v.28 to double
store double %ftrunc, double* %l, align 8

which is y = x + z, where x is 2.0 and z is 4.0.
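As a cross-check on those values, here is a minimal Python sketch that decodes the `0xL` constants from the IR. It assumes the 32 hex digits hold the low 64 bits first and the high 64 bits second (which is how the stores in the assembly listing lay them out) and that the value is a finite IEEE binary128 number. The helper name `decode_fp128` is made up for illustration.

```python
from fractions import Fraction


def decode_fp128(literal):
    """Decode an LLVM-style 0xL fp128 literal into an exact Fraction.

    Assumption: the first 16 hex digits are the low 64 bits and the
    last 16 are the high 64 bits. Infinities and NaNs are rejected.
    """
    if not literal.startswith("0xL") or len(literal) != 35:
        raise ValueError("expected 0xL followed by 32 hex digits")
    digits = literal[3:]
    lo = int(digits[:16], 16)
    hi = int(digits[16:], 16)
    bits = (hi << 64) | lo

    sign = bits >> 127                   # 1 sign bit
    exponent = (bits >> 112) & 0x7FFF    # 15 exponent bits, bias 16383
    fraction = bits & ((1 << 112) - 1)   # 112 stored significand bits

    if exponent == 0x7FFF:
        raise ValueError("infinity or NaN is not handled in this sketch")
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1.
        significand = Fraction(fraction, 1 << 112)
        unbiased = -16382
    else:
        significand = 1 + Fraction(fraction, 1 << 112)
        unbiased = exponent - 16383

    value = significand * Fraction(2) ** unbiased
    return -value if sign else value


x = decode_fp128("0xL00000000000000004000000000000000")
z = decode_fp128("0xL00000000000000004001000000000000")
y_initial = decode_fp128("0xL00000000000000004001800000000000")
print(float(x), float(z), float(x + z), float(y_initial))
```

With these inputs the fadd should produce 6.0, which is the same bit pattern as the constant initially stored to %y.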

On x86-64, compiling with llc, the code generated is:

subq $72, %rsp
.Ltmp22:
.cfi_def_cfa_offset 80
movabsq $4612108230892453888, %rax # imm = 0x4001800000000000
movq %rax, 40(%rsp)
movq $0, 32(%rsp)
movabsq $4611686018427387904, %rax # imm = 0x4000000000000000
movq %rax, 56(%rsp)
movq $0, 48(%rsp)
movabsq $4611967493404098560, %rcx # imm = 0x4001000000000000
movq %rcx, 24(%rsp)
movq $0, 16(%rsp)
movq 48(%rsp), %rdi
movq 56(%rsp), %rsi
xorl %edx, %edx
callq __addtf3
movq %rdx, 40(%rsp)
movq %rax, 32(%rsp)
movq %rax, %rdi
movq %rdx, %rsi
callq __trunctfdf2
movsd %xmm0, 8(%rsp)
movl $M_Const+232, %edi

Examining the stack after the return from __addtf3, y seems to be 0xfffffffffff (roughly).

This is linking in the GCC library versions of __addtf3 and __trunctfdf2, which
do the software emulation of the 128-bit add and the truncation to double.
I’m no expert in assembly, but it looks like __addtf3 is not being passed enough
data. A similar piece of C code using __float128 types seems to pass its parameters in
a couple of xmm registers.

Should I be linking in the compiler-rt lib? I would have thought these
low-level float functions would be compatible with the GCC libraries. Maybe there is
a flag for llc that I should be using.

I’m using llvm 3.5.

Regards Peter

> Examining the stack after the return from __addtf3, y seems to be
> 0xfffffffffff (roughly)

We seem to be using entirely the wrong calling convention (though we
do pass everything somewhere: %rdi, %rsi, %rcx and %rdx contain the
right info). According to the AMD64 SysV ABI, __float128 types should
be passed and returned in %xmm regs.
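To make the mismatch concrete, here is a toy Python model of the two ways the arguments could be assigned to registers. It is only a sketch, not the ABI's classification algorithm: it assumes the generated code treats each fp128 as a pair of 64-bit integers (low half first), which is what the listing suggests, while the libgcc routine expects one whole value per xmm register. The function names are invented for illustration.

```python
# SysV AMD64 integer argument registers, in order.
INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def split_fp128(bits):
    """Split a 128-bit pattern into (low 64 bits, high 64 bits)."""
    mask = (1 << 64) - 1
    return bits & mask, bits >> 64


def pass_as_integer_pairs(args):
    """Model of what the listing appears to do: two integer registers per argument."""
    regs = {}
    free = iter(INT_ARG_REGS)
    for bits in args:
        lo, hi = split_fp128(bits)
        regs[next(free)] = lo
        regs[next(free)] = hi
    return regs


def pass_in_sse(args):
    """Model of what the callee expects: one xmm register per argument."""
    return {"xmm%d" % i: bits for i, bits in enumerate(args)}


# Bit patterns of x (2.0) and z (4.0) from the IR; the low halves are zero.
X = 0x4000000000000000 << 64
Z = 0x4001000000000000 << 64

for reg, value in pass_as_integer_pairs([X, Z]).items():
    print("%-4s = 0x%016x" % (reg, value))
for reg, value in pass_in_sse([X, Z]).items():
    print("%-4s = 0x%032x" % (reg, value))
```

In the integer-pair model, x lands in %rdi/%rsi and z in %rdx/%rcx, which lines up with the listing (the `xorl %edx, %edx` is the zero low half of z). The callee reads %xmm0 and %xmm1 instead, so it never sees those values.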

I had thought Clang supported __float128. But apparently it doesn't,
which would explain why this is broken -- probably no-one has tested
it on amd64.

> Should I be linking in the compiler-rt lib?

That might "fix" it, as the two bugs cancel each other out. But your
code wouldn't be interoperable with anything else using __float128.

Unfortunately the real fix is in the x86 backend, and not exactly
trivial (I had to do a similar implementation in the AArch64 backend,
and types that are only really legal for parameter passing are a pain
to implement).

Cheers.

Tim.