Strange behavior in FP arithmetic on x86 (possibly a bug)

Hello everyone.

I'm not an expert in LLVM, x86, or the IEEE standard for floating-point numbers, so any of the following assumptions may be wrong. If so, I would be grateful if you could explain what is going wrong. But if my guesses are correct, we may have a bug in FP arithmetic on x86.

I have the following IR:

@g = constant i64 1

define i32 @main() {
  %gval = load i64* @g
  %gvalfp = bitcast i64 %gval to double
  %fmul = fmul double %gvalfp, -5.000000e-01
  %fcmp = fcmp ueq double %fmul, -0.000000e+00
  %ret = select i1 %fcmp, i32 1, i32 0
  ret i32 %ret
}

I expected the smallest positive denormalized double times -0.5 to be equal to -0.0, so the correct exit code would be 1.
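
That expectation can be checked outside LLVM. The following is a minimal Python sketch, assuming the interpreter's float is an IEEE 754 binary64 value multiplied with round-to-nearest-even (which holds for CPython on common platforms). The variable names mirror the IR, but the script is only an illustration.

```python
import math
import struct

# Reinterpret the 64-bit integer 1 as a double, like the `bitcast` in the IR.
g_bits = 1
gvalfp = struct.unpack("<d", struct.pack("<q", g_bits))[0]

# Multiply in double precision, like the `fmul`.
fmul = gvalfp * -0.5

# `fcmp ueq` is true when the operands compare equal (or are unordered).
exit_code = 1 if fmul == -0.0 else 0

# The sign bit shows that the rounded result is negative zero.
is_negative_zero = fmul == 0.0 and math.copysign(1.0, fmul) == -1.0
print(gvalfp, fmul, exit_code, is_negative_zero)
```

The exact product, -2^-1075, lies halfway between -0.0 and the smallest negative denormal, so ties-to-even rounds it to -0.0 and the comparison succeeds.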

llvm-3.4.2, targeting x86 Linux, produced the following assembly:

    .file "fpfail.ll"
    .section .rodata.cst8,"aM",@progbits,8
    .align 8
.LCPI0_0:
    .quad -4620693217682128896 # double -0.5
.LCPI0_1:
    .quad -9223372036854775808 # double -0
    .text
    .globl main
    .align 16, 0x90
    .type main,@function
main: # @main
    .cfi_startproc
# BB#0:
    vmovsd g, %xmm0
    vmulsd .LCPI0_0, %xmm0, %xmm0
    vucomisd .LCPI0_1, %xmm0
    sete %al
    movzbl %al, %eax
    ret
.Ltmp0:
    .size main, .Ltmp0-main
    .cfi_endproc
    .type g,@object # @g
    .section .rodata,"a",@progbits
    .globl g
    .align 8
g:
    .quad 1 # 0x1
    .size g, 8
    .section ".note.GNU-stack","",@progbits

./llc -march=x86 fpfail.ll; g++ fpfail.s; ./a.out; echo $?

returns 1 as expected.
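
The `.quad` constants in that listing are signed 64-bit integers holding the bit patterns of doubles. A minimal Python sketch of how they decode, assuming IEEE 754 binary64; `quad_to_double` is a hypothetical helper name, not anything from LLVM:

```python
import math
import struct

def quad_to_double(value):
    """Reinterpret a signed 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<q", value))[0]

minus_half = quad_to_double(-4620693217682128896)  # .LCPI0_0
minus_zero = quad_to_double(-9223372036854775808)  # .LCPI0_1
g = quad_to_double(1)                              # g

# Negative zero compares equal to 0.0, so inspect its sign separately.
minus_zero_is_negative = math.copysign(1.0, minus_zero) == -1.0
print(minus_half, minus_zero, g)
```

This matches the comments llc emitted next to each constant, and shows that `g` is the smallest positive denormal.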

But llvm-3.5 (on the same target) lowers the same IR using x87 floating-point instructions, as follows:

    .text
    .file "fpfail.ll"
    .section .rodata.cst4,"aM",@progbits,4
    .align 4
.LCPI0_0:
    .long 3204448256 # float -0.5
    .text
    .globl main
    .align 16, 0x90
    .type main,@function
main: # @main
    .cfi_startproc
# BB#0:
    fldl g
    fmuls .LCPI0_0
    fldz
    fchs
    fxch %st(1)
    fucompp
    fnstsw %ax
    # kill: AX AX EAX
    # kill: AH AH EAX
    sahf
    sete %al
    movzbl %al, %eax
    retl
.Ltmp0:
    .size main, .Ltmp0-main
    .cfi_endproc
    .type g,@object # @g
    .section .rodata,"a",@progbits
    .globl g
    .align 8
g:
    .quad 1 # 0x1
    .size g, 8
    .section ".note.GNU-stack","",@progbits

First, it doesn’t assemble with g++ (4.8):

fpfail.s:26: Error: invalid instruction suffix for `ret'

I downloaded the Intel manual and haven't found any mention of a retl instruction, so I manually replaced it with ret and reassembled:

g++ fpfail.s; ./a.out; echo $?

The exit code is 0. This is correct for Intel 80-bit floats but wrong for doubles. Am I doing something wrong, is this actually a bug, or, even worse, is it correct behavior?
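
The two exit codes differ only in whether the product is rounded to a double before the comparison. Here is a minimal Python sketch using exact rational arithmetic. It assumes the x87 register holds a double-extended value with a minimum normal exponent of -16382, and that the product is compared in the register without first being stored as a double, as the fmuls/fucompp sequence above suggests:

```python
from fractions import Fraction

# Exact value of g (bit pattern 1) and of the product g * -0.5.
g = Fraction(1, 2**1074)
exact_product = g * Fraction(-1, 2)  # exactly -2**-1075

# Assumed x87 double-extended limit: smallest normal magnitude is 2**-16382.
X87_MIN_NORMAL = Fraction(1, 2**16382)
fits_x87 = abs(exact_product) >= X87_MIN_NORMAL

# As a double, a magnitude at or below half the smallest denormal
# rounds to zero under round-to-nearest-even.
DOUBLE_MIN_DENORMAL = Fraction(1, 2**1074)
rounds_to_zero_as_double = abs(exact_product) <= DOUBLE_MIN_DENORMAL / 2

# Kept at extended precision the product stays nonzero, so it is not equal to -0.0.
exit_code_extended = 1 if exact_product == 0 else 0
# Rounded to double first, the product becomes -0.0 and the comparison holds.
exit_code_double = 1 if rounds_to_zero_as_double else 0
print(fits_x87, exit_code_extended, exit_code_double)
```

Under those assumptions the extended-precision path gives 0 and the double-precision path gives 1, which matches the two observed exit codes.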

Hi Dmitry,

> fpfail.s:26: Error: invalid instruction suffix for `ret'
>
> I downloaded the Intel manual and haven't found any mention of a retl instruction,

"retl" is the AT&T syntax for the normal "ret" instruction in the
Intel manual, which makes it mostly undocumented.

> The exit code is 0. This is correct for Intel 80-bit floats but wrong for
> doubles. Am I doing something wrong, is this actually a bug, or, even worse,
> is it correct behavior?

I think the default CPU used by llc was changed between 3.4 and 3.5.
Before, we defaulted to the host's CPU (from memory), but now we pick
a lowest common denominator "generic", which doesn't support SSE.

When the IR comes from Clang, I believe we define the
"FLT_EVAL_METHOD" macro to be 2 in this case (see C99 5.2.4.2.2),
which signals that operations are performed at "long double" precision
and the outcome you see is permitted.

So I *think* this is OK, unless I'm misunderstanding one of the specs involved.

Cheers.

Tim.

Are you targeting the same backend? i386 (32bit mode) uses FPU registers
for argument passing and return values, x86_64 / amd64 (64bit mode) uses
SSE registers for float/double values and FPU registers for long double.
The error on retl makes me think the second example is compiled for
i386, while the first example looks more like x86_64.

Joerg

Hi, Joerg

Both of the examples were compiled with ./llc -march=x86 -O3 fpfail.ll (i386).
I've double checked it.

Kind regards, Dmitry Borisenkov


Are you sure about that? I don't recall ever seeing retl before. A while back a reference for AT&T was mentioned and, as I recall, this was the best anyone had <http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf>. It contains no mention of retl.

This seems to be the commit that added support for it <http://lists.cs.uiuc.edu/pipermail/llvm-branch-commits/2010-May/003229.html>.

I'm not sure I understand the distinction between retl/retq. x86 has four return instructions (cribbing from the Intel manual):

C3      RET         Near return
CB      RET         Far return
C2 iw   RET imm16   Near return + pop imm16 bytes
CA iw   RET imm16   Far return + pop imm16 bytes

(And I think that's been true since the 8086.)

Distinguishing between near and far (e.g., ret vs. lret in AT&T or retn vs. retf with some other assemblers) makes sense, but what would an l or q suffix denote?

But more to the point, even if there's a good reason to accept retl/retq as input, is there any reason to emit it ever?

r198756 seems to be related too. That would explain why the difference appears in 3.5 relative to 3.4.

Since x86 lets you mix 16-bit and 32-bit code, you must be able to distinguish between a 16-bit and a 32-bit return. That is where the w and l suffixes for the return instruction come from.

code16:
ret = retw => C3
retl => 66 C3

code32:
ret = retl => C3
retw => 66 C3
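
The table above can be sketched as a small Python function. `encode_near_ret` is a hypothetical helper written only to illustrate the rule (a suffix that differs from the mode's default operand size gets a 0x66 operand-size prefix); it is not taken from any assembler:

```python
OPERAND_SIZE_PREFIX = 0x66  # operand-size override prefix
NEAR_RET = 0xC3             # near return opcode

def encode_near_ret(mode_bits, suffix=""):
    """Return the bytes for ret/retw/retl in 16-bit or 32-bit code."""
    default_suffix = {16: "w", 32: "l"}[mode_bits]
    size = suffix or default_suffix  # plain `ret` uses the mode's default size
    if size not in ("w", "l"):
        raise ValueError("only the w and l suffixes are modelled here")
    if size == default_suffix:
        return bytes([NEAR_RET])
    return bytes([OPERAND_SIZE_PREFIX, NEAR_RET])

for mode in (16, 32):
    for suffix in ("", "w", "l"):
        print(mode, "ret" + suffix, encode_near_ret(mode, suffix).hex(" "))
```

The q suffix and 64-bit mode are deliberately left out, since the table only covers code16 and code32.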

As for the q suffix, it is either there for consistency or it just got cargo-culted.

Pasi

Makes total sense. I didn't think about using the operand size override. (I didn't even realize that was legal for ret.)

Thanks,