The semantics of the fptrunc instruction with an example of incorrect optimisation

I've recently been looking at how to implement in LLVM IR the rounding
of floating point values when casting using different rounding modes
and I've hit some problems.

It seems that when casting floating point values down to less precise
types, the ``fptrunc`` LLVM IR instruction is used. The LLVM language
reference suggests that it just truncates the value (which would be
equivalent to rounding towards zero), but this seems to be very
misleading because on the target I'm using (x86_64) that **is not**
what happens.

Consider the following example in C:

#include <stdio.h>
#include <fenv.h>
int main() {
    double x = 0.3;
    fesetround(FE_TONEAREST);
    float y = (float) x;
    printf("y (nearest):%a\n", y);
    fesetround(FE_UPWARD);
    y = (float) x;
    printf("y (upward):%a\n", y);
    fesetround(FE_DOWNWARD);
    y = (float) x;
    printf("y (downward):%a\n", y);
    return (int) y;
}

If I get the unoptimised LLVM IR for this by running ``clang -O0
float.c -emit-llvm -c -o float.clang.o0.bc`` I can see that the cast
of variable x is being handled using LLVM IR's ``fptrunc``

...
  store double 3.000000e-01, double* %x, align 8
  %call = call i32 @fesetround(i32 0) #3
  %0 = load double, double* %x, align 8
  %conv = fptrunc double %0 to float
...

If I look at the generated assembly I see that the ``cvtsd2ss`` x86
instruction is used (how it rounds is apparently controlled by the
MXCSR register). So this instruction might not "truncate", depending
on how MXCSR is set.
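
As a quick sanity check of that, here's a minimal sketch (assuming
x86_64 with SSE, and a glibc-style libm where ``fesetround`` also
updates MXCSR; the intrinsics come from ``xmmintrin.h``) that reads
the rounding-control field of MXCSR after changing the rounding mode:

#include <stdio.h>
#include <fenv.h>
#include <xmmintrin.h>

int main() {
    /* With glibc on x86_64, fesetround() updates both the x87 control
       word and the RC (rounding control) field of MXCSR (bits 13-14). */
    fesetround(FE_DOWNWARD);
    /* _MM_GET_ROUNDING_MODE() extracts the RC field from MXCSR. */
    printf("MXCSR RC: %#x (expect _MM_ROUND_DOWN = %#x)\n",
           _MM_GET_ROUNDING_MODE(), (unsigned) _MM_ROUND_DOWN);
    return 0;
}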

If I run the program

$ clang -O0 float.c -lm -o float.clang.o0
$ ./float.clang.o0
y (nearest):0x1.333334p-2
y (upward):0x1.333334p-2
y (downward):0x1.333332p-2

I can see that the last cast gives a different result because the
rounding mode has been changed, as expected. (The double 0.3 lies
strictly between the two adjacent floats 0x1.333332p-2 and
0x1.333334p-2; round-to-nearest and round-upward both give the larger
value, while round-downward gives the smaller one.)

Now let's see what clang does when we ask it to optimize (the same program built at -O3):

$ ./float.clang.o3
y (nearest):0x1.333334p-2
y (upward):0x1.333334p-2
y (downward):0x1.333334p-2

The result of the last cast is wrong (note that gcc at -O3 also seems
to do this), and looking at the optimized LLVM IR reveals why:

define i32 @main() #0 {
entry:
  %call = tail call i32 @fesetround(i32 0) #2
  %call2 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([16 x i8], [16 x i8]* @.str, i64 0, i64 0), double 0x3FD3333340000000) #2
  %call3 = tail call i32 @fesetround(i32 2048) #2
  %call6 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([15 x i8], [15 x i8]* @.str.1, i64 0, i64 0), double 0x3FD3333340000000) #2
  %call7 = tail call i32 @fesetround(i32 1024) #2
  %call10 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([17 x i8], [17 x i8]* @.str.2, i64 0, i64 0), double 0x3FD3333340000000) #2
  ret i32 0
}

The cast of a constant has been constant folded incorrectly: I guess
clang assumes a particular rounding mode when folding (round to
nearest, by the look of it, since every call is passed
0x3FD3333340000000, which is 0x1.333334p-2 widened to double), and in
this case that is sometimes the wrong rounding mode.
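
For what it's worth, one workaround sketch (hedged; the standard
doesn't guarantee this without ``FENV_ACCESS``, it just defeats the
fold in practice): make the source value opaque to the optimizer,
e.g. with ``volatile``, so the conversion has to happen at run time
under whatever rounding mode is current.

#include <stdio.h>
#include <fenv.h>

volatile double x = 0.3;   /* volatile load: the cast below can't be folded */

int main() {
    fesetround(FE_DOWNWARD);
    float y = (float) x;   /* performed at run time, even at -O3 */
    printf("y (downward):%a\n", y);
    return 0;
}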

I'm not sure if there's a good way to fix this properly. At first I
thought it would be better if the rounding mode were an operand to
``fptrunc`` (which would make constant folding correct), but then I
realized that for codegen to always be correct, the rounding mode
might have to be reset every time a ``fptrunc`` is about to be
executed, which most of the time would be a very wasteful thing to do.
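
To make the idea concrete, a purely hypothetical sketch (this is
**not** valid LLVM IR; the syntax and mode name are invented for
illustration only):

  ; hypothetical: the rounding mode is part of the instruction, so constant
  ; folding this to 0x1.333332p-2 (the downward result) would be legal
  %conv = fptrunc double %0 to float rounding(downward)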

In general it's not possible (at least in C) to always know statically
what the rounding mode is going to be at any point during the program,
because it's part of the currently executing thread's state.

On the other hand LLVM IR isn't supposed to be tied to C, so I feel
like there ought to be a way to specify how certain floating point
operations do rounding. (I think these rounding issues apply to more
than just ``fptrunc``.)

Any thoughts on this? At the very least the LLVM IR documentation
needs to be more specific about how rounding is done.

Thanks,
Dan.

This sounds like bug 8100, "clang/llvm don't support C99 FP rounding
mode pragmas (FENV_ACCESS etc)": complete support for FP rounding and
exceptions (via ``#pragma STDC FENV_ACCESS ON``, which you need for
fesetround to be "meaningful") isn't implemented yet (and is probably
a huge task, as you explain).

-Ahmed

Hi,

Thanks. I wasn't aware of ``STDC FENV_ACCESS``.
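
For reference, this is roughly what the pragma is meant to look like
in C99 (a sketch only; as you say clang doesn't implement it, so as
far as I know the pragma is currently just ignored):

#include <stdio.h>
#include <fenv.h>

/* With FENV_ACCESS on, the implementation must assume the program might
   run under non-default rounding modes or test the FP status flags, so
   the cast below may not be folded across the fesetround() call. */
#pragma STDC FENV_ACCESS ON

int main() {
    fesetround(FE_DOWNWARD);
    double x = 0.3;
    float y = (float) x;   /* should honour the run-time rounding mode */
    printf("y (downward):%a\n", y);
    return 0;
}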

Supporting something like this is no doubt difficult. One way I could
imagine supporting rounding in a more general way would be to have all
floating point operations at the IR level take a rounding mode operand
(which would let you do correctly rounded constant folding in all
cases). When doing codegen for something like x86, the most simplistic
thing you could do is reset the rounding mode before every floating
point operation, but I could imagine handling this more efficiently by
computing call-free single-entry-single-exit regions in which the
rounding mode does not change (calls to functions known not to modify
the rounding mode could be ignored) and omitting the rounding mode
reset instructions in those regions.

I'm not really sure if this is a good idea: if there aren't any
real-world targets that make the rounding mode part of instruction
opcodes, then it feels like this would be forcing a virtual machine
model onto LLVM IR that, although useful for static analysis, poorly
reflects what real machines do.

However, some low-hanging fruit that could be addressed pretty quickly
would be to give ``fptrunc`` better semantics based on how targets
currently codegen it. For x86 it seems to mean "convert a floating
point number to a lower precision type using the current rounding mode
of the floating point environment". I don't really know what the other
targets do, though, so someone who has a broader overview than me
would need to rewrite the semantics.

Thanks,
Dan.