Semantics of floating-point instructions are unclear

I’ve recently filed an issue about LLVM’s floating-point semantics, and I was hoping to get a bit more visibility and/or discussion here. Essentially, the core floating-point arithmetic operations (addition, subtraction, multiplication and division) don’t claim to produce results according to any particular model. APFloat implements IEEE 754 arithmetic, and optimizations seem to be meant to adhere to it (for the most part). However, there is no textual guarantee in the LangRef that fadd and friends actually produce such results. This is a problem for users of LLVM who want to know their de facto floating-point semantics.

As best as I can tell, if the IR instructions are meant to correspond to IEEE 754 arithmetic, then certainly codegen doesn’t seem to know about it. x87 codegen in particular doesn’t even try to match IEEE 754 results. If IR instructions are just being blindly lowered to their closest counterpart on non-IEEE 754 targets, then the many optimizations relying on IEEE 754 semantics will lead to broken code. (e.g. Wrong optimization: instability of x87 floating-point results leads to nonsense · Issue #44218 · llvm/llvm-project · GitHub) The key question is, then, do fadd, fsub, fmul and fdiv follow IEEE 754 semantics, or some unspecified target-specific semantics? If it’s the former, then LLVM’s floating-point support is seriously broken on multiple targets. If it’s the latter, then target-agnostic floating-point IR transformations become unsound in many cases.

Maybe it’s determined by the type of the operands; here are the supported floating-point types in LLVM: LLVM Language Reference Manual — LLVM 17.0.0git documentation

AFAIK, APFloat is an internal data structure for carrying floating-point constants. It’s not used to represent floating-point values in IR.

Well, yes, but it is used to do things like constant-folding, which ought to have the same semantics.


fadd etc. should follow IEEE 754, with the following caveats:

  • As you’ve noticed, x87 has been broken since forever; the correct sequences aren’t obvious, it’s not clear how many people would take the performance penalty, and software built for targets without SSE2 is becoming increasingly rare. Other targets, including x86 targets with SSE2 enabled, shouldn’t be affected by this (except maybe m68k); the whole “long double register” thing was a failed experiment of the ’80s.
  • denormals are weird on certain targets (see LLVM Language Reference Manual — LLVM 17.0.0git documentation)
  • fast-math flags can change results, as you might expect
  • Messing with the floating-point environment isn’t allowed unless you’re using constrained intrinsics (LLVM Language Reference Manual — LLVM 17.0.0git documentation); a small sketch follows below
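
To make that last caveat concrete, here is a minimal C sketch (the pragma and the clang flag mentioned in the comments are assumptions about one way to get strict semantics, not something established in this thread). Ordinary fadd/fdiv-style operations are only defined for the default floating-point environment, so a rounding-mode change is not guaranteed to be honoured unless the frontend emits constrained intrinsics.

    #include <fenv.h>
    #include <stdio.h>
    #pragma STDC FENV_ACCESS ON            /* honoured by clang; some compilers ignore it */

    int main(void) {
        volatile double one = 1.0, three = 3.0;   /* volatile: keep the divide at run time */
        fesetround(FE_UPWARD);
        double third = one / three;        /* only guaranteed to respect FE_UPWARD when
                                              built for strict FP (e.g. -ffp-model=strict),
                                              where this becomes a constrained fdiv */
        printf("%.20g\n", third);
        fesetround(FE_TONEAREST);
        return 0;
    }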

I’ve recently filed an issue (Basic floating-point operations are underspecified · Issue #60942 · llvm/llvm-project · GitHub) about LLVM’s floating-point semantics, and I was hoping to get a bit more visibility and/or discussion here. Essentially, the core floating-point arithmetic operations (addition, subtraction, multiplication and division) don’t claim to produce results according to any particular model. APFloat implements IEEE 754 arithmetic, and optimizations seem to be meant to adhere to it (for the most part). However, there is no textual guarantee in the LangRef that fadd and friends actually produce such results. This is a problem for users of LLVM who want to know their de facto floating-point semantics.

The present de facto semantics of LLVM floating point are that the types are IEEE 754 types that may sometimes be evaluated with higher internal precision, and sometimes you get other fun effects around denormals. This is somewhat akin to the FLT_EVAL_METHOD == 2 setting in C (everything is evaluated with long double precision instead), except that C imposes some other requirements there that are not handled correctly.
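
For illustration, here is a small C sketch of the kind of double rounding that excess x87 precision can produce (the build configuration named in the comments is an assumption; on SSE2 or any other strict binary64 target the two roundings collapse into one and both descriptions agree).

    #include <stdio.h>

    int main(void) {
        /* volatile keeps the addition at run time so it is not constant-folded */
        volatile double a = 1.0;
        volatile double b = 0x1.0000008p-53;   /* 2^-53 + 2^-78, exactly representable */
        double sum = a + b;
        /* Rounded once, straight to binary64:          1.0000000000000002            */
        /* x87-style evaluation (e.g. 32-bit x86 without SSE2, default 64-bit         */
        /* precision control): rounded to 80-bit first, then to double on the store,  */
        /* which lands back on 1.0                                                    */
        printf("%.17g\n", sum);
        return 0;
    }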

As best as I can tell, if the IR instructions are meant to correspond to IEEE 754 arithmetic, then certainly codegen doesn’t seem to know about it. x87 codegen in particular doesn’t even try to match IEEE 754 results. […] The key question is, then, do fadd, fsub, fmul and fdiv follow IEEE 754 semantics, or some unspecified target-specific semantics?

If SSE2 support is enabled, I believe x86 code lowering prefers to lower everything that’s not x86_fp80 to the vector units instead of x87, so the only remaining issue in that case is that the existing C ABIs return float/double results on the x87 stack (which means you can’t return an sNaN).

There is definitely effort being made to make the floating-point semantics more correct (@arsenm has a few patches that make denormal-fp-math somewhat more sane, although it’s still the frontend’s responsibility to figure out how to set that attribute correctly).

As a practical matter, the semantics of floating-point should probably be interpreted as “these are IEEE 754 types, and are implemented precisely according to IEEE 754 [except for cases like fast math, or non-constrained fp being UB if not in the default fp environment],” with the semantics on x87 and other targets where implementing IEEE 754 precisely is difficult-to-impossible being effectively WONTFIX known bugs. Unfortunately, we don’t have a lot of target-specific documentation pages where we can add notes to the effect that x87 fp semantics are buggy, at least not for the more conventional CPU targets.


What guarantees are there about the behavior of floating-point code?

Almost none.

At most you can use ADD, SUB, MUL, DIV, and SQRT on operands that are not infinities, not negative zero, and not denormal, using round-to-nearest-even, and expect a reasonable result at 0.5 ULP. Otherwise, you are in implementation-defined behavior, even for architectures that do implement sound IEEE 754-compliant functionality.

You can count on none of the comparisons (x < y) having the properties specified in IEEE 754, and on none of the recommended functions, including things as easy as COPYSIGN(). You cannot count on accessing the rounding modes, you cannot assume that changing rounding modes works, or that changing the rounding mode is inexpensive. Many times an implementation will have implemented something that is frustratingly close to what IEEE 754 specifies, but fails at some sundry boundary condition. Example::

 if( x < y )
      then-clause
 else
      else-clause

IEEE 754 specifies that if x or y is NaN, then the comparison is false and control is transferred to the else-clause. There are compilers which subtly disobey this, rearranging then and else clauses willy-nilly.
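
A small C illustration of the hazard (the variable names are just for this example): with a NaN operand every ordered comparison is false, so rewriting "if (x < y) A else B" as "if (x >= y) B else A" silently sends the NaN case down the wrong clause.

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double x = NAN, y = 1.0;
        printf("x <  y : %d\n", x < y);    /* 0: the else-clause must run            */
        printf("x >= y : %d\n", x >= y);   /* also 0: the "inverted" test would run  */
                                           /* the original then-clause on a NaN      */
        return 0;
    }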

You SHOULD NOT try to optimize anything across a parenthesis boundary (FORTRAN mandates this, and uses parentheses to force order of evaluation). Here, FP is simply not like integer arithmetic, and the good programmer has gone out of his/her way to structure the code such that it is evaluated in the order specified in the ASCII source code. I have seen cases where adding 0.5 twice has more accuracy than adding 1.0 once; luckily, IEEE 754 eliminated this horseplay.

You cannot trust the accuracy of any IEEE 754 recommended {transcendental} function with an argument that “needs argument reduction” or touches anywhere near a range or domain boundary of the algorithm {tan(π/4)}. So, in general you can’t trust anything, and here the compiler had BETTER NOT be trying to optimize ANYTHING. {Unless, as described below, you are a budding numerical analyst.}

If IEEE 754 is the intended model, then x87 codegen is completely broken and probably other targets are as well.

There are many who take the position that x87 is not compliant with IEEE 754, and it is certainly not compliant with IEEE 754-2008 or -2019. Then there are those who insist that it does comply.

Even if x87 were completely compliant with IEEE 754-2019, you would not want programmers who actually understand numerical analysis to have their subroutines compiled for x87 instead of SSE. The numerical properties are significantly different, leading to astonishing problems when assuming one and getting the other, in both directions.

If not IEEE 754, then what is the intended model?

Even within IEEE 754 compliant architectures, you have implementations that Flush Denorms to Zero, implement only Round to Nearest, and fail to provide access to 754 specified functions {mainly in the comparisons and transcendental areas of the standard}. Heck, many architectures do not even bother to give the compiler instructions which transfer control properly to the else-clause on detection of a NaN operand (above example).

If you want to optimize floating-point arithmetic, you have to stick to ADD, SUB, MUL, DIV, and SQRT, using only RNE; or else understand the numeric properties of the target implementation well enough that you can write numeric proofs of transcendental functions using the primitives of the target instruction set.

About the only other thing that is safe is reading/writing ASCII FP numbers and converting to/from binary bit patterns, but here you still have to stay away from NaNs, Infinities, Denorms, and negative zero unless you are the aforementioned budding numerical analyst.

A long time ago there was a large body of numerical analysts who were employed as programmers and as verifiers, whose job it was to check and certify that one algorithm or another produces the “right” result for all arguments in the defined range. With IBM, CDC, and CRAY-like arithmetics (read: poor FP numerics), these people lived comfortably in the knowledge that their jobs were secure.

Then IEEE 754-1985 came along, programmers began to think that anyone could write good numeric codes, and a large part of what numerical analysts did has been lost. Later (mid-90s) compilers started getting good enough to perform significant FP optimizations, proceeding until this day. Compiler writers should, at least, not make the situation worse for whatever is left of the NA practice by optimizing things that should not be optimized or by rearranging instructions in ways that can change the results of the algorithm in subtle and annoying ways.

You might get the hint that this is a tricky area::

I have spent the last 40 years as a computer architect, and spent the 8 years prior to that writing compilers.

I work with an LLVM compiler guy (Brian), and sometimes he comes up with a potential FP optimization, or I see that some optimization has been performed; then I have to go back through IEEE 754-2019 to find the actual specification, reason about the standard, the arithmetic properties, the optimization, and the implementation, and make very subtle alterations to my floating-point unit specification on the target implementation before I sign off on the optimization. {So that the optimization does not interfere with the robustness of the final product.}

But I want to end up with a solidly robust, -2019-compliant floating point for my ISA (surprise coefficient as close to 0 as possible). And you only get one chance to get the architectural definition correct so that it survives the lifetime of the architecture. Otherwise you end up in the bug-for-bug-compatible morass that x87 (and others) finds itself in.

Given that the architecture {like x86-64} is already fixed in stone, the opportunities are small indeed, the dangers loom large, and the subtlety is very hard to discover and analyze.

Right, but then we have the issues where optimizations use the IEEE 754 model, which is then not respected by codegen, causing outright contradictions in control flow. This is much worse than numerical accuracy problems. “Don’t use floating-point in conditionals” is not really a tenable stance. I’m not very familiar with LLVM’s documentation. If pages don’t exist for this sort of thing, could they be created? Alternatively, would it be possible to issue warnings that IEEE 754 support is incomplete and buggy on targets where it is incomplete and buggy?

I did not suggest “don’t use FP conditionals”; these come “from” the source code, and you have to do something useful with them! Just don’t venture out so far as to ignore the source-code requirements and end up doing something abusive with them…

I suggested not trying to optimize them (that is, don’t alter the conditionality or swap then and else clauses, unless you know enough about the target to keep the NaNs in the right clause).

I was referring to the situation we have in Wrong optimization: instability of x87 floating-point results leads to nonsense · Issue #44218 · llvm/llvm-project · GitHub where optimization causes a conditional to be evaluated as both true and false because codegen spills and rounds the x87 register in one instance and doesn’t in the other.
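
For readers who have not followed that issue, here is a hypothetical sketch in its spirit (the helper function, and the exact behaviour described in the comments, are illustrative assumptions; what actually happens depends on the target, build flags, and optimization level).

    #include <stdio.h>

    /* In a real reproducer this helper would live in another translation unit,
       purely to discourage constant folding. */
    double opaque(double v) { return v; }

    int main(void) {
        double x = opaque(0x1p-1000);
        double y = opaque(0x1p-1000);
        double z = x * y;            /* 2^-2000: nonzero in an 80-bit x87 register,
                                        but underflows to 0.0 as a binary64 value  */
        if (z != 0.0) {
            /* On an x87-only build this branch may be taken using the in-register
               value, while the spilled, rounded copy of z that reaches printf is 0. */
            printf("z = %g\n", z);
        }
        return 0;
    }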

I ran into this code fragment (C) today illustrating the peril of optimizing FP arithmetic; and this should NOT be optimized even though people not fluent in numerical analysis would probably want to optimize it::

  erfx = ( 0.5 - erfx ) + 0.5;

Here is an expression that a person in algebra would recompose as::

  erfx = 1.0 - erfx;

Yet I can tell you that, when erfx is in [¼…½), the original preserves 0.5-1.0 ULP of accuracy that the latter does not.

FWIW, there recently was a somewhat related discussion specifically about the NaN aspects of FP semantics:

This also led to a LangRef change.

Thanks, I’m tracking the todo item here: Update NaN semantics · Issue #888 · AliveToolkit/alive2 · GitHub
The new semantics are a bit painful to implement, hence it’s been a bit on the back burner for now.