So I just read that with clang 14, -ffp-contract=on is now the default. This is a horrible idea. The results of FMA instructions are not IEEE compliant, as they do not exhibit correct rounding behavior, so FMA contraction should only be turned on explicitly by someone who understands exactly what they’re doing and how these instructions affect results.
IEEE 754 specifies a fusedMultiplyAdd operation, but it says nothing about which C syntax represents which operation. So this behavior is neither compliant nor non-compliant with respect to that spec. The behavior controlled by -ffp-contract=on is, however, compliant with the C standard, which explicitly discusses and blesses it.
Note that (as required by the C standard) contraction is only permitted within a single expression. That is,
double fma(double a, double b) { return a * b + 5; }
can emit a fusedMultiplyAdd operation, but
double ma(double a, double b) { double m = a * b; return m + 5; }
cannot.
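A minimal sketch of this rule, assuming a C99 (or later) compiler; the function names below are hypothetical. The inputs are chosen so that the exact product (1 + 2^-30)(1 - 2^-30) = 1 - 2^-60 rounds to 1.0 as a double, so a fused evaluation of a * b - 1 yields -2^-60, while rounding the product first yields 0.0. Because contraction is permitted but not required, the check accepts either value for the contractible forms and only pins down the version that forces the product through a volatile double.

```c
#include <assert.h>

/* One expression: under -ffp-contract=on the compiler MAY emit a single
   fused multiply-add here, rounding only once. */
double mul_add_expr(double a, double b, double c) {
    return a * b + c;
}

/* Two statements: -ffp-contract=on must round the product before the add,
   because `a * b` is a separate full expression. (-ffp-contract=fast, which
   GCC uses by default in GNU modes, may still fuse them.) */
double mul_add_stmts(double a, double b, double c) {
    double m = a * b;
    return m + c;
}

/* Storing the product in a volatile object forces it to be rounded to
   double and prevents fusion under any contraction mode. */
double mul_add_rounded(double a, double b, double c) {
    volatile double m = a * b;
    return m + c;
}

void mul_add_check(void) {
    /* (1 + 2^-30)(1 - 2^-30) = 1 - 2^-60 exactly; as a double it rounds to 1.0. */
    const double a = 1.0 + 0x1p-30, b = 1.0 - 0x1p-30;

    /* Two roundings: 1.0 + (-1.0) == 0.0. */
    assert(mul_add_rounded(a, b, -1.0) == 0.0);

    /* Contractible forms: 0.0 if unfused, -2^-60 if fused into one rounding. */
    double r = mul_add_expr(a, b, -1.0);
    assert(r == 0.0 || r == -0x1p-60);
    r = mul_add_stmts(a, b, -1.0);
    assert(r == 0.0 || r == -0x1p-60);
}
```

The volatile store is used here only as a portable way to pin a rounding point in a test; it is not a recommendation for production code.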
Additionally, the behavior is controllable in the source code by the standard #pragma STDC FP_CONTRACT {ON|OFF}.
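A minimal sketch of the pragma in use; the function names are hypothetical. Placed at the start of a compound statement, the pragma applies until the end of that block. Clang honors it (except under -ffp-contract=fast), whereas GCC's manual lists this pragma as not implemented, so the check below only insists on the unfused result when compiling with Clang.

```c
#include <assert.h>

/* The pragma must precede all declarations and statements in the block;
   it turns contraction off for the rest of this function body. */
double mul_add_no_contract(double a, double b, double c) {
#pragma STDC FP_CONTRACT OFF
    return a * b + c;
}

void mul_add_no_contract_check(void) {
    /* a * b = 1 - 2^-60 exactly, which rounds to 1.0 as a double. */
    const double a = 1.0 + 0x1p-30, b = 1.0 - 0x1p-30;
    double r = mul_add_no_contract(a, b, -1.0);
#if defined(__clang__)
    /* Product rounded first, as the pragma requires: 1.0 - 1.0. */
    assert(r == 0.0);
#else
    /* GCC ignores the pragma, so a fused result (-2^-60) is possible. */
    assert(r == 0.0 || r == -0x1p-60);
#endif
}
```

With GCC, the equivalent effect has to come from the command line (-ffp-contract=off) rather than from the source.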
(Contrast this with the -ffp-contract=fast flag, which enables non-C-standard-compliant behavior: it ignores the #pragma and will create contractions after other optimizations such as inlining, and across expressions.)
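A sketch of the inlining case, with hypothetical names. The inputs a = 1 + 2^-30, b = 1 - 2^-30 are chosen so that the exact product 1 - 2^-60 rounds to 1.0. Under -ffp-contract=on the multiply inside the helper is its own full expression, so it is rounded even after the helper is inlined; only -ffp-contract=fast may fuse it with the add in the caller, giving -2^-60 instead of 0.0.

```c
#include <assert.h>

/* Hypothetical helper: the product is a separate expression in the source. */
static inline double product(double a, double b) {
    return a * b;
}

/* =on: never fused, because the multiply and the add live in different
   expressions. =fast: the compiler may inline product() and then fuse. */
double mul_add_via_helper(double a, double b, double c) {
    return product(a, b) + c;
}

void mul_add_via_helper_check(void) {
    const double a = 1.0 + 0x1p-30, b = 1.0 - 0x1p-30;
    double r = mul_add_via_helper(a, b, -1.0);
    /* 0.0 under =off or =on; -2^-60 is only possible under =fast. */
    assert(r == 0.0 || r == -0x1p-60);
}
```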
My issue isn’t with the FMA optimization or with the behavior of -ffp-contract=on. My issue is that -ffp-contract=on is now on by default with -ffp-model=precise. As a result, statements like return a * b + 5; return values that differ from what’s expected.
If you want FMA optimizations, you should have to choose them explicitly, by passing either -ffp-contract=on or -ffp-contract=fast, because you really need to know what you’re doing when you turn these on.
Here is an example of unexpected behavior when FMA contraction is applied indiscriminately:
x1 * y2 - x2 * y1
We found that with the precise model on Apple M1, this is evaluated as fma(x1, y2, -x2*y1); that is, one multiplication and the subtraction are done in full precision, while the second multiplication is rounded first. As a result, this expression is not zero when x1 == x2 and y1 == y2.
It seems the contraction should not be applied to anything more complicated than a * b + c.
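A sketch that illustrates the mechanism, assuming IEEE 754 double arithmetic; the function names and the inputs x = 1 + 2^-30, y = 1 - 2^-30 are illustrative choices, not taken from the report. The exact product x * y = 1 - 2^-60 rounds to 1.0, so a contracted evaluation returns the rounding error of one product (±2^-60) instead of 0.0. Forcing both products through volatile doubles restores the exact cancellation.

```c
#include <assert.h>

/* As written, this single expression may be contracted to
   fma(x1, y2, -(x2 * y1)): one product keeps full precision inside the fma,
   the other is rounded first, so the two no longer cancel exactly. */
double cross_expr(double x1, double y1, double x2, double y2) {
    return x1 * y2 - x2 * y1;
}

/* Rounding both products explicitly keeps the computation symmetric, so the
   result is exactly 0.0 when x1 == x2, y1 == y2 and the product is finite. */
double cross_rounded(double x1, double y1, double x2, double y2) {
    volatile double p = x1 * y2;
    volatile double q = x2 * y1;
    return p - q;
}

void cross_check(void) {
    /* x * y = 1 - 2^-60 exactly, which rounds to 1.0. */
    const double x = 1.0 + 0x1p-30, y = 1.0 - 0x1p-30;
    assert(cross_rounded(x, y, x, y) == 0.0);

    /* 0.0 if not contracted; otherwise -2^-60 or +2^-60, depending on
       which of the two products the compiler keeps inside the fma. */
    double r = cross_expr(x, y, x, y);
    assert(r == 0.0 || r == -0x1p-60 || r == 0x1p-60);
}
```

The same asymmetry appears in any determinant-like expression where two products are expected to cancel.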
This came up on the GCC side last month, with @fweimer-rh raising some of the issues caused by having -ffp-contract=fast be the default for GCC: Concerns regarding the -ffp-contract=fast default - Florian Weimer.
I’d concur that GCC should not be using the non-standard “fp-contract=fast” mode by default, and should use “on” instead, as clang already does.