Running applications with floating point trapping enabled has some issues.
Users may want FP traps enabled for various reasons. Some need this mechanism to make applications more robust by catching errors even in a production environment. Others want better performance: for example, they run computations with fast-math options and switch to the slow version if infinities or NaNs appear. These users know that running an application with traps enabled costs nothing in itself, and they expect no performance drop.
Running an application with FP traps enabled requires some support from the compiler. GCC has the option `-ftrapping-math` for that. Clang also has this option, but it is implemented as a synonym of `-ffp-exception-behavior=strict`. These are different things, however. Strict exception tracking guarantees that FP status bits change according to the statement sequence in the source file. A trap, on the other hand, is a general mechanism provided by the processor. Setting status bits may initiate a trap, but otherwise the two are independent. One can easily imagine a core that does not have the FP exception bits IEEE-754 requires but still traps on overflow or invalid operation.
Strict exception handling is currently the only way in Clang to get semantics consistent with trapping. Unfortunately, this mode is not suitable for practical use. It requires all floating-point operations to have side effects, which substantially limits optimizations. Strict exception tracking is also incompatible with vectorization. As a result, performance is poor. For example, running SPEC 2017 fpspeed with `-O3` shows a 30% slowdown for 638.imagick_s if `-ffp-exception-behavior=strict` is also specified.
Calculation with default FP modes is the most suitable solution, as it provides the best performance. However, Clang makes transformations that are not valid for trapping math, in particular:
- Constant folding is allowed to evaluate expressions that would otherwise trap. For example, 0.0/0.0 can appear as a result of some optimization, such as LTO. Currently it is replaced by NaN and the trap does not happen.
- Checks made by `fcmp` and `is_fpclass` are currently interchangeable. However, they behave differently if traps are enabled and an argument is a signaling NaN (which may be used to represent an uninitialized FP value). If `fcmp` is used instead of `is_fpclass`, a superfluous trap is performed; in the reverse case, the trap is missed.
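The difference in the second point can be sketched in C. A check written as `x != x` is a floating-point comparison, and IEEE-754 comparisons signal Invalid operation when an operand is a signaling NaN. A check in the style of `is_fpclass` only inspects the bit pattern, so it never raises anything. The helper names below are hypothetical, and the code assumes a 64-bit IEEE-754 `double` whose quiet bit is the most significant fraction bit (true on x86, ARM and RISC-V, not on legacy MIPS).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t),
              "sketch assumes a 64-bit IEEE-754 double");

#define EXP_MASK  UINT64_C(0x7FF0000000000000) /* all exponent bits */
#define FRAC_MASK UINT64_C(0x000FFFFFFFFFFFFF) /* all fraction bits */
#define QUIET_BIT UINT64_C(0x0008000000000000) /* assumed quiet-NaN bit */

/* Copy out the raw bits of d without any FP arithmetic or comparison. */
uint64_t double_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Bit-level NaN test: exponent all ones and a non-zero fraction.
   No FP instruction is involved, so no exception can be raised. */
int bits_is_nan(uint64_t bits) {
    return (bits & EXP_MASK) == EXP_MASK && (bits & FRAC_MASK) != 0;
}

/* Bit-level signaling-NaN test: a NaN with the quiet bit clear. */
int bits_is_signaling_nan(uint64_t bits) {
    return bits_is_nan(bits) && (bits & QUIET_BIT) == 0;
}

/* Comparison-based NaN test: this is a real FP comparison, which
   signals Invalid operation if x is a signaling NaN. */
int compare_is_nan(double x) {
    return x != x;
}
```

Both `bits_is_nan(double_bits(x))` and `compare_is_nan(x)` return the same answer for every input, which is why an optimizer may treat them as interchangeable; they differ only in whether a signaling NaN raises Invalid operation, and hence in whether a trap fires when traps are enabled.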
The C Standard does not restrict using traps in default mode; `#pragma STDC FENV_ACCESS ON` is not required for it, because setting traps does not change FP control modes. So trapping-math should be enabled by default. GCC behaves this way.
I am interested in a proper implementation of `-ftrapping-math`. The option should be independent of other code generation options and combinable with any of them: strict exceptions, fast-math, default. The reverse option “-fno-trapping-math” is added to the set of fast-math flags, as in GCC. In IR, this option is represented by the existing function attribute `no-trapping-math`; its default value depends on the target. If it is set to `false`, transformations preserve trapping on the FP exceptions `Invalid operation`, `Overflow` and `Division by zero`, and do not introduce new traps. It is not intended to precisely keep all exceptions; they can be avoided by an allowed transformation (like reassociation).
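To illustrate how an allowed transformation can avoid an exception, here is a small C sketch of reassociation removing an overflow. The function names are hypothetical. The second function hand-writes what a reassociating optimizer might produce; it is not actual Clang output. `volatile` is used only to force the intermediate result to be rounded to `double`.

```c
#include <float.h>

/* (a + b) + c, evaluated in source order.
   The partial sum a + b may overflow to +infinity. */
double sum_in_source_order(double a, double b, double c) {
    volatile double partial = a + b;
    return partial + c;
}

/* a + (b + c), the reassociated form.
   If b and c cancel, no overflow ever occurs. */
double sum_reassociated(double a, double b, double c) {
    volatile double partial = b + c;
    return a + partial;
}
```

With `a = b = DBL_MAX` and `c = -DBL_MAX`, the source-order version overflows to infinity (and would trap if Overflow traps were enabled), while the reassociated version returns `DBL_MAX` without raising Overflow.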
Any feedback is appreciated.
Thanks,
Serge