- Do all arithmetic FP operations return a canonical NaN (i.e., a fixed bit pattern)? Or can they return different bit patterns on each execution?
Neither is a correct answer. As a rule of thumb, operations in IEEE 754 (note this includes math library functions) follow this flow:
- If an input is a sNaN, signal the invalid exception and quiet the NaN.
- If an input is a NaN and the output is a NaN, the result is a qNaN with the same payload as one of the inputs (which one is unspecified, and varies between different hardware implementations).
- If no input is a NaN and the output is a NaN (e.g., 0.0/0.0), a qNaN is returned. I’m not seeing any definition of what the payload can be in IEEE 754, but my understanding is that an implementation that stores the address of the instruction as the payload in this case is intended to be conforming.
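The three rules above can be sketched as a bit-level model on binary32 patterns. This is only an illustration, not how any particular hardware implements it: `nan_result` and `DEFAULT_QNAN` are hypothetical names, the model picks the first NaN input (IEEE 754 leaves the choice open), it assumes quieting is done by setting the top fraction bit (true for x86 and ARM style encodings, not for legacy MIPS), and it assumes `0x7FC00000` as the generated NaN.

```python
EXP_MASK = 0x7F800000      # binary32 exponent field
FRAC_MASK = 0x007FFFFF     # binary32 fraction field
QUIET_BIT = 0x00400000     # top fraction bit: set means quiet (assumed encoding)
DEFAULT_QNAN = 0x7FC00000  # assumed pattern for a freshly generated NaN


def is_nan(bits):
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0


def is_snan(bits):
    return is_nan(bits) and (bits & QUIET_BIT) == 0


def nan_result(inputs):
    """Model the NaN result of an operation whose result is known to be NaN.

    Returns (result_bits, invalid_raised).
    """
    # Rule 1: any sNaN input signals the invalid exception.
    invalid = any(is_snan(b) for b in inputs)
    # Rule 2: propagate the payload of one NaN input, quieted.
    # Which input is chosen is unspecified; this sketch takes the first.
    for b in inputs:
        if is_nan(b):
            return b | QUIET_BIT, invalid
    # Rule 3: no NaN input (e.g. 0.0/0.0): generate a qNaN and signal invalid.
    return DEFAULT_QNAN, True


print(tuple(hex(v) if i == 0 else v
            for i, v in enumerate(nan_result([0x7F800001, 0x3F800000]))))
```

Note that the model never returns a pattern with the quiet bit clear, which is the "output is always a qNaN" guarantee discussed next.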
In other words, absent a few special-case operations, it’s guaranteed that any NaN an operation produces is a qNaN, and a “custom” NaN payload is guaranteed to propagate through operations in a manner akin to pointer provenance. There is a concept of “canonical NaN”, but I believe this is parlance for cases like x87’s pseudo-NaN values, and for the basic `bfloat`, `half`, `float`, `double`, and `fp128` types, every NaN is canonical.
- Do instructions always return silent NaNs? Or is there a global flag to make them return signaling NaNs?
With a few exceptions (principally the operations that explicitly only manipulate the sign bit, or that allow users to construct custom NaNs like C’s `setpayload`), the result of every arithmetic operation is guaranteed to not be an sNaN.
- Do the regular FP ops in LLVM signal with signaling NaNs? Do you need to use some specific intrinsic? Or does it depend on a global flag?
C explicitly allows implementations the freedom to treat all sNaNs as if they were qNaNs, and C2x adds an `FE_SNANS_ALWAYS_SIGNAL` flag to detect if they take this freedom or not. I believe it is reasonable to assume that anything not using `strictfp` and constrained intrinsics is operating in an sNaN-is-qNaN mode (in other words, the C2x flag should only be set if `-ffp-model=strict` or similar is on the command line).
- Do FP operations canonicalize NaNs? For example, can `fsub %x, 0` be replaced with `%x`? If NaN canonicalization happens, then the answer is no! Likewise for `fmul %x, 1.0`.
The non-constrained operations are assumed to be in the default rounding mode and to ignore exceptions, and NaN payloads are guaranteed to propagate, so replacing `fsub %x, 0` with `%x` should be legal if the floating-point type is not `x87_fp80` or `ppc_fp128` (which have non-canonical NaNs). Actually, if you’re in a denormals-are-zero function, then the replacement also changes the value, since a denormal is effectively noncanonical in that scenario.
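As a rough host-level sanity check of that claim, here is a sketch that compares the raw binary64 encoding of `x - 0.0` with `x`. It assumes CPython, where float subtraction maps to the hardware double subtraction and the default floating-point environment (round-to-nearest, no DAZ/FTZ) is in effect; `bits64` and `from_bits64` are helper names made up for this example. Whether the NaN payload survives is hardware behavior, so the sketch prints it rather than promising a value.

```python
import math
import struct

QUIET_BIT64 = 1 << 51  # quiet bit of binary64 (x86/ARM style encoding assumed)


def bits64(x):
    """Raw IEEE 754 binary64 encoding of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits64(b):
    """Python float with exactly the given binary64 encoding."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


# Non-NaN inputs, including both zeros and the smallest denormal.
samples = [1.5, -0.0, 0.0, 5e-324, float("inf"), float("-inf")]
identity_holds = all(bits64(x - 0.0) == bits64(x) for x in samples)
print("x - 0.0 is bit-identical to x for the samples:", identity_holds)

# A quiet NaN carrying a custom payload.
qnan_in = from_bits64(0x7FF8000000001234)
qnan_out = qnan_in - 0.0
print("in: ", hex(bits64(qnan_in)))
print("out:", hex(bits64(qnan_out)))
```

Note that `-0.0 - 0.0` stays `-0.0` in round-to-nearest, which is why the subtraction form is an identity where `x + 0.0` would not be.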
- Are `fabs` and `fneg` special when handling denormals and NaNs? Do they only flip the sign bit and that’s it?
Yes. These operations are documented as flipping only the sign bit. They don’t even signal an exception if the input is sNaN, and they will return an sNaN output if the input is sNaN. They even preserve noncanonical encodings.
- If 5) is true, then `fsub 0, %x` is not equivalent to `fneg %x`?
Correct, and that is effectively why a dedicated `fneg` instruction was added.
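A small sketch of the difference, using binary64 bit patterns. The helpers `fneg_bits` and `fabs_bits` are hypothetical names that model the sign-bit-only behavior described above; the zero case uses CPython's unary minus and subtraction, which I am assuming map to a sign flip and a hardware subtract respectively.

```python
import struct

SIGN = 1 << 63  # sign bit of binary64


def bits64(x):
    """Raw IEEE 754 binary64 encoding of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def fneg_bits(b):
    """Model of fneg: flip the sign bit, touch nothing else."""
    return b ^ SIGN


def fabs_bits(b):
    """Model of fabs: clear the sign bit, touch nothing else."""
    return b & (SIGN - 1)


# Signed zero: negation and subtraction from zero disagree.
x = 0.0
via_fneg = -x        # -0.0
via_fsub = 0.0 - x   # +0.0 in round-to-nearest
print(hex(bits64(via_fneg)), hex(bits64(via_fsub)))
print(via_fneg == via_fsub)  # == cannot see the difference between zeros

# An sNaN stays an sNaN under the sign-bit model (quiet bit 51 stays clear).
snan = 0x7FF0000000000001
print(hex(fneg_bits(snan)), hex(fabs_bits(fneg_bits(snan))))
```

An arithmetic subtraction, by contrast, would be expected to quiet the sNaN under the rules earlier in this post, so the two forms differ on NaNs as well as on zeros.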
- Does load canonicalize NaNs? This is a long-standing question, so I would like to document it. If the answer is no, can we support x87 correctly? Does that matter? What about bitcast?
- Do phi & select canonicalize NaNs? In case they have fast-math flags, e.g. we only adjust the sign bit (for nsz)?
Canonicalization should ideally happen only at the user’s explicit request, although I believe almost every floating-point arithmetic operation implicitly canonicalizes its inputs. (Note that canonicalization’s effects are to turn sNaN into qNaN, denormals into zero [in DAZ mode], and… something for the `x87_fp80` and `ppc_fp128` types that I doubt too many people actually care about).
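To make those two effects concrete, here is a minimal model of canonicalization on binary32 bit patterns. The function name `canonicalize32` and its `daz` parameter are made up for this sketch, quieting is assumed to set the top fraction bit, and the `x87_fp80`/`ppc_fp128` cases are deliberately left out.

```python
SIGN_MASK = 0x80000000  # binary32 sign bit
EXP_MASK = 0x7F800000   # binary32 exponent field
FRAC_MASK = 0x007FFFFF  # binary32 fraction field
QUIET_BIT = 0x00400000  # top fraction bit (assumed quiet-bit encoding)


def canonicalize32(bits, daz=False):
    """Model of canonicalize for a binary32 bit pattern.

    daz=True models a denormals-are-zero environment.
    """
    exp = bits & EXP_MASK
    frac = bits & FRAC_MASK
    if exp == EXP_MASK and frac != 0:
        # NaN: quiet it, keep sign and payload.
        return bits | QUIET_BIT
    if daz and exp == 0 and frac != 0:
        # Denormal under DAZ: flush to a zero of the same sign.
        return bits & SIGN_MASK
    # Everything else is already canonical.
    return bits


print(hex(canonicalize32(0x7F800001)))            # sNaN input
print(hex(canonicalize32(0x80000001, daz=True)))  # negative denormal under DAZ
```

In this model a value is canonical exactly when `canonicalize32(bits, daz) == bits`, which is the property an optimizer would need before dropping an implicit canonicalization.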
I’m not sure if DAZ mode means that `fcmp x, y` is equivalent to `fcmp (canonicalize x), (canonicalize y)` (it appears to be true on x86 from the manual), but if that is the case, then you can handle noncanonical loads on x87 by only loading the value as a float if you know all of the uses implicitly canonicalize, and using integer operations for those uses that don’t canonicalize (which include `fneg` and `fabs`). The only case I don’t know how to handle easily for noncanonical values is `ret float %x`… maybe you could do it if you used `FXSAVE`/`FXRSTOR`? (NB: this is not a performance-oriented lowering.)
- Can the compiler assume any NaN bit pattern? E.g., can we replace `fdiv 0.0, 0.0` with `2139095041` (or any NaN pattern)?
Now you’re getting into what LLVM’s NaN semantics should be as opposed to IEEE 754.
Semantically, any operation that generates a NaN (i.e., none of its inputs were a NaN) should be viewed as producing an unspecified qNaN. Pragmatically, it should be the preferred qNaN (which, as @arsenm noted, is different for older MIPS processors). If the inputs are NaNs, then it is safe to replace `fdiv %x, %y` with `canonicalize %x` or `canonicalize %y` (if both are NaN, it’s unspecified which one it returns, so the optimizer could pick either one). I don’t think it is wise to eliminate user-added `canonicalize` operations, but I suspect in practice, we do a bad job of preserving the implicit canonicalize steps of existing FP operations.
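For a feel of what "unspecified qNaN" looks like in practice, this sketch generates a NaN on the host (via `inf - inf`, since Python raises on `0.0/0.0`) and decodes the constant from the question. It assumes CPython with hardware double arithmetic; `bits64` is a helper name invented here. The sign and payload of the generated NaN vary between targets, so the sketch prints them instead of stating a value.

```python
import math
import struct

QUIET_BIT64 = 1 << 51      # quiet bit of binary64 (x86/ARM style encoding assumed)
QUIET_BIT32 = 0x00400000   # quiet bit of binary32 (same assumption)


def bits64(x):
    """Raw IEEE 754 binary64 encoding of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


# A NaN generated from non-NaN inputs.
inf = float("inf")
generated = inf - inf
gen_bits = bits64(generated)
print("generated NaN:", hex(gen_bits))
print("quiet:", bool(gen_bits & QUIET_BIT64), "sign set:", bool(gen_bits >> 63))

# The constant from the question, read as a binary32 pattern.
pattern = 2139095041
print("question's pattern:", hex(pattern))
print("quiet bit set:", bool(pattern & QUIET_BIT32))
```

One thing this makes visible: `2139095041` has the quiet bit clear, so under the usual encoding it is a signaling NaN, which a NaN-generating operation should never return.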