Semantics of NaN

This contradicts an earlier answer:

Can you confirm that non-deterministically returning the preferred NaN of the chip is also acceptable?

Is non-determinism acceptable? I don’t know. But I can tell you that RISC-V does not propagate NaN payloads in its FPU implementation. So, whatever our semantics are must be able to deal with that.

Excerpting from the RISC-V ISA spec (“8.3 NaN Generation and Propagation”):

Except when otherwise stated, if the result of a floating-point operation is NaN, it is the canonical NaN. The canonical NaN has a positive sign and all significand bits clear except the MSB, a.k.a. the quiet bit. For single-precision floating-point, this corresponds to the pattern 0x7fc00000.
[…]
We considered propagating NaN payloads, as is recommended by the standard, but this decision would have increased hardware cost.

(The “otherwise stated” cases are the sign-manipulation instructions.)

Alright, thank you!
I’m implementing all this in Alive2 and will then run it over LLVM’s test suite to check which optimizations fail. It may take a couple more days, though.

To be clear, the semantics you are implementing is that NaN results have non-deterministic payload (including sign bit), except for operations like copysign that are specified to act on the bit representation?

Does this mean that if the input is sNaN you return an arbitrary qNaN?
Or do you zero out some signaling bit and keep the remaining payload as-is?

Just trying to understand what’s needed for LLVM IR semantics. Thanks!

Yes.
But also the rules that make arithmetic operations respect the NaN payload of the inputs, e.g.:
fadd NaN1, x
yields a non-deterministic choice between NaN1 and the canonical NaN (whatever that is). If NaN1 is an sNaN, I’m not sure about the semantics; see the previous question.

Oh okay, so not completely non-deterministic. Interesting.
(I assume this is a different notion of ‘canonical’ than IEEE754.)

FWIW, here is what wasm guarantees:

When the result of a floating-point operator other than fneg, fabs, or copysign is a NaN, then its sign is non-deterministic and the payload is computed as follows:

  • If the payload of all NaN inputs to the operator is canonical (including the case that there are no NaN inputs), then the payload of the output is canonical as well.
  • Otherwise the payload is picked non-deterministically among all arithmetic NaNs; that is, its most significant bit is 1 and all others are unspecified.

Would it make sense for LLVM to adopt the wasm semantics? It seems consistent with what @jyknight wrote, but does not allow the behavior described by @jcranmer:

If no input is a NaN and the output is a NaN (e.g., 0.0/0.0), a qNaN is returned. I’m not seeing any definition of what the payload can be in IEEE 754, but my understanding is that the intent is that an implementation that stores the address of the instruction as the payload in this case is meant to be conforming.

The wasm semantics would require a ‘canonical’ NaN in this case.

It is also a lot weaker than this:

If an input is a NaN and the output is a NaN, the result is a qNaN with the same payload as one of the inputs (which one is unspecified, and varies between different hardware implementations).

Basically wasm only guarantees this if the inputs all have canonical NaN.

This means that when targeting wasm, LLVM cannot guarantee the above; it cannot even guarantee the weakened version adjusted for RISC-V:

a non-deterministic choice between NaN1 [or any input NaN] and the canonical NaN (whatever that is)

It is unsurprising that wasm guarantees less than CPUs since it is meant to efficiently compile to many CPUs – but for LLVM, wasm is basically yet another CPU, so it seems hard for LLVM to guarantee more than wasm? (If it does, technically the compilation from LLVM IR to wasm needs to insert extra instructions to establish the guarantees LLVM makes but wasm does not.)


TBH, I got carried away with all the information in this thread :slight_smile:
Alive2 used to have a fully non-deterministic NaN. Then I started looking into implementing @llvm.canonicalize and then started wondering about how to implement sNaNs, etc.
So maybe a non-det semantics is sufficient for the optimizations that LLVM does today. But any feedback from the experts would be great, especially about planned optimizations.

I don’t have any opinion; I’ll implement whatever the consensus is.

We’ve discussed floating points and NaNs in particular on and off for years in Rust and I still feel like that. :wink:

I am inclined to suggest Rust should go with the wasm semantics, assuming they heard enough experts to form a reasonable compromise, but of course LLVM IR being an optimizing IR has slightly different trade-offs so I am also curious about the LLVM consensus. If LLVM guarantees less than what wasm guarantees (i.e., has more non-determinism), that would mean Rust cannot just pick wasm semantics, and if LLVM guarantees more than wasm (i.e., has less non-determinism), then compiling LLVM to wasm becomes challenging.

FWIW wasm used to have a lot more non-determinism than it does now – in the past, they just stated that the payload is always non-deterministic, period. Some time fairly recently they must have added this part about canonical NaNs. The sign bit is still non-det though, so even floating point operations with non-NaN inputs are still non-det.

I’ve implemented the semantics in Alive2 and run over LLVM’s test suite.
Some questions that arise from failures:

  1. Just bumping this one:
  1. LLVM does fsub → fneg (Compiler Explorer).
    Do we want to allow this? Always, or forbid it in strict fp mode?
    What’s the semantic justification to allow converting a qNaN into a sNaN? In non-strict mode we go with the semantics that the output of arithmetic ops is a NaN with a non-det payload (qNaN or sNaN).

  2. Does the sign of NaN matter?
    Here’s an example optimization that flips the sign bit of a NaN: Compiler Explorer
    Is this ok always, or just in non-strictfp mode? The justification being that the sign bit of NaNs is non-deterministic?

  3. What’s the return value of maxnum(nan1, nan2)? And fmax(nan1, nan2)?
    Is it a non-deterministic choice of nan1/nan2? Or is it an arbitrary non-deterministic NaN? And what about strictfp mode?

  4. @llvm.canonicalize seems to use the denormal-fp-math output option (and ignore the input option). Is that the expected behavior?

  5. Does LLVM assume anything about sNaN/qNaN?
    For example, LLVM does this optimization:

define float @nan_f64_trunc() {
  %f = fptrunc double 0x7ff0000000000001 to float
  ret float %f
}
=>
define float @nan_f64_trunc() {
  ret float 0x7fc00000
}

How do you know that 0x7fc00000 is a qNaN? Or does it not matter in non-strictfp mode, as fptrunc returns a non-deterministic NaN (qNaN or sNaN)?

Some of these questions may be repeated, but I just wanted to give concrete examples I found in LLVM’s test suite, to make sure we are all on the same page.
Thanks!

LLVM does fsub → fneg

I’d call this a bug. DAGCombiner and GlobalISel both try to get this right and check denormals, etc.; InstCombine is the odd one out.

Does the sign of NaN matter?

IEEE says no. We specifically didn’t try to handle these in a number of places.

What’s the return value of maxnum(nan1, nan2)? And fmax(nan1, nan2)?

Unfortunately, with the current intrinsics, maxnum = fmax. IEEE says either NaN is acceptable (I’d have to double-check what happens if one is a signaling NaN w.r.t. payload bits). libm fmax ignores sNaNs and treats them the same as qNaNs.

  • @llvm.canonicalize seems to use the denormal-fp-math output option (and ignore the input option). Is that the expected behavior?

Not sure what you mean by this. It’s supposed to act like fmul %x, 1.0. If either the input or output is flushed, it’s the same thing.
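
In C terms, a rough model of that “act like fmul %x, 1.0” description (my own sketch, not the intrinsic’s implementation) would be:

/* Rough C model of "canonicalize behaves like fmul x, 1.0": the multiply
   quiets sNaNs and, when a non-IEEE denormal mode (FTZ/DAZ) is in effect
   in hardware, gives the denormal a chance to be flushed. */
static float canonicalize_like(float x) {
    volatile float one = 1.0f;  /* volatile so the compiler cannot fold the multiply away */
    return x * one;
}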

Does LLVM assume anything about sNaN/qNaN?

Currently LLVM pretends sNaNs don’t exist, except on rare occasions where code tries to handle them.

Regarding llvm.canonicalize, let’s see this test case:

define float @canonicalize_neg_denorm_positive_zero_output_preserve_sign_input() "denormal-fp-math"="positive-zero,preserve-sign" {
; CHECK-NEXT:    ret float 0.000000e+00
  %ret = call float @llvm.canonicalize.f32(float bitcast (i32 -2139095041 to float))
  ret float %ret
}

The input is a denormal. The attributes say that we should give +0 for the output and preserve the sign for input denormals.
The question is which one of these options applies. Since canonicalize doesn’t do any operation, if you apply denormal flushing to the input, there’s nothing left to flush in the output.
If that’s the case, the test above is buggy, as it should give -0.0 (preserve the sign).

LLVM does fsub → fneg

I’d call this a bug. DAGCombiner and GlobalISel both try to get this right and check denormals, etc.; InstCombine is the odd one out.

Thinking about this more, I’m not sure it’s a bug. It depends on what our policy for observing non-canonical vs. canonical values is. We should possibly relax the DAG/GISel combiners here. I was thinking optimizations aren’t responsible for ensuring canonical results; if you want to observe a canonical result, you should use canonicalize. If we were to fully model canonicalization as part of the instruction semantics, we would need to introduce quite a few of them in places where we don’t.

The semantics of the non-default denormal modes is another great set of questions that hasn’t been brought up yet!

But if you assume the default “ieee” denormal handling – as was the case in the example as-given – then denormals are passed through by fsub just fine, and the transform isn’t a problem for them. NaNs, though, potentially are.

Just to recap: fneg will flip the sign of an input qNaN OR sNaN, and return it exactly as-is, other than the sign. Either a qNaN or an sNaN passed to fsub will return “some kind of” qNaN (see the previous discussion about qNaN payloads), but an sNaN will additionally trigger an invalid FP exception in the process.

Assuming our policy is that preservation of NaN payloads and sign across optimizations is irrelevant, there’s no problem there w.r.t. qNaNs. If we additionally say we don’t promise to handle sNaNs – which appears to be the case – then this is a correct optimization.

Currently LLVM pretends sNaNs don’t exist, except on rare occasions where code tries to handle them.

I’d note that GCC has a separate -fsignaling-nans flag, which is off by default, and not implied by other flags (e.g. not implied by -ftrapping-math).

I wonder if it might make the most sense to consider both NaN payload preservation and sNaN support together as a new “nan payloads are important” option.

I also just ran into this document, which is interesting: https://www.agner.org/optimize/nan_propagation.pdf


Ok, so it seems that for gcc, fsub → fneg is fair game unless -fsignaling-nans is on.

LLVM has the strictfp function attribute, which forbids assumptions on the rounding mode. I guess the effect is that FP operations become equal to the constrained FP ops with dynamic rounding? And thus eventually strictfp will go away?
(As a side comment, we probably do not consider strictfp when hoisting function calls, which would be wrong. That makes strictfp a not-great attribute, as it cannot be dropped safely. It would be best to have a “no-float-exceptions” attribute or whatever.)

Do we want to support -fsignaling-nans somehow? Strictfp doesn’t mention NaN propagation. Do we want to abuse it for that as well?

(Sorry for delay, I was on vacation)

Answering the post about the last set of questions, giving answers not already given:

Does this mean that if the input is sNaN you return an arbitrary qNaN?
Or do you zero out some signaling bit and keep the remaining payload as-is?

FWIW, the IEEE-prescribed encoding (from the later revisions of the standard) for sNaN/qNaN is that if the leading bit of the mantissa is 1, it is a qNaN; if it is 0, it is an sNaN. This lets every sNaN payload have a corresponding qNaN payload, obtained by setting that bit; one qNaN payload has no sNaN counterpart, because clearing the quiet bit would give the all-zero mantissa, which encodes infinity instead. IEEE 754 says, and I quote:

In the preferred encoding just described, a signaling NaN shall be quieted by setting [the leading bit of the mantissa] to 1, leaving the remaining bits of [the mantissa] unchanged.

So, in short, the latter option you had (setting the quiet bit and keeping the remaining bits the same) is the correct answer.
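
For concreteness, a minimal C sketch of that quieting rule for binary32 (my own illustration; the masks assume the standard single-precision layout, and the helper name is made up):

#include <stdint.h>
#include <string.h>

/* Quiet an sNaN per the IEEE-preferred encoding: set the leading mantissa
   bit (the quiet bit) and leave the rest of the payload untouched.
   Non-NaN inputs are returned unchanged. */
static float quiet_nan32(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7f800000u) == 0x7f800000u &&   /* exponent all ones      */
        (bits & 0x007fffffu) != 0)               /* nonzero mantissa: NaN  */
        bits |= 0x00400000u;                     /* set the quiet bit      */
    memcpy(&f, &bits, sizeof f);
    return f;
}

E.g., the sNaN bit pattern 0x7f800001 becomes the qNaN 0x7fc00001; the payload bit stays put.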

What’s the semantic justification to allow converting a qNaN into a sNaN? In non-strict mode we go with the semantics that the output of arithmetic ops is a NaN with a non-det payload (qNaN or sNaN).

The one justification I can see is that if you have, say, fadd x, y, and you know that x is a NaN, you can replace that with just x, even if x is sNaN.

Does the sign of NaN matter?

It really shouldn’t, but the propensity of printf to display -nan if the sign bit is set makes it visible in a way that NaN payloads are not.
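
A trivial illustration (assuming a glibc-style printf; negating a NaN at the source level usually compiles to a plain sign-bit flip, but that is itself one of the questions in this thread):

#include <stdio.h>
#include <math.h>

int main(void) {
    double q = NAN;               /* a quiet NaN                 */
    printf("%f %f\n", q, -q);     /* commonly prints: nan -nan   */
    return 0;
}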

What’s the return value of maxnum(nan1, nan2)? And fmax(nan1, nan2)?

Floating-point min/max functions are cursed. llvm.maxnum is C’s fmax, which is no longer present in IEEE 754-2019. From C2x’s definition of fmax in Annex F, the result should be a quieted version of one of the NaNs if both inputs are NaN. The implementation is essentially supposed to be canonicalize((cond) ? x : y) (although LLVM LangRef explicitly disavows the canonicalization step). The llvm.maximum function (C’s fmaximum) works like a regular arithmetic operation (it returns a qNaN if either input is NaN, and should propagate NaN payloads). The new IEEE 754-2019 operation maximumNumber (not yet an LLVM intrinsic, C’s fmaximum_num) is essentially a tightened definition of the original fmax implementation, and returns a qNaN in the vein of other arithmetic operations if both of the inputs are NaN.
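
To make the differences concrete, here is a rough C reference sketch of two of these flavours (my own code, not LLVM’s lowering or libm’s; the zero-sign handling in the maxnum flavour is deliberately left loose, as in C’s fmax):

#include <math.h>

/* llvm.maxnum / C fmax: a NaN input is ignored unless both inputs are NaN,
   in which case some quieted NaN comes back. */
static double maxnum_like(double x, double y) {
    if (isnan(x))
        return isnan(y) ? x + y : y;  /* x + y yields some qNaN; which payload
                                         comes back is machine-dependent */
    if (isnan(y))
        return x;
    return x >= y ? x : y;
}

/* llvm.maximum / C23 fmaximum: behaves like an arithmetic operation; any
   NaN input produces a qNaN, and -0.0 orders below +0.0. */
static double maximum_like(double x, double y) {
    if (isnan(x) || isnan(y))
        return x + y;                            /* propagate a quiet NaN  */
    if (x == 0.0 && y == 0.0)
        return signbit(x) ? y : x;               /* -0.0 < +0.0            */
    return x >= y ? x : y;
}

maximumNumber (fmaximum_num) would look roughly like the first function, except that sNaN inputs additionally raise the invalid exception.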

Do we want to support -fsignaling-nans somehow? Strictfp doesn’t mention NaN propagation. Do we want to abuse it for that as well?

I’m personally hesitant of yet more flags for FP behavior, since the existing flag set already has a combinatorial explosion of confusing semantics, and the most likely way the flag gets reflected in LLVM IR is via yet another function attribute with all of the attendant combinatorial semantic explosions that entails.

As I see it, the semantics of NaN with respect to FP operations boils down to returning a non-deterministic result from one of the following sets (from weakest to strongest):

  • Choose any NaN, qNaN or sNaN. This is probably the closest model to what people reason about (FP is the set {finite values, +/- infinity, NaN}).

  • Choose any qNaN. This is what IEEE 754 guarantees (as NaN propagation is “should”, not “must”, something I earlier missed).

  • WASM rules: preferred qNaN if all NaN inputs are preferred qNaN (this includes the case where no input was a NaN, e.g., fsub(inf, inf)), otherwise any qNaN. This rule is sufficient to allow NaN-boxing without having to add conversion instructions for the results of FP ops.

  • Weak NaN-propagation: either one of the input NaNs (after quieting), or the preferred NaN. I don’t think it adds enough over the previous rule to actually be a useful choice.

  • NaN-propagation: convert all input sNaNs to qNaN (by setting the quiet bit), and return any qNaN of any input. This rule allows you to use NaN propagation to use custom values in payload, and I can see frontends and libraries taking advantage of this if it could be reliably guaranteed.

I think there is value in specifying some way to get the final option if we can, and there’s also good reason to go with weaker results at the same time, especially by default, since some architectures can’t easily support the strength of guaranteed NaN propagation.
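
For reference, here is a small C helper (my own sketch, using binary32 bit masks; the function names are made up) that spells out what “canonical” and “arithmetic” NaN mean in the WASM rule above. The sign bit is masked off because wasm leaves it non-deterministic:

#include <stdint.h>
#include <string.h>

static uint32_t bits_of(float f) { uint32_t b; memcpy(&b, &f, sizeof b); return b; }

/* Canonical NaN: quiet bit set, every other payload bit clear (sign ignored). */
static int is_canonical_nan32(float f) {
    return (bits_of(f) & 0x7fffffffu) == 0x7fc00000u;
}

/* Arithmetic NaN: quiet bit set, remaining payload bits unconstrained. */
static int is_arithmetic_nan32(float f) {
    return (bits_of(f) & 0x7fc00000u) == 0x7fc00000u;
}

Under that rule, an operation whose NaN inputs all satisfy is_canonical_nan32 (or that has no NaN inputs) must produce a NaN result satisfying it too; otherwise any result satisfying is_arithmetic_nan32 is allowed.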


I’d like to add some background on wasm’s “canonical” NaN semantics (sorry, I was following the thread only until just before that discussion started). It is worth noting that wasm’s ‘arithmetic NaNs’, including canonical ones, are quiet, and in practice Wasm doesn’t support sNaNs.

Preserving Wasm’s canonical NaNs is going to be a source of overhead for operations that both (a) produce a NaN if one input is NaN and (b) require some combination of bitwise operations to produce the final result. Because of the latter, it would be hard to preserve the canonical NaN bits, and extra instructions would need to be added to the lowering (this is not a problem for operations that return NaN only if all inputs are NaN). For Wasm the affected operations are fmin and fmax, which are the NaN-favoring variants (and not libc’s fmin/fmax), when implemented on x86.

Because of the potential overhead, canonical NaNs create a bit of a paradox: on one hand, floating-point operations cannot benefit from canonical NaNs (since all quiet NaNs are treated equally as far as arithmetic is concerned), but on the other hand, they take all the overhead. Likewise, realistic consumers of canonical NaNs are limited to techniques which use FP values, but not the operations. Those are few and far between; it is mostly just NaN-boxing, and for Wasm there are no direct users of the canonical NaNs returned by SIMD ops. I think it is a much wiser choice to provide operations that would improve the ability of consumers of canonical NaNs to produce those themselves, rather than have more common code bear the cost.

nlopes:

 jyknight:

 I’d note that GCC has a separate -fsignaling-nans flag, which is off by default, and not implied by other flags (e.g. not implied by -ftrapping-math).

Ok, so it seems that for gcc, fsub → fneg is fair game unless -fsignaling-nans is on.

LLVM has the strictfp function attribute, which forbids assumptions on the rounding mode. I guess the effect is that FP operations become equal to the constrained FP ops with dynamic rounding? And thus eventually strictfp will go away?

That attribute is only valid on function definitions and function calls, including all uses of the constrained intrinsics. It isn’t valid on arbitrary instructions. And floating-point LLVM instructions are not allowed in functions that use the constrained intrinsics. So, no, putting the strictfp attribute on an FP instruction doesn’t change anything and is actually an error. Nothing in the IR verifier checks for this, though, unfortunately.

The strictfp attribute does nothing to the rounding mode. The constrained intrinsics where the rounding mode matters are able to state that they run with a known rounding mode or dynamic rounding. But that doesn’t come from the strictfp attribute itself.

(As a side comment, we probably do not consider strictfp when hoisting function calls, which would be wrong. That makes strictfp a not-great attribute, as it cannot be dropped safely. It would be best to have a “no-float-exceptions” attribute or whatever.)

The constrained intrinsics appear as function calls that transformation passes by default know nothing about. Do we hoist function calls that have unknown effects? I doubt it.

Do we want to support -fsignaling-nans somehow? Strictfp doesn’t mention NaN propagation. Do we want to abuse it for that as well?

We’ve been using IEEE 754 as a reference when working on strictfp / the constrained intrinsics. When 754 is silent, the C standard is sometimes used. So whatever is specified by the standards documents is the goal with strictfp.

Plus, the constrained intrinsics that can raise an exception all have an exception-handling argument. With -ffp-exception-behavior=strict we are not allowed to drop any exceptions. Is that what gcc’s -fsignaling-nans does?

For Wasm the affected operations are fmin and fmax, which are NaN-favoring variants (and not libc’s fmin/fmax), when implemented on x86.

This is a side note, but that intrigued me, so I looked it up and found the Mozilla commit: x86 has a vector min/max instruction, MINPS, but it treats -0 as equal to +0 (instead of less than, as required by wasm) and returns the second argument. And it also returns the second argument if either argument is a NaN.

Thus, the wasm VM attempted a clever trick, effectively bitwise_or(fmin(a, b), fmin(b, a)), which, due to the way the bits fall, has the effect of forcing the output to NaN if either side is a NaN, and to -0 if either argument is -0 and the other is +0. But the NaN bits will not be canonical, even if the inputs were. So, sadly, that clever trick doesn’t adhere to the spec.
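
For the curious, here is roughly what that lowering looks like with SSE intrinsics (a sketch of the trick as described above, not the actual SpiderMonkey code; the function name is made up):

#include <xmmintrin.h>

/* wasm f32x4.min via two MINPS plus a bitwise OR.  MINPS returns its second
   operand when a lane compares unordered (NaN) or equal (+0 vs -0), so
   OR-ing both operand orders forces a NaN out whenever either input is NaN
   and forces -0 when one input is -0 and the other is +0.  The NaN that
   comes out is the OR of the two bit patterns, hence non-canonical. */
static __m128 wasm_f32x4_min_sketch(__m128 a, __m128 b) {
    return _mm_or_ps(_mm_min_ps(a, b), _mm_min_ps(b, a));
}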

  1. Do all arithmetic FP operations return a canonical NaN (i.e., a fixed bit pattern)? Or can they return different bit patterns on each execution?

Machine specific: most machines simply barf up a NaN; a few deposit the code address of the instruction creating the NaN in the NaN payload. If exceptions are enabled, a SW signal handler can take a look at the payload and adjust it as it sees fit.

  2. Do instructions always return quiet NaNs? Or is there a global flag to make them return signaling NaNs?

IEEE 754 requires an instruction to deliver a quiet NaN when one of the arguments is a NaN (of any kind). The general rule is that the performance of an instruction quiets the operand NaNs.

  3. Do the regular FP ops in LLVM signal with signaling NaNs? Do you need to use some specific intrinsic? Or does it depend on a global flag?

Machine dependent:: Some machines have some sort of global flag; others have individual enables, one per IEEE flag bit. Most everyone wants to ignore “inexact” and have various tolerances for “Overflow”, “Underflow”, and “DivZero”. Everyone basically agrees that a bad operand is to raise a trap.

  4. Do FP operations canonicalize NaNs? For example, can fsub %x, 0 be replaced with %x? If NaN canonicalization happens, then the answer is no! Likewise for fmul %x, 1.0

This is dependent on a lot of stuff:: the machine, the compilation environment, sometimes the Operating System. In general the conversion of arithmetic to less costly forms is accepted. There are a few IEEE 754-2019 requirements not to miss, here.

  5. Are fabs and fneg special when handling denormals and NaNs? Do they only flip the sign bit and that’s it?

IEEE 754 allows these to be performed with sign-bit flipping, but does not mandate it.
IEEE 754 allows signaling NaNs to raise signals (!) on fabs() and fneg().
So, your implementation will determine which recipe is followed.

  6. If 5) is true, then fsub 0, %x is not equivalent to fneg %x?

A numerical analyst could chew your ear for 13 days without rest for making a statement like that.
But most (after the lecture) would allow the fsub #0,R7 → fneg R7 transformation.

  7. Does load canonicalize NaNs? This is a long-standing question, so I would like to document it. If the answer is no, can we support x87 correctly? Does that matter? What about bitcast?

The job of a ST is to deposit a bit pattern in a memory container.
The job of a LD is to extract a bit pattern from a memory container.
A ST of the appropriate size, and a later LD of the appropriate size from the very same container (which has not been accessed elsewhere in between), must recreate the bit pattern of the original value.
This is purely a quality-of-implementation argument, not an IEEE 754 requirement.

  8. Do phi & select canonicalize NaNs? And in case they have fast-math flags, e.g., do we only adjust the sign bit (for nsz)?

A phi selects from one of multiple paths. If the phi selects from a path containing a NaN, its result is a NaN; it may NOT be the same NaN bit pattern, but it is a NaN (this is all that is required by IEEE 754). A high-quality implementation will deliver the exact NaN from the chosen path.

  9. Can the compiler assume any NaN bit pattern? E.g., can we replace fdiv 0.0, 0.0 with 2139095041 (or any NaN pattern)?

IEEE 754-2019 has an extensive list of the various and sundry values a conforming implementation must produce. Chapter 9, IIRC.

One of the most important principles is that NaNs go to the else-clause as originally written.

 if( x < y )
 { then-clause }
 else
 { else-clause }

So, if you invert the clauses as::

 if( x >= y )
 { else-clause }
 else
 { then-clause }

A NaN in either x or y must transfer control to the else-clause. If your code transformation changes which clause is invoked upon encountering a NaN, your implementation is non-conforming.
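
One safe way to invert the branch (a sketch, not quoting the standard) is to negate the whole comparison instead of flipping the predicate, so a NaN still falls through to the else-clause::

 if( !(x < y) )     /* taken when x >= y, and also when either x or y is a NaN */
 { else-clause }
 else
 { then-clause }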

Many machines have little respect for NaNs, and do not carry payloads reliably. Machines in this category deal with NaNs by expending as little effort as practicable. About the only thing you can do with NaNs on such a machine is to simply let NaNs propagate and sort it out later.

A few machines respect NaNs and provide guidelines as to NaN propagation::
a) A single NaN operand is delivered as the result when a NaN is the required result.
b) When there is more than one NaN operand to an instruction::

  1. 3-operand instructions favor the augend NaN over the multiplicand and multiplier NaNs
  2. 2-operand instructions favor one of the operands over the other. Source 1 over Source 2

(1) it is generally more important that FMAC record where the augend overflowed than to follow any particular product.
(2) Just define something and run with it.