This contradicts an earlier answer:
Can you confirm that non-deterministically returning the preferred NaN of the chip is also acceptable?
Is non-determinism acceptable? I don’t know. But I can tell you that RISC-V does not propagate NaN payloads in its FPU implementation. So, whatever our semantics are must be able to deal with that.
Excerpting from the RISC-V ISA spec (“8.3 NaN Generation and Propagation”):
Except when otherwise stated, if the result of a floating-point operation is NaN, it is the canonical NaN. The canonical NaN has a positive sign and all significand bits clear except the MSB, a.k.a. the quiet bit. For single-precision floating-point, this corresponds to the pattern 0x7fc00000.
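As a bit-level illustration (a Python sketch, not part of the spec), the canonical single-precision NaN can be assembled directly from the description above:

```python
import struct

# RISC-V canonical single-precision NaN: positive sign, exponent all ones,
# significand clear except its MSB (the quiet bit).
SIGN = 0
EXPONENT = 0xFF << 23          # all 8 exponent bits set
QUIET_BIT = 1 << 22            # MSB of the 23-bit significand
canonical = SIGN | EXPONENT | QUIET_BIT

print(hex(canonical))          # 0x7fc00000, matching the spec text

# Round-trip through an actual float to confirm it is a NaN.
f = struct.unpack('<f', struct.pack('<I', canonical))[0]
print(f != f)                  # True: NaN compares unequal to itself
```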
[…]
We considered propagating NaN payloads, as is recommended by the standard, but this decision would have increased hardware cost.
(“otherwise stated” is the sign-manipulation instructions).
Alright, thank you!
I’m implementing all this in Alive2 and will then run on LLVM’s test suite to check which optimizations fail. I may take a couple more days though.
To be clear, the semantics you are implementing is that NaN results have non-deterministic payload (including sign bit), except for operations like copysign that are specified to act on the bit representation?
Does this mean that if the input is sNaN you return an arbitrary qNaN?
Or do you zero out some signaling bit and keep the remaining payload as-is?
Just trying to understand what’s needed for LLVM IR semantics. Thanks!
Yes.
But also the rules that make arithmetic operations respect the NaN payload of the inputs, e.g.:
fadd NaN1, x
yields a non-deterministic choice between NaN1 and the canonical NaN (whatever that is). If NaN1 is an sNaN, I’m not sure about the semantics; see the previous question.
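This rule can be sketched as a set of allowed result bit patterns (Python; single precision, with the canonical NaN fixed to 0x7fc00000 purely for illustration):

```python
CANONICAL = 0x7FC00000  # illustrative choice of canonical NaN

def is_nan(bits):
    """Single-precision NaN test: exponent all ones, nonzero significand."""
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0

def fadd_nan_results(a_bits, b_bits):
    """Allowed NaN results of fadd under this semantics: any input NaN,
    or the canonical NaN."""
    return {b for b in (a_bits, b_bits) if is_nan(b)} | {CANONICAL}

# fadd qNaN(payload 1), 1.0 may yield that qNaN or the canonical NaN.
print(sorted(hex(b) for b in fadd_nan_results(0x7FC00001, 0x3F800000)))
# ['0x7fc00000', '0x7fc00001']
```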
Oh okay, so not completely non-deterministic. Interesting.
(I assume this is a different notion of ‘canonical’ than IEEE754.)
FWIW, here is what wasm guarantees:
When the result of a floating-point operator other than fneg, fabs, or copysign is a NaN, then its sign is non-deterministic and the payload is computed as follows:
- If the payload of all NaN inputs to the operator is canonical (including the case that there are no NaN inputs), then the payload of the output is canonical as well.
- Otherwise the payload is picked non-deterministically among all arithmetic NaNs; that is, its most significant bit is 1 and all others are unspecified.
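The two cases can be modeled as a small predicate (Python sketch; single precision, helper names are mine):

```python
CANONICAL_PAYLOAD = 1 << 22   # only the MSB (quiet bit) of the significand

def payload(bits):
    """Significand bits of a single-precision value."""
    return bits & 0x007FFFFF

def is_nan(bits):
    return (bits & 0x7F800000) == 0x7F800000 and payload(bits) != 0

def wasm_result_payload_rule(input_bits):
    """Which payloads wasm permits for a NaN result, given the inputs."""
    nan_inputs = [b for b in input_bits if is_nan(b)]
    if all(payload(b) == CANONICAL_PAYLOAD for b in nan_inputs):
        return "canonical"        # payload must be exactly 1 << 22
    return "any-arithmetic"       # any payload with the MSB set

# 0.0/0.0: no NaN inputs, so the result payload must be canonical.
print(wasm_result_payload_rule([0x00000000, 0x00000000]))  # canonical
# One input NaN with a non-canonical payload: anything goes.
print(wasm_result_payload_rule([0x7FC00001, 0x3F800000]))  # any-arithmetic
```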
Would it make sense for LLVM to adopt the wasm semantics? It seems consistent with what @jyknight wrote, but it does not allow the behavior described by @jcranmer:
If no input is a NaN and the output is a NaN (e.g., 0.0/0.0), a qNaN is returned. I’m not seeing any definition of what the payload can be in IEEE 754, but my understanding is that the intent is that an implementation that stores the address of instruction as the payload in this case is meant to be conforming.
The wasm semantics would require a ‘canonical’ NaN in this case.
It is also a lot weaker than this:
If an input is a NaN and the output is a NaN, the result is a qNaN with the same payload as one of the inputs (which one is unspecified, and varies between different hardware implementations).
Basically wasm only guarantees this if the inputs all have canonical NaN.
This means that when targeting wasm, LLVM cannot guarantee the above; it cannot even guarantee the weakened version adjusted to RISC-V:
a non-deterministic choice between NaN1 [or any input NaN] and the canonical NaN (whatever that is)
It is unsurprising that wasm guarantees less than CPUs since it is meant to efficiently compile to many CPUs – but for LLVM, wasm is basically yet another CPU, so it seems hard for LLVM to guarantee more than wasm? (If it does, technically the compilation from LLVM IR to wasm needs to insert extra instructions to establish the guarantees LLVM makes but wasm does not.)
TBH, I got carried away with all the information in this thread
Alive2 used to have a fully non-deterministic NaN. Then I started looking into implementing @llvm.canonicalize, and then started wondering about how to implement sNaNs and so on.
So maybe a non-deterministic semantics is sufficient for the optimizations that LLVM does today. But any feedback from the experts would be great, especially about planned optimizations.
I don’t have any opinion; I’ll implement whatever the consensus is.
We’ve discussed floating point, and NaNs in particular, on and off for years in Rust, and I still feel like that.
I am inclined to suggest Rust should go with the wasm semantics, assuming they heard enough experts to form a reasonable compromise, but of course LLVM IR being an optimizing IR has slightly different trade-offs so I am also curious about the LLVM consensus. If LLVM guarantees less than what wasm guarantees (i.e., has more non-determinism), that would mean Rust cannot just pick wasm semantics, and if LLVM guarantees more than wasm (i.e., has less non-determinism), then compiling LLVM to wasm becomes challenging.
FWIW wasm used to have a lot more non-determinism than it does now – in the past, they just stated that the payload is always non-deterministic, period. Some time fairly recently they must have added this part about canonical NaNs. The sign bit is still non-det though, so even floating point operations with non-NaN inputs are still non-det.
I’ve implemented the semantics in Alive2 and run over LLVM’s test suite.
Some questions that arise from failures:
LLVM does fsub → fneg (Compiler Explorer).
Do we want to allow this? Always, or forbid it in strict fp mode?
What’s the semantic justification for allowing the conversion of a qNaN into an sNaN? In non-strict mode we go with the semantics that the output of arithmetic ops is a NaN with a non-deterministic payload (qNaN or sNaN).
Does the sign of NaN matter?
Here’s an example optimization that flips the sign bit of a NaN: Compiler Explorer
Is this OK always, or just in non-strictfp mode? Is the justification that the sign bit of NaNs is non-deterministic?
What’s the return value of maxnum(nan1, nan2)? And fmax(nan1, nan2)?
Is it non-det of nan1/nan2? Or is it a non-deterministic NaN? Also in strictfp mode?
@llvm.canonicalize seems to use the denormal-fp-math output option (and ignore the input option). Is that the expected behavior?
Does LLVM assume anything about sNaN/qNaN?
For example, LLVM does this optimization:
```llvm
define float @nan_f64_trunc() {
  %f = fptrunc double 0x7ff0000000000001 to float
  ret float %f
}
=>
define float @nan_f64_trunc() {
  ret float 0x7fc00000
}
```
How do you know that 0x7fc00000 is a qNaN? Or does it not matter in non-strictfp mode, as fptrunc returns a non-deterministic NaN (either qNaN or sNaN)?
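For what it’s worth, the common hardware behavior for that truncation can be sketched at the bit level (Python; a model of typical hardware, not LLVM’s actual constant folder): truncate the payload to its top 23 bits, then set the quiet bit.

```python
def fptrunc_nan(double_bits):
    """Model of NaN truncation double -> float as common hardware does it:
    keep the sign, take the high 23 bits of the 52-bit payload, then set
    the quiet bit (the payload's MSB)."""
    sign = (double_bits >> 63) & 1
    payload52 = double_bits & ((1 << 52) - 1)
    payload23 = payload52 >> 29          # keep the top 23 payload bits
    quiet = 1 << 22
    return (sign << 31) | (0xFF << 23) | payload23 | quiet

# The sNaN from the test case: its payload truncates to 0, and quieting
# then yields the canonical float qNaN.
print(hex(fptrunc_nan(0x7FF0000000000001)))  # 0x7fc00000
# A qNaN with a high payload bit survives the truncation.
print(hex(fptrunc_nan(0x7FF8000020000000)))  # 0x7fc00001
```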
Some of these questions may be repeated, but I just wanted to give concrete examples I found in LLVM’s test suite, to make sure we are all on the same page.
Thanks!
LLVM does fsub → fneg
I’d call this a bug. DAGCombiner and GlobalISel both try to get this right and check denorms etc. instcombine is the odd one out
Does the sign of NaN matter?
IEEE says no. We specifically didn’t try to handle these in a number of places.
What’s the return value of maxnum(nan1, nan2)? And fmax(nan1, nan2)?
Unfortunately, with the current intrinsics, maxnum = fmax. IEEE says either NaN is acceptable (I’d have to double-check what happens with payload bits if one is a signaling NaN). libm fmax ignores sNaNs and treats them the same as qNaNs.
@llvm.canonicalize seems to use the denormal-fp-math output option (and ignore the input option). Is that the expected behavior?
Not sure what you mean by this. It’s supposed to act like fmul %x, 1.0. If either the input or output is flushed, it’s the same thing.
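Under that reading, canonicalize can be sketched as one pass that quiets sNaNs and flushes denormals according to the mode (Python; single precision, simplified; the helper name is mine, not an LLVM API):

```python
def canonicalize_bits(bits, denormal_mode="ieee"):
    """Sketch of llvm.canonicalize on single-precision bits: quiet any
    sNaN and, in a flushing mode, flush denormals to zero."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x007FFFFF
    if exponent == 0xFF and mantissa:
        return bits | (1 << 22)          # NaN: set the quiet bit
    if exponent == 0 and mantissa and denormal_mode != "ieee":
        if denormal_mode == "positive-zero":
            return 0x00000000            # flush to +0
        return bits & 0x80000000         # preserve-sign: keep sign only
    return bits                          # everything else passes through

print(hex(canonicalize_bits(0x7F800001)))                    # sNaN quieted
print(hex(canonicalize_bits(0x807FFFFF, "preserve-sign")))   # -0.0 bits
```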
Does LLVM assume anything about sNaN/qNaN?
Currently LLVM pretends sNaNs don’t exist, except on rare occasions where code tries to handle them.
Regarding llvm.canonicalize, let’s see this test case:
```llvm
define float @canonicalize_neg_denorm_positive_zero_output_preserve_sign_input() "denormal-fp-math"="positive-zero,preserve-sign" {
  ; CHECK-NEXT: ret float 0.000000e+00
  %ret = call float @llvm.canonicalize.f32(float bitcast (i32 -2139095041 to float))
  ret float %ret
}
```
The input is a denormal. The attributes say that we should give +0 for output and preserve sign for input denormals.
The question is which one of these options applies? Since canonicalize doesn’t do any operation, if you apply denormal flushing to input, there’s nothing left to flush in the output.
If that’s the case, the test above is buggy as it should give -0.0 (preserve the sign).
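To make the two readings concrete, here is a sketch (Python; hypothetical helper) of what each flushing mode would do to that input, the negative denormal with bits 0x807fffff:

```python
def flush_denormal(bits, mode):
    """Flush a single-precision denormal to zero per an LLVM
    denormal-fp-math mode ('ieee' leaves it alone)."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x007FFFFF
    if mode == "ieee" or exponent != 0 or mantissa == 0:
        return bits                      # not a denormal, or no flushing
    if mode == "preserve-sign":
        return bits & 0x80000000         # keep the sign, zero the rest
    if mode == "positive-zero":
        return 0x00000000                # always +0
    raise ValueError(mode)

neg_denormal = 0x807FFFFF                # the bitcast constant in the test
print(hex(flush_denormal(neg_denormal, "preserve-sign")))  # 0x80000000 (-0.0)
print(hex(flush_denormal(neg_denormal, "positive-zero")))  # 0x0 (+0.0)
```

So the test’s +0 result matches the output mode; applying the input mode first would give -0.0 instead.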
LLVM does fsub → fneg
I’d call this a bug. DAGCombiner and GlobalISel both try to get this right and check denorms etc. instcombine is the odd one out
Thinking about this more, I’m not sure it’s a bug. It depends what our policy for observing non-canonical vs. canonical values is. We should possibly relax the DAG/GISel combiners here. I was thinking optimizations aren’t responsible for ensuring canonical results. If you want to observe a canonical result, you should use canonicalize. If we were to fully model canonicalization as part of the instruction semantics, we would need to introduce quite a few of them in places where we don’t.
LLVM does fsub → fneg
I’d call this a bug. DAGCombiner and GlobalISel both try to get this right and check denorms etc. instcombine is the odd one out
The semantics of the non-default denormal modes is another great set of questions that hasn’t been brought up yet!
But if you assume the default “ieee” denormal handling – as was the case in the example as-given – then denormals are passed through by fsub just fine, and the transform isn’t a problem for them. NaNs, though, potentially are.
Just to recap: fneg will flip the sign of an input qNaN or sNaN and return it exactly as-is, other than the sign. Either a qNaN or an sNaN passed to fsub will return “some kind of” qNaN (see the previous discussion about qNaN payloads), but an sNaN will additionally trigger an invalid FP exception in the process.
Assuming our policy is that preservation of NaN payloads and sign across optimizations is irrelevant, there’s no problem there w.r.t. qNaNs. If we additionally say we don’t promise to handle sNaNs – which appears to be the case – then this is a correct optimization.
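At the bit level, the difference is easy to see (Python sketch; fneg shown exactly, while fsub’s NaN handling is left to the earlier discussion):

```python
SIGN_BIT = 0x80000000
QUIET_BIT = 1 << 22

def fneg_bits(bits):
    """fneg is a pure bit operation: flip the sign, pass everything else
    (including any NaN payload and the quiet bit) through untouched."""
    return bits ^ SIGN_BIT

snan = 0x7F800001                  # signaling NaN, payload 1

negated = fneg_bits(snan)
print(hex(negated))                # 0xff800001: payload intact
print(bool(negated & QUIET_BIT))   # False: still signaling after fneg
```

An fsub on the same input would instead hand back a quieted (possibly canonical) NaN and raise the invalid exception, which is exactly the observable difference the transform erases.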
Currently LLVM pretends sNaNs don’t exist, except on rare occasions where code tries to handle them.
I’d note that GCC has a separate -fsignaling-nans flag, which is off by default and not implied by other flags (e.g. not implied by -ftrapping-math).
I wonder if it might make the most sense to consider both NaN payload preservation and sNaN support together as a new “nan payloads are important” option.
I also just ran into this document, which is interesting: https://www.agner.org/optimize/nan_propagation.pdf
I’d note that GCC has a separate -fsignaling-nans flag, which is off by default, and not implied by other flags (e.g. not implied by -ftrapping-math).
Ok, so it seems that for gcc, fsub → fneg is fair game unless -fsignaling-nans is on.
LLVM has the strictfp function attribute, which forbids assumptions about the rounding mode. I guess the effect is that FP operations become equivalent to the constrained FP ops with dynamic rounding? And thus eventually strictfp will go away?
(As a side comment, we probably do not consider strictfp when hoisting function calls, which would be wrong. That makes strictfp a not-great attribute, as it cannot be dropped safely. It would be best to have a “no-float-exceptions” attribute or some such.)
Do we want to support -fsignaling-nans somehow? strictfp doesn’t mention NaN propagation. Do we want to abuse it for that as well?
(Sorry for delay, I was on vacation)
Answering the post about the last set of questions, giving answers not already given:
Does this mean that if the input is sNaN you return an arbitrary qNaN?
Or do you zero out some signaling bit and keep the remaining payload as-is?
FWIW, the encoding prescribed by later IEEE 754 revisions for sNaN/qNaN is that if the leading bit of the mantissa is 1, it is a qNaN; if it is 0, it is an sNaN. This lets every sNaN payload have a corresponding qNaN payload, obtained by setting that bit; one of the qNaN payloads has no sNaN counterpart, as the all-0 mantissa is infinity instead. IEEE 754 says, and I quote:
In the preferred encoding just described, a signaling NaN shall be quieted by setting [the leading bit of the mantissa] to 1, leaving the remaining bits of [the mantissa] unchanged.
So, in short, the latter option you suggested (setting the quiet bit and keeping the remaining bits the same) is the correct answer.
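For single precision, that quieting rule is a single OR of the quiet bit (a sketch):

```python
QUIET_BIT = 1 << 22                      # MSB of the 23-bit significand

def quiet(bits):
    """Quiet a single-precision sNaN per IEEE 754-2008: set the leading
    significand bit, leave the remaining payload bits unchanged."""
    return bits | QUIET_BIT

snan = 0x7F800001                        # sNaN with payload 1
print(hex(quiet(snan)))                  # 0x7fc00001: same payload, now quiet
```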
What’s the semantic justification for allowing the conversion of a qNaN into an sNaN? In non-strict mode we go with the semantics that the output of arithmetic ops is a NaN with a non-deterministic payload (qNaN or sNaN).
The one justification I can see is that if you have, say, fadd x, y, and you know that x is a NaN, you can replace that with just x, even if x is an sNaN.
Does the sign of NaN matter?
It really shouldn’t, but the propensity of printf to display -nan if the sign bit is set makes it visible in a way that NaN payloads are not.
What’s the return value of maxnum(nan1, nan2)? And fmax(nan1, nan2)?
Floating-point min/max functions are cursed. llvm.maxnum is C’s fmax, which is no longer present in IEEE 754-2019. From C2x’s definition of fmax in Annex F, the result should be a quieted version of one of the NaNs if both inputs are NaN. The implementation is essentially supposed to be canonicalize((cond) ? x : y) (although the LLVM LangRef explicitly disavows the canonicalization step). The llvm.maximum function (C’s fmaximum) works like a regular arithmetic operation: it returns a qNaN if either input is NaN, and should propagate NaN payloads. The new IEEE 754-2019 operation maximumNumber (not yet an LLVM intrinsic; C’s fmaximum_num) is essentially a tightened definition of the original fmax implementation, and returns a qNaN in the vein of other arithmetic operations if both of the inputs are NaN.
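The NaN behavior of the first two flavors can be sketched like this (Python, using math.nan; payloads and the -0.0/+0.0 ordering detail are not modeled, and the helper names are mine):

```python
import math

def maxnum(x, y):
    """llvm.maxnum / C fmax sketch: a NaN input is ignored if the other
    operand is a number; the result is NaN only if both inputs are NaN."""
    if math.isnan(x):
        return y if not math.isnan(y) else x
    if math.isnan(y):
        return x
    return max(x, y)

def maximum(x, y):
    """llvm.maximum / C fmaximum sketch: NaN-propagating, like other
    arithmetic operations."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    return max(x, y)

print(maxnum(math.nan, 1.0))    # 1.0: the NaN is ignored
print(maximum(math.nan, 1.0))   # nan: the NaN propagates
```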
Do we want to support -fsignaling-nans somehow? Strictfp doesn’t mention NaN propagation. Do we want to abuse it for that as well?
I’m personally hesitant about adding yet more flags for FP behavior, since the existing flag set already has a combinatorial explosion of confusing semantics, and the most likely way such a flag gets reflected in LLVM IR is via yet another function attribute, with all of the attendant combinatorial semantic explosion that entails.
As I see it, the semantics of NaN with respect to FP operations boils down to returning a non-deterministic result from one of the following sets (from weakest to strongest):
- Choose any NaN, qNaN or sNaN. This is probably the closest model to what people reason about (FP is the set {finite values, +/- infinity, NaN}).
- Choose any qNaN. This is what IEEE 754 guarantees (as NaN propagation is a “should”, not a “must”, something I missed earlier).
- WASM rules: the preferred qNaN if all NaN inputs are the preferred qNaN (this includes the case where no input was a NaN, e.g., fsub(inf, inf)); otherwise any qNaN. This rule is sufficient to allow NaN-boxing without having to add conversion instructions for the results of FP ops.
- Weak NaN propagation: either one of the input NaNs (after quieting), or the preferred NaN. I don’t think it strengthens the previous rule enough to actually be a useful choice.
- NaN propagation: convert all input sNaNs to qNaNs (by setting the quiet bit), and return the (quieted) NaN of any one of the inputs. This rule allows you to put custom values in the payload, and I can see frontends and libraries taking advantage of this if it could be reliably guaranteed.
I think there is value in specifying some way to get the final option if we can, and there’s also good reason to offer weaker guarantees at the same time, especially by default, since some architectures can’t easily support the strength of guaranteed NaN propagation.
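To make the option sets above concrete, here is a small classifier (Python sketch; single precision, IEEE 754-2008 quiet-bit encoding):

```python
def classify(bits):
    """Classify a single-precision bit pattern under the IEEE 754-2008
    NaN encoding (quiet bit = MSB of the significand)."""
    exponent = (bits >> 23) & 0xFF
    payload = bits & 0x007FFFFF
    if exponent != 0xFF or payload == 0:
        return "not-a-NaN"
    if payload & (1 << 22):
        return "qNaN"                    # permitted by all options above
    return "sNaN"                        # permitted only by the weakest option

print(classify(0x7FC00000))   # qNaN (the preferred/canonical NaN)
print(classify(0x7F800001))   # sNaN
print(classify(0x7F800000))   # not-a-NaN (infinity)
```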
I’d like to add some background on wasm’s “canonical” NaN semantics (sorry, I was following the thread until before that discussion started). It is worth noting that wasm’s “arithmetic NaNs”, including canonical ones, are quiet; in practice wasm doesn’t support sNaNs.
Preserving wasm’s canonical NaNs is going to be a source of overhead for operations that both (a) produce a NaN if one input is NaN and (b) require some combination of bitwise operations to produce the final result. Because of the latter, it would be hard to preserve the canonical NaN bits, and extra instructions would need to be added to the lowering (this is not a problem for operations that return NaN only if all inputs are NaN). For wasm, the affected operations are fmin and fmax, which are the NaN-favoring variants (not libc’s fmin/fmax), when implemented on x86.
Because of this potential overhead, canonical NaNs create a bit of a paradox: on one hand, floating-point operations cannot benefit from canonical NaNs (since all quiet NaNs are treated equally as far as arithmetic is concerned), but on the other hand, they will bear all the overhead. Likewise, realistic consumers of canonical NaNs are limited to techniques which use FP values but not FP operations. Those are few and far between; it is mostly just NaN-boxing, and for wasm there are no direct users of the canonical NaNs returned by SIMD ops. I think it is a much wiser choice to provide operations that would improve the ability of consumers of canonical NaNs to produce those themselves, rather than have more common code bear the cost.