Stronger floating-point NaN guarantees

LLVM recently adopted a guarantee that if all input-NaNs to a float operation are quiet, then any output NaN that is generated will also be quiet. I wonder if it would be possible to strengthen this further? Concretely I am looking for the following guarantee (inspired by wasm):

  • Define a “canonical NaN” to be a NaN with arbitrary sign but a payload that has the most significant bit be 1 and everything else be 0. (This assumes that a “1” in the most significant bit indicates a quiet NaN; it needs to be tweaked to support targets where that is not the case. Also, I am aware that LLVM already has canonicalize on floats which refers to a different notion of “canonical”; I am using wasm terminology here.)
  • Now we guarantee that if all input NaNs to an operation are canonical, then any output NaN produced is also canonical.
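
To make the two bullets concrete, here is a minimal sketch on raw binary32 bit patterns. It assumes the usual layout where a 1 in the most significant mantissa bit means "quiet"; the helper names (`is_nan`, `is_canonical_nan`, `guarantee_holds`) are made up for illustration and are not LLVM or wasm APIs.

```python
F32_EXP = 0x7F80_0000    # exponent field, all ones for inf/NaN
F32_MANT = 0x007F_FFFF   # 23-bit mantissa field
F32_QUIET = 0x0040_0000  # most significant mantissa bit (assumed quiet bit)

def is_nan(bits: int) -> bool:
    """True if the 32-bit pattern encodes any NaN."""
    return (bits & F32_EXP) == F32_EXP and (bits & F32_MANT) != 0

def is_canonical_nan(bits: int) -> bool:
    """Sign is arbitrary; the mantissa must be exactly the quiet bit."""
    return (bits & 0x7FFF_FFFF) == (F32_EXP | F32_QUIET)

def guarantee_holds(inputs: list[int], output: int) -> bool:
    """Check the proposed rule for one operation's inputs and output."""
    if not is_nan(output):
        return True
    nan_inputs = [b for b in inputs if is_nan(b)]
    # all() is vacuously true with no NaN inputs, so a freshly
    # generated NaN is also required to be canonical.
    if all(is_canonical_nan(b) for b in nan_inputs):
        return is_canonical_nan(output)
    return True  # a non-canonical input NaN lifts the restriction
```

For example, `guarantee_holds([0x7FC00000], 0x7FC00001)` is false under this reading, because a canonical input produced an output with extra payload bits.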

A guarantee of this sort is required for code that uses NaN boxing, such as the SpiderMonkey JavaScript engine. If LLVM were to violate this guarantee, there is a chance that compiling SpiderMonkey with clang could lead to wrong results. Luckily, so far it seems like LLVM actually provides this guarantee: its apfloat softfloat library will pick a canonical NaN when producing new NaNs, and will forward one of the input NaN payloads when propagating NaNs. (apfloat’s FMA(0, inf, NaN) might return a canonical NaN payload instead of propagating the input NaN payload, but while this doesn’t match hardware, it is still sufficient for this guarantee.) Most hardware currently in operation also matches this spec (at least x86, ARM, RISC-V), and the fact that wasm makes this requirement indicates that it is both useful and satisfied by a wide range of implementations.
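
To illustrate why NaN boxing cares about payloads, here is a toy sketch on binary64 bit patterns. The boxing scheme is hypothetical (it is not SpiderMonkey's actual layout): a boxed value is a quiet NaN with a nonzero payload below the quiet bit, so the only NaN a real arithmetic result may use is the canonical one.

```python
import struct

F64_EXP = 0x7FF0_0000_0000_0000
F64_QUIET = 0x0008_0000_0000_0000
CANONICAL_NAN = F64_EXP | F64_QUIET   # 0x7FF8000000000000
PAYLOAD_MASK = 0x0007_FFFF_FFFF_FFFF  # 51 bits below the quiet bit

def box_int(value: int) -> int:
    """Hypothetical scheme: hide a nonzero 51-bit integer in a quiet NaN."""
    if not 0 < value <= PAYLOAD_MASK:
        raise ValueError("value does not fit in the payload")
    return CANONICAL_NAN | value

def is_boxed(bits: int) -> bool:
    """A pattern is 'boxed' if it is a quiet NaN with a nonzero payload."""
    magnitude = bits & 0x7FFF_FFFF_FFFF_FFFF  # ignore the sign bit
    return (magnitude & CANONICAL_NAN) == CANONICAL_NAN and (magnitude & PAYLOAD_MASK) != 0

def unbox_int(bits: int) -> int:
    """Recover the integer stored by box_int."""
    return bits & PAYLOAD_MASK

def double_bits(x: float) -> int:
    """Raw IEEE 754 binary64 bits of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]
```

In this toy scheme, an arithmetic operation that returned, say, `CANONICAL_NAN | 1` for canonical inputs would be misread as the boxed integer 1, which is the kind of wrong result the post is worried about.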

So I wonder, is there any case where LLVM violates this guarantee? Would it be possible to have LLVM commit to providing this guarantee? In Rust we’d like to provide this guarantee to our users, but we are currently blocked on LLVM not documenting such a guarantee.

(wasm also guarantees that floating point operations never produce a signaling NaN, even if an input is a signaling NaN. That is not guaranteed by LLVM. I am not aware of problems caused by this behavior and I am not asking to change this.)


We should just document this; it is the APFloat behavior as it is. Treating the high mantissa bit as the quiet bit has been the standard-defined way since the 2008 standard anyway.

it needs to be tweaked to support targets where that is not the case.

This really means “old MIPS”, which has never worked correctly. I think we should just document that this won’t work and that 2008-style signaling NaNs are assumed. Realistically, nobody is ever going to put in the effort to support anything else.

floating point operations never produce a signaling NaN, even if an input is a signaling NaN. That is not guaranteed by LLVM

It’s not guaranteed for non-strictfp operations, except llvm.canonicalize. It is supposed to be preserved for strictfp. You’re still guaranteed that FP operations don’t introduce new signaling nan though.

It’s not just about apfloat though, it’s also about other transformations. However, all the ones I can think of (like x+y to y+x, or x+0.0 to x) are still fine.

FWIW it seems like on old MIPS the same guarantee could be provided – the only change required is to negate the canonical bit pattern (i.e., it has a 0 in the most significant bit and 1 everywhere else).

Ah, so with strictfp LLVM will not fold x * 1.0 to x any more?

Yes, that is the crucial point and that is what is documented.

But actually implementing that is a nontrivial amount of work. We would need to invent some kind of datalayout property to indicate the signaling NaN layout, and thread that through every place that considers folds on NaNs.

Are there many of those places? How are those places getting the NaN value they’re using now?

Some pass through constant nans they already have. Others go through APFloat::getNaN/getQNaN, which would have to gain an argument for snan layout. An alternative approach might be to define an entirely separate set of float types with the unusual snan layout. IR producers would then be responsible for swapping out the types used

Define a “canonical NaN” to be a NaN with arbitrary sign but a payload that has the most significant bit be 1 and everything else be 0. (This assumes that a “1” in the most significant bit indicates a quiet NaN; it needs to be tweaked to support targets where that is not the case.)

As a minor preference in terminology, I’d prefer to see this called a “preferred NaN”, so that it isn’t confused with the IEEE 754 notion of “canonical” (as something like x86_fp80 introduces values which are not canonical and yet still NaNs). Also, people might assume that something like llvm.canonicalize causes the input NaN to become the “canonical NaN”, which it does not (since it refers to IEEE 754’s notion of canonical).

Now we guarantee that if all input NaNs to an operation are canonical, then any output NaN produced is also canonical.

I think one thing that is useful to be explicit about here is that if none of the inputs are NaN (e.g., you’re computing 0.0 / 0.0), then the output NaN is still the preferred NaN.

So I wonder, is there any case where LLVM violates this guarantee? Would it be possible to have LLVM commit to providing this guarantee? In Rust we’d like to provide this guarantee to our users, but we are currently blocked on LLVM not documenting such a guarantee.

However, I will note that there is one platform that doesn’t exactly meet the rules you specify: SPARC. With SPARC, the resulting NaN of an operation with no input NaNs is the one with all bits set to 1 save the sign bit, which is not the same as the preferred NaN representation you propose (nor the same as in most other hardware implementations).

As usual things become complicated when you look at the details, I guess. :wink:
It seems like making MIPS conform to this proposed stronger spec is no harder than making it conform to the existing documented spec, so things are at least not getting worse for anyone who wants to fix that target. We can just amend the issue as appropriate. (Or maybe LLVM has a process for partially supported targets?)

Yeah, I was just using wasm terminology but “preferred” sounds preferable indeed. :slight_smile:

Can we just say that on SPARC, the preferred NaN has the payload full of ones and arbitrary sign bit? As in, the only difference to x86/ARM/RISC-V is the preferred payload, but the general pattern still holds true?

I guess actually implementing this would have the same difficulty as MIPS though, so it boils down to SPARC getting a bug report that is similar to the MIPS one. That’s unfortunate but seems to be the best that can be done currently.
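
As a sketch of the "same pattern, different preferred payload" idea, the three binary32 variants mentioned in this thread can be written as one check parameterized by the preferred mantissa. The constants follow the descriptions given above (usual quiet-bit-only, the negated old-MIPS pattern, SPARC's all-ones payload); the names are illustrative only and the sign bit is treated as arbitrary.

```python
F32_EXP = 0x7F80_0000        # exponent field, all ones for NaN
F32_MANT = 0x007F_FFFF       # 23-bit mantissa field
F32_QUIET_BIT = 0x0040_0000  # most significant mantissa bit

# Preferred mantissa per convention, as described in the thread.
USUAL = F32_QUIET_BIT                     # x86 / ARM / RISC-V: 0x400000
LEGACY_MIPS = F32_MANT & ~F32_QUIET_BIT   # negated pattern:    0x3FFFFF
SPARC = F32_MANT                          # all ones:           0x7FFFFF

def is_preferred_nan(bits: int, preferred_mantissa: int) -> bool:
    """True if bits is a NaN whose mantissa equals the target's preferred one."""
    return (bits & F32_EXP) == F32_EXP and (bits & F32_MANT) == preferred_mantissa
```

With positive sign these correspond to the bit patterns 0x7FC00000, 0x7FBFFFFF and 0x7FFFFFFF respectively.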

TBH, this feels like a repeat of the discussion in Semantics of NaN

This is not the whole story, and also not where the difficulty implementing this lies. Native SM fixes up output of non-compliant FP ops, such as x86 min and max, which is pretty much the norm for NaN-boxing software.

The problem with the wasm-style canonical NaN rule is that it makes additional restrictions, such as comparing signed zeros, expensive to implement. As an exercise, think about how you would implement IEEE-compliant minimum and maximum (both NaN-propagating and not) on x86 if you also had to return canonical NaNs for canonical NaN inputs.

It is true that wasm has used NaN-boxing as a motivation to add this rule; however, that led to a rather odd situation where concerns of NaN-boxing trump concerns of floating point comparisons. Normally a more narrow use case would be the one performing special handling, which native SpiderMonkey actually does on x86.

As a less relevant aside, canonical NaNs are a pre-W3C addition to the wasm standard and are absent from JS.

There is at least one other platform where the creation of a NaN (as opposed to the propagation of a NaN from Operand to Result) creates a NaN with a non-zero payload. And in this case the payload is a 3-bit code denoting why the NaN was created, and the lower-order 48-bit instruction pointer to the instruction which created this NaN.

There is at least one other platform where the creation of a NaN (as opposed to the propagation of a NaN from Operand to Result) creates a NaN with a non-zero payload. And in this case the payload is a 3-bit code denoting why the NaN was created, and the lower-order 48-bit instruction pointer to the instruction which created this NaN.

Which platform is this? I know that encoding the IP to the faulting instruction to the payload was a possibility, but I was unaware that anyone had actually built such a thing.

I don’t see how canonical NaNs make it any harder…
e.g. none of these treats any particular NaN kind specially, they correctly preserve canonical NaNs, and afaict they correctly implement the IEEE 754-2019 ops (with +0.0 treated as > -0.0) except that sNaNs are not quieted (asm in AT&T syntax):

maximum_f32:
  vandps %xmm0, %xmm1, %xmm2
  vmaxss %xmm0, %xmm1, %xmm3
  vcmpunordss %xmm0, %xmm0, %xmm4
  vblendvps %xmm4, %xmm0, %xmm3, %xmm3
  vcmpeqss %xmm1, %xmm0, %xmm0
  vblendvps %xmm0, %xmm2, %xmm3, %xmm0
  retq

minimum_f32:
  vorps %xmm0, %xmm1, %xmm2
  vminss %xmm0, %xmm1, %xmm3
  vcmpunordss %xmm0, %xmm0, %xmm4
  vblendvps %xmm4, %xmm0, %xmm3, %xmm3
  vcmpeqss %xmm1, %xmm0, %xmm0
  vblendvps %xmm0, %xmm2, %xmm3, %xmm0
  retq

maximum_number_f32:
  vandps %xmm0, %xmm1, %xmm2
  vmaxss %xmm0, %xmm1, %xmm3
  vcmpunordss %xmm0, %xmm0, %xmm4
  vblendvps %xmm4, %xmm1, %xmm3, %xmm3
  vcmpeqss %xmm1, %xmm0, %xmm0
  vblendvps %xmm0, %xmm2, %xmm3, %xmm0
  retq

minimum_number_f32:
  vorps %xmm0, %xmm1, %xmm2
  vminss %xmm0, %xmm1, %xmm3
  vcmpunordss %xmm0, %xmm0, %xmm4
  vblendvps %xmm4, %xmm1, %xmm3, %xmm3
  vcmpeqss %xmm1, %xmm0, %xmm0
  vblendvps %xmm0, %xmm2, %xmm3, %xmm0
  retq
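
The vandps/vorps step in these listings, selected when vcmpeqss reports the inputs as equal, can be sketched on bit patterns as follows. This is only a model of that one step, with made-up helper names: if two non-NaN binary32 values compare equal, they are either bit-identical or the pair +0.0 / -0.0, so AND picks +0.0 for a maximum and OR picks -0.0 for a minimum.

```python
import struct

def f32_bits(x: float) -> int:
    """Raw IEEE 754 binary32 bits of x (x must be exactly representable)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def max_of_equal(a_bits: int, b_bits: int) -> int:
    """Maximum of two inputs that compare equal: AND clears a lone sign bit."""
    return a_bits & b_bits

def min_of_equal(a_bits: int, b_bits: int) -> int:
    """Minimum of two inputs that compare equal: OR keeps a lone sign bit."""
    return a_bits | b_bits

POS_ZERO = f32_bits(0.0)   # 0x00000000
NEG_ZERO = f32_bits(-0.0)  # 0x80000000
```

For bit-identical inputs both helpers return the input unchanged, so no NaN bits are involved in this path at all.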

We can always resort to making this guarantee only for certain targets.

But the point is that for platforms that have preferred NaNs (or at least for platforms that have the “usual” preferred NaN), I don’t think LLVM should be introducing non-preferred NaNs. There seems to be no reason to do so, and languages regularly get requests from users to preserve the NaN behavior of the underlying hardware as much as possible. I think this request is reasonable when it is cheap to realize, hence my question whether LLVM actually already de-facto implements this behavior (which would make it basically free to realize).

I’m sorry for bringing this up again. Back in that discussion I didn’t have the time to push harder for stronger guarantees. Now I finally got around to making an effort at documenting our FP guarantees in Rust, so the question came up again. The discussion back then seems to have stopped after “min and max can’t provide this guarantee”; I don’t think this should lead us to directly conclude “nothing should provide this guarantee”.

Then it would seem reasonable to me to simply exempt these 2 operations from the guarantee, but still provide it everywhere else. You mentioned that SM fixes up output of x86 min and max; under the current LLVM spec, SM would have to fix up output of every single FP operation.

This seems like it’ll be required.

We should not make any guarantee for LLVM IR semantics which prevents e.g. lowering an FP-add to a single hardware instruction on a target ISA, even if that target doesn’t implement a singular preferred NaN. No spec other than wasm requires preferred-NaN semantics, so it seems out of place for LLVM IR to impose such a requirement on hardware, where providing it would have overhead. (It would be nice if IEEE754 did impose such a requirement in the future, but…)

From an implementation-simplicity standpoint, we can’t currently implement any preferred NaN other than all-zeros, so I’d suggest restricting this further to only consider hardware which is consistent with LLVM’s current default NaN.

I don’t see how canonical NaNs make it any harder…

The original mozilla implementation, which got correct values but not the preferred NaN bits, was 3 instructions for OR, 5 for AND, and didn’t require AVX instructions. I think that definitely counts as “easier” (though you may argue that the difference doesn’t really matter much).

See the Mozilla commit that first fixed this, though note that the current version of MacroAssembler-x86-shared-SIMD.cpp has also added an AVX implementation.

Agree. Another point about semantics of NaNs is that they are sometimes proposed as a form of error codes, where the payload would encode the error. This is not how wasm NaNs work either, as it requires preserving only the two canonical values.

Why is this necessary? It clearly doesn’t do it at the moment.

Please let me know if I am misreading the examples, but wouldn’t you need to blend with both input arguments if you are going to preserve NaN bits from the inputs? To strictly preserve NaN inputs you would have to find them in the first argument, blend, find them in the second argument, blend. It is sometimes easier to detect NaN outputs and just replace them with canonical values.

There are tricks for implementing this via bitwise ops that rely on the fact that all NaNs are equal, like number || NaN is a NaN, and which you lose if canonical NaNs are required.

No, you only need a blend where vmaxss selects the wrong input (or when you need to fix the sign bit), since vmaxss copies verbatim whichever input it chose to the output.

I was trying to implement the current LangRef’s NaN semantics in Alive2.
My interpretation is that we have true float ops and bitwise ops. For the true ops, they return non-deterministically one of their NaN inputs or the result. Bitwise ops just return the result.

So we have:
fsub 0.0, %x

If %x is a NaN, fsub will return %x or a QNaN (since ops can’t produce SNaNs).

But fneg %x returns -%x always. If it’s an SNaN, then it returns the negated SNaN.

So, fneg %x can return a bitwise pattern that fsub can’t. Hence refinement (of fsub 0, x => fneg x) doesn’t hold.
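
The argument above can be sketched as a small check on binary32 bit patterns. This models only the interpretation stated in this post (for a NaN %x, fsub may return %x unchanged or any quiet NaN, while fneg flips the sign bit); it is not Alive2's actual encoding, and the function names are invented for the example.

```python
F32_SIGN = 0x8000_0000
F32_EXP = 0x7F80_0000
F32_MANT = 0x007F_FFFF
F32_QUIET = 0x0040_0000  # assumed quiet bit (most significant mantissa bit)

def is_nan(bits: int) -> bool:
    return (bits & F32_EXP) == F32_EXP and (bits & F32_MANT) != 0

def is_quiet_nan(bits: int) -> bool:
    return is_nan(bits) and (bits & F32_QUIET) != 0

def fsub_zero_may_return(x_bits: int, out_bits: int) -> bool:
    """Allowed results of `fsub 0.0, %x` when %x is a NaN, per the post."""
    return out_bits == x_bits or is_quiet_nan(out_bits)

def fneg(x_bits: int) -> int:
    """Bitwise negation: flip only the sign bit."""
    return x_bits ^ F32_SIGN

def fold_is_refinement(x_bits: int) -> bool:
    """Is fneg's single result among fsub's allowed results for NaN %x?"""
    return fsub_zero_may_return(x_bits, fneg(x_bits))
```

With the signaling NaN 0x7FA00000 as input, `fneg` yields 0xFFA00000, which is neither the input nor a quiet NaN, so `fold_is_refinement` is false; for the quiet NaN 0x7FC00000 it is true.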

Or does it? Do we want to allow any two SNaNs to compare equal? Or even two NaNs compare equal?
I’m a bit lost, sorry.

0.0 - x isn’t the same as -x anyway because when x is 0.0, 0.0 - x is 0.0 whereas -x is -0.0

For fsub -0.0, x => fneg x, that works with the WASM NaN rules: when the input is a canonical NaN, both operations produce a canonical NaN, and when the input is not a canonical NaN (all signaling NaNs are not canonical NaNs), the output can be any NaN, which fneg satisfies.
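
A quick way to convince oneself of this is to check the rule on a few sample binary32 NaN inputs. The sketch below encodes the wasm-style rule as described in this thread (canonical NaN in, canonical NaN out; otherwise any NaN is acceptable) and tests fneg against it; the helper names are hypothetical.

```python
F32_SIGN = 0x8000_0000
F32_EXP = 0x7F80_0000
F32_MANT = 0x007F_FFFF
F32_QUIET = 0x0040_0000

def is_nan(bits: int) -> bool:
    return (bits & F32_EXP) == F32_EXP and (bits & F32_MANT) != 0

def is_canonical_nan(bits: int) -> bool:
    """Arbitrary sign, mantissa exactly the quiet bit."""
    return (bits & 0x7FFF_FFFF) == (F32_EXP | F32_QUIET)

def wasm_rule_allows(x_bits: int, out_bits: int) -> bool:
    """Allowed NaN outputs for a unary op with NaN input x."""
    if is_canonical_nan(x_bits):
        return is_canonical_nan(out_bits)
    return is_nan(out_bits)

def fneg(x_bits: int) -> int:
    return x_bits ^ F32_SIGN

# Canonical NaNs of both signs, a signaling NaN, and a quiet NaN with payload.
SAMPLES = [0x7FC00000, 0xFFC00000, 0x7FA00000, 0x7FC00123]
FNEG_OK = all(wasm_rule_allows(x, fneg(x)) for x in SAMPLES)
```

Flipping the sign never changes the mantissa, so a canonical input stays canonical and any other NaN stays a NaN, which is why `FNEG_OK` comes out true for these samples.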