Right, this is one of the defined differences from IEEE. Signaling NaNs do not really exist for non-strictfp.
It’s UB if you don’t enable fenv access. The problem I’m referring to is a busted gcc+clang behavior where it side-loads an abnormal mode behind your back if you use -ffast-math.
That is what you get. I’m saying that -fdenormal-fp-math=preserve-sign doesn’t guarantee that fmul x, 1.0, where x is a denormal, gives you a zero.
Thus my reading of IEEE 754-2019 allows for the transformation of fmul %x, 1.0 into just %x for all of the standard calculations (and for certain intrinsics too, called “additional mathematical operations” in the standard).
If x is a signaling NaN, would this transformation avoid raising the required exception?
(This would be okay for Fortran, whose standards allow rewriting of expressions to mathematically equivalent ones so long as parentheses are preserved, but maybe not for C or C++.)
In the examples @arsenm linked, I think there is some confusion as to what is desired and legal, because there are several overlapping issues going on here.
The conversion of fmul %x, 1.0 into just %x should be legal (ignoring the issue of DAZ/FTZ, since that’s not a NaN issue), and this is by design–we’ve explicitly clarified the semantics specifically to permit this conversion.
But the conversion under discussion in @arsenm’s examples is the reverse conversion of %x to fmul %x, 1.0, and looking back at our previous sNaN discussions, I don’t think we have ever actually properly discussed this case. (The work that @nlopes has been putting into Alive2 is certainly helpful in highlighting these underdefinitions!).
The interpretation I have of these rules is that the “sNaN is treated like qNaN” is meant to only apply to FP computational operations, and that even if you have an sNaN pattern crop up somewhere, it won’t spontaneously be quieted by an operation (e.g., a select, a load, or a phi). This interpretation isn’t the only interpretation, however.
The interpretation I have of these rules is that the “sNaN is treated like qNaN” is meant to only apply to FP computational operations, and that even if you have an sNaN pattern crop up somewhere, it won’t spontaneously be quieted by an operation (e.g., a select, a load, or a phi).
I didn’t think this was in question. We DID previously discuss this, and I believe the semantics we landed on are what you indicate here. Yes, it is illegal to replace %x with fmul %x, 1.0. The representation bits of an FP value do not mutate arbitrarily. Certain FP operations have somewhat-wobbly semantics w.r.t. their result’s NaN payloads and sNaN vs qNaN bits, but that’s a property of just those operations.
(We also discussed that i386 is broken in that it DOES modify values randomly. It’s very broken with SSE2 disabled, and broken for function return values even then.)
When x is an sNaN and the calculation it was to be consumed in is “optimized away”, the result still retains the property that, “when consumed”, it will still be treated as an sNaN (raising an exception). This is the property that gives the compiler the ability to optimize away “calculation identities” in the face of NaN variables.
Oh I see, I didn’t get that context. I am mostly concerned with the default behavior here, maybe with some fast-math attributes at individual operations mixed in (which of course reduces the guarantees for these operations, but shouldn’t have any effect on other operations), but without any global flags. Those will need a separate chapter in the LangRef describing their effects.
That’s an inadequate definition of “consumed”. “Consumed” must be defined in terms of the expressions the user writes; they shouldn’t have to know the intricate details of the compiler middle-end and back-end. Under your definition it becomes essentially impossible for the user to know when a value is actually “consumed”.
If the programmer writes float f = x * 1.0;, then x clearly has been “consumed”. Whatever the compiler does with this needs to be stated purely in terms of the syntax of the original program. In this case, it turns out that x * 1.0 can produce a signaling NaN (if x is a signaling NaN). That’s not the controversial part though, that part is already in the LangRef.
The controversial (or at least new) part is, for instance, guaranteeing that on x86/ARM/RISC-V 0.0/0.0 will produce a NaN with a payload that’s all-0 except for the quiet bit. Or more generally: if all input NaNs are preferred, then all output NaNs are preferred. That’s the new guarantee I hope to see added to the LangRef. From the discussion so far, it seems like we have to exclude min/max from that guarantee (at least on x86? Does this affect both the “old” min and the “new” minimum?), as well as SPARC and MIPS targets (mostly for implementation reasons – those have a preferred NaN payload that would make this guarantee work, but it is different from that of other targets). But LLVM’s own optimizations are in line with this guarantee.
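For concreteness, here is a rough Rust sketch of what “preferred payload” means at the bit level (assuming an x86-64 or AArch64 host; the exact sign and bit pattern are hardware properties, not something LLVM documents today):

```rust
fn main() {
    // black_box keeps the division from being constant-folded, so the NaN
    // actually comes from the hardware divide.
    let nan = std::hint::black_box(0.0f64) / std::hint::black_box(0.0f64);
    let bits = nan.to_bits();
    println!("bits = {bits:#018x}"); // typically 0x7ff8000000000000 (or with the sign bit set)

    let sign = bits >> 63;
    let quiet_bit = (bits >> 51) & 1; // MSB of the mantissa
    let payload = bits & ((1u64 << 51) - 1); // payload bits below the quiet bit
    println!("sign = {sign}, quiet = {quiet_bit}, payload = {payload:#x}");
}
```

The guarantee I’m asking about amounts to: the payload printed here is 0, and optimizations don’t widen the set of possible values beyond what the hardware produces.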
There was also a proposal for an even stricter guarantee to define the set of possible output NaNs as “any preferred NaN, any input NaN, any quieted input NaN”. That would exclude wasm from the set of targets where we can provide this guarantee.
Is there sufficient consensus to move ahead with a guarantee like that? And which guarantee should it be? I think it would be good to at least provide the wasm guarantee on a wasm target. Is it worth providing the stronger guarantee on x86/ARM/RISC-V?
This example doesn’t need to be considered in terms of the NaN payload; it’s a combine that introduces a canonicalization. It could also incur denormal flushing and sNaN quieting which did not occur originally.
Yes. You can also optimize it away if you can prove that the output is only consumed by other floating point operations. Turning x * 1.0 + y into x + y is still totally fine. Having assert_quiet_nan(x * 1.0) break is not fine.
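To spell out what “break” means here, a hedged Rust sketch of an assert_quiet_nan-style bit inspection (the function is hypothetical, just mirroring the name used above):

```rust
/// Hypothetical helper mirroring the assert_quiet_nan example above: panics if
/// x is a NaN whose quiet bit is clear, i.e. a signaling NaN.
fn assert_quiet_nan(x: f64) {
    assert!(x.is_nan(), "expected a NaN");
    assert_eq!((x.to_bits() >> 51) & 1, 1, "expected a quiet NaN");
}

fn main() {
    // A signaling-NaN bit pattern: exponent all-ones, quiet bit clear, payload 1.
    let snan = f64::from_bits(0x7ff0_0000_0000_0001);
    // If the multiply is actually performed, typical hardware quiets the sNaN
    // and this holds; if the compiler folds `snan * 1.0` to `snan`, it fails.
    assert_quiet_nan(snan * 1.0);
}
```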
That is quite clearly the intent of the IEEE 754 committee – or do they say anything about optimizations anywhere? They are concerned with providing a set of operations for reliable floating-point computations. It is C compilers with their very strong focus on performance that decided to violate the IEEE spec with some of their optimizations. (I’d love to see data showing that removing x * 1.0 in cases where the output might be bit-inspected even matters.) This has some consequences that I can only describe as concerning; for instance, pow(1.0, x * 1.0) can return vastly different results depending on which optimizations kick in (when x is a signaling NaN input, it will sometimes produce 1, sometimes NaN). Basically the entire signaling NaN system becomes completely useless if the compiler can just optimize away operations. They surely would not have specified it with the intent to be so useless.
Anyway I’ve learned to live with these semantics, and I doubt it causes huge hassle in practice (mostly because very few applications actually care about this signaling NaN business – though maybe more applications would care if it actually worked reliably). But I will strongly object to any claim that this is complying with IEEE. It is a blatant violation of the letter and the intent of the standard.
Can we get back to discussing to what extent LLVM is willing to expose the underlying hardware behavior for NaN payloads, and promise that its optimizations will not interfere with that? In particular I’d like to understand whether people would prefer to document a wasm-style guarantee that applies to more targets, or a stronger guarantee of “the output NaN is chosen from this set” (or have the latter for targets where it is possible and the former for wasm) – what do you think?
But it is completely useless. The only source of signaling NaNs is supposed to be uninitialized values. They were a bad idea obsoleted by AddressSanitizer.
If you care about always quieting, you can always just force inserting a canonicalize around all loads of floating point values, instead of forcing quieting constraints on every downstream computation.
I don’t think I understand to what end you want this guarantee. I think something stronger than “can arbitrarily flip payload bits” would be an improvement. We don’t really define target implementation details in the LangRef. Do you just want to see a consistent result at some specific point? Do we need some kind of NaN-fixup intrinsic you can call at these points?
That is quite clearly the intent of the IEEE 754 committee – or do they say anything about optimizations anywhere? They are concerned with providing a set of operations for reliable floating-point computations.
They do talk about optimizations a little bit (section 10.4 explicitly discusses “value-changing optimizations”). It specifically says:
A language standard should require that by default, when no optimizations are enabled and no alternate exception handling is enabled, language implementations preserve the literal meaning of the source code.
It goes on to indicate that converting 1 * x → x when x is not an sNaN, or changing the payload of a qNaN, are among the transformations that “preserve the literal meaning of the source code.”
Basically the entire signaling NaN system becomes completely useless if the compiler can just optimize away operations. They surely would not have specified it with the intent to be so useless.
I have yet to meet somebody who thinks sNaN isn’t useless. IEEE 754 itself recommends not using sNaN if you want reproducibility. A signaling NaN is intended to represent an uninitialized arithmetic value, which is why trying to use it pretty much always causes a signal; if your language has better facilities to handle uninitialized values, then sNaN is pretty damn useless.
Can we get back to discussing to what extent LLVM is willing to expose the underlying hardware behavior for NaN payloads, and promise that its optimizations will not interfere with that? In particular I’d like to understand whether people would prefer to document a wasm-style guarantee that applies to more targets, or a stronger guarantee of “the output NaN is chosen from this set” (or have the latter for targets where it is possible and the former for wasm) – what do you think?
The question of sNaN isn’t entirely irrelevant, since the same things that can cause an sNaN to become quieted can also cause NaN payloads to be adjusted.
But for NaN payloads themselves, it’s a bit difficult to make hard guarantees here. I don’t think it’s wise to specify anything stronger than what some hardware we would wish to target specifies. Guaranteeing that the optimizer will propagate NaNs deterministically could be done, but we’d still end up with a rule that “if the hardware propagates NaN, then the semantics are NaN propagation, otherwise :shrug:”, which is a kind of unfortunate rule. There’s still the issue that LLVM-qNaN isn’t necessarily the same as hardware-qNaN (and changing that requires modification to the data layout).
We don’t really define target implementation details in the LangRef.
I would suggest we say something along the lines of the following:
Any NaN value resulting from a floating-point math operation has a non-deterministic sign, and a payload value non-deterministically chosen from:
- all-zero, or
- the payload of some NaN provided as input to the operation, or
- the set of additional target-specific payload values (empty on most targets).
Note: this wording is intended to constrain the set of NaN payloads that optimizations may create, yet not preclude implementation on targets which create unusual NaN payloads (either in hardware or in software math libraries).
For most targets, there would be no additional target-specific payload values. For SPARC, the all-ones payload would be additionally allowed. For the theoretical CPU which puts the address of the faulting instruction into the NaN payload, all values would be allowed via that clause.
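To make the proposed rule concrete, here is a rough Rust sketch of the check it implies for an f64 result, with the target-specific set left empty as it would be on most targets (I’m reading “payload” as the mantissa bits below the quiet/signaling bit):

```rust
/// Payload of an f64: the mantissa bits below the quiet/signaling bit (bits 0..=50).
fn payload(x: f64) -> u64 {
    x.to_bits() & ((1u64 << 51) - 1)
}

/// Rough check of the proposed rule for a NaN result: the payload must be
/// all-zero, match the payload of some NaN input, or come from a
/// target-specific set (empty here, as on most targets).
fn result_payload_allowed(result: f64, inputs: &[f64]) -> bool {
    if !result.is_nan() {
        return true; // the rule only constrains NaN results
    }
    let p = payload(result);
    p == 0 || inputs.iter().any(|&i| i.is_nan() && payload(i) == p)
}

fn main() {
    let x = f64::from_bits(0x7ff8_0000_0000_00ab); // quiet NaN with payload 0xab
    // Holds whether the result keeps the input payload or is the preferred NaN.
    assert!(result_payload_allowed(x + 1.0, &[x, 1.0]));
}
```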
Did you mean “all-zero except with a 1 in the payload’s MSB” (the default qNaN)? That is what is generated on most platforms when no inputs are NaNs; only old MIPS (and a few other targets unsupported by LLVM, such as PA-RISC) would generate all-zeros, since that is its default qNaN.
It seems you want to optimize x×1.0 → x and at the same time transform sNaN → qNaN.
If you do this, you still need to raise the exception associated with consumption of the sNaN.
This requires an FMOV that quiets NaNs – something few ISAs have, and you probably don’t really want to need to perform an FMOV anyway; just use the source register again later.
In addition, the user does not see the expected exception.
Quite so. I had been thinking about only the non-functional bits of the payload, since we already specified the quiet/signaling bit. But I didn’t say so.
I’d edit the first line of my proposal to:
Any NaN value resulting from a floating-point math operation has a non-deterministic sign, the quiet/signaling bit set as described {above|in section XYZ}, and the remainder of the payload value non-deterministically chosen from:
Yeah I guess in practice it is. Certainly with compilers like clang. But I doubt IEEE spec’d it to be useless, so when they say “consume” they surely meant “the programmer wrote an operation that consumes the value”, not “some completely unpredictable notion that is subject to continuously evolving compiler decisions”.
I also think that would be an improvement, and that’s why I want this guarantee.
I want people to be able to write NaN-boxing code in Rust without having to fix up their NaNs after every single addition. At least on common targets.
I am also observing that the guarantee is already implemented but is not documented, and the guarantee is already relied upon by real code doing NaN-boxing. So we have a contract that both sides already kind of agree on, it’s just not written down. I’d like to see it written down.
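For reference, this is roughly the kind of scheme I mean (a minimal sketch; the 48-bit tag layout is just for illustration and real implementations differ):

```rust
const QNAN: u64 = 0x7ff8_0000_0000_0000; // exponent all-ones, quiet bit set
const BOX_MASK: u64 = 0x0000_ffff_ffff_ffff; // low 48 payload bits carry the boxed value

/// Boxes a non-zero 48-bit value inside a quiet NaN.
fn nan_box(value: u64) -> f64 {
    debug_assert!(value != 0 && value <= BOX_MASK);
    f64::from_bits(QNAN | value)
}

/// Recovers the boxed value, if `x` looks like a boxed NaN.
fn nan_unbox(x: f64) -> Option<u64> {
    let bits = x.to_bits();
    if bits & !BOX_MASK == QNAN && bits & BOX_MASK != 0 {
        Some(bits & BOX_MASK)
    } else {
        None
    }
}

fn main() {
    assert_eq!(nan_unbox(nan_box(42)), Some(42));
    // The guarantee under discussion is what lets this scheme assume that NaNs
    // produced by arithmetic on ordinary numbers have the preferred (or an
    // input-derived) payload and so cannot be mistaken for boxed values, and
    // that moving a boxed value around (loads, selects, phis) does not rewrite
    // its payload bits.
}
```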
Emphasis on “is not an sNaN”.
It is unfortunate but a lot better than the status quo. The status quo is that we have a lot of hardware that behaves very predictably around NaNs, and all the uncertainty comes from the optimizer.
We are considering FP exceptions to be non-observable, so no we don’t need to raise that exception.
The quiet/signaling bit isn’t already described elsewhere though. If you want to describe it more precisely it becomes something like:
the quiet/signaling bit has the value it has in some NaN provided as an input, or is set to quiet
But now your spec allows the optimizer to take the quiet bit from one input and the payload from another, so that seems kind of pointless. Also we have to be careful the payload doesn’t become all-0, as then it’s not a NaN any more… like, if the one input NaN has payload 0 111..., then we can’t pick that NaN’s quiet/signaling bit and the canonical “all-zero” for the rest of the payload. So I think we are back at:
Unless stated otherwise, any NaN value resulting from a floating-point math operation has a non-deterministic sign, and a payload value non-deterministically chosen from:
- the “preferred” payload, which has the highest bit (the signaling/quiet bit) set to 1 and the rest to 0,
- the payload of some NaN provided as input to the operation, but with its signaling/quiet bit set to 1,
- the payload of some NaN provided as input to the operation with its signaling/quiet bit unaltered, or
- the set of additional target-specific payload values (empty on most targets).
Among the operations that will “state otherwise” are max and min. (And maximum and minimum, their newer counterparts?)
The main problem with this spec is that on wasm, the set of additional payload values becomes “all quiet payloads” – or we have to clarify that “the set of additional target-specific payload values” can depend on the set of input NaNs for this operation (i.e., that set is not a constant per target, it is a per-target function that maps the input NaNs to a set of allowed payloads).
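As a concrete reading of the second and third bullets, here is a small Rust helper (a sketch; bit positions are for f64) that produces the quieted form of an input NaN by setting the signaling/quiet bit and leaving the rest of the payload alone:

```rust
/// The signaling/quiet bit (MSB of the mantissa) for f64.
const QUIET_BIT: u64 = 1 << 51;

/// Second bullet: the input NaN with its signaling/quiet bit forced to 1 and
/// the rest of the payload intact. (The third bullet is just the input NaN
/// returned unchanged.)
fn quieted(nan: f64) -> f64 {
    debug_assert!(nan.is_nan());
    f64::from_bits(nan.to_bits() | QUIET_BIT)
}

fn main() {
    let snan = f64::from_bits(0x7ff0_0000_0000_00ab); // signaling NaN, payload 0xab
    assert_eq!(quieted(snan).to_bits(), 0x7ff8_0000_0000_00ab);
}
```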
We do discuss signaling/quiet propagation behavior briefly in the LangRef Floating-Point Environment section, although it’s not as clear as it could be. For example, I thought we had decided that the constrained intrinsics are required to convert sNaN input to qNaN output as required by IEEE754, but that’s not explicitly stated.
Anyhow, your modified version looks pretty good. We should make sure not to conflict with requirements on determinism of signal/quiet bit, but that could be an aside on the third bullet, e.g. “(Except on constrained intrinsics, since they cannot result in a signaling-NaN output)”
I’m not certain we really need to make such an exception. We might want to check how these operations are lowered in LLVM today, first. We know that the spidermonkey JIT implementation got more complex when it realized this requirement, because it had previously been "bitwise-or"ing operands to create a NaN with an arbitrary payload. But, if that technique hasn’t been used in LLVM IR before today, maybe we don’t really need to allow it to be used in the future…it’d certainly be simpler to not make an exception here.
That seems like a fine answer to me – do you see a particular issue with that? That is, on wasm: “If any input operand is a NaN with a non-preferred payload, then an output payload may additionally be any value with the quiet bit set to 1.”