In Rust, we are having a lengthy discussion about what the precise semantics for floating-point operations should be when NaNs are involved. The main problem is that the exact bits in a NaN can differ depending on optimizations and target architectures, which somehow needs to be captured by the language semantics.
During this discussion it was pointed out that WebAssembly basically already solved this problem, in the sense that they precisely defined a reasonable semantics. WebAssembly posits that NaN bits are stable when floating-point values are copied around in memory or between memory and local variables, but that most floating-point operations non-deterministically pick a new NaN payload if the output is a NaN.
Now my question is, what is the behavior of NaNs in LLVM IR? My hope is that it is basically like wasm, in the sense that floating-point values are regular fixed bit patterns (no “magic” weirdness going on when they are just copied around), and that NaN changes may only happen on fmul et al. This would mean that running fmul twice on the same input can produce different results, which is something that optimizations need to consider – it would be illegal to turn (using C syntax here but I really mean the underlying LLVM IR program)
float a = ...;
float b = ...;
float x = a*b; // non-deterministically picks a NaN payload
float y = x; // will definitely have the same NaN payload as x
into
float a = ...;
float b = ...;
float x = a*b; // non-deterministically picks a NaN payload
float y = a*b; // non-deterministically picks a potentially different NaN payload
because in the second program, if a*b results in NaN,x and y could have different payload, while in the first program they always have the same payload.
Is LLVM taking this into account for its optimizations? And if not, then what exactly is the semantics of an LLVM IR program where fmul produces a NaN?
(I should add that my expertise is in formal semantics for programming languages and IRs, but not in floating-point semantics.)
Note that allowing LLVM to change things as it desires is actually part of the rationale behind formulating it as nondeterminism, so it’s possible this limitation is unintentional design/Rationale.md at main · WebAssembly/design · GitHub
LLVM (used in some WebAssembly implementations) doesn’t guarantee that it won’t commute fadd , fmul and other instructions, so it’s not possible to rely on the “first” NaN being preserved as such.
I also think it’s worth keeping in mind that Wasm implementations are very concerned about allowing “fingerprinting bits” to leak through, and thus it wants to support implementations that try very hard to hide the behavior of underlying hardware. Additionally, Wasm is often JIT compiled, which can involve the code being compiled multiple times in a single execution, leading to different behavior as different optimizations are applied.
I don’t think in practice on Wasm, or on any other target, the transformation you’re describing can actually produce different results.
(And IMO LLVM should probably just say it’s target-specific rather than adopting Wasm’s choice of introducing nondeterminism — there’s no reason the IR has to be portable)
Edit: Also, I think Wasm’s spec is potentially non-compliant with IEEE-754 2019: In clause 11 it states
A reproducible operation is one of the operations described in Clause 5 or is a supported operation from 9.2, 9.3, 9.5 or 9.6.
Which includes multiplication. (While clause 11 is mostly concerned with specifying how you might define a -freproducible-math flag or something, I believe the wording here does apply generally, as it’s just explaining what’s under the term “reproducible operation”)
Even on a single architecture, you could get different NaNs if both inputs to an operation are different quiet NaNs since most architectures that I’m aware of (x86 SSE/AVX, ARM, AArch64, PowerPC, SPARC, etc.) pick either the first or second argument as the value they return in that case. If LLVM decides to use a different instruction or the same instruction but swaps the argument order (fadd a, b vs. fadd b, a) then it would return a different result. The only architecture I know of that has special handling to make sure swapping the argument order on commutative operations still returns the same result is x87, which is its own bag of worms.
Then there’s RISC-V, which wisely decided that all FP ops (except fsub, fabs, and other bit manipulation instructions) always return the canonical NaN – quiet bit set, all other IEEE 754 unspecified bits zeroed.
Ah, now that is interesting. For WebAssembly to be soundly translatable into LLVM, they made WebAssembly non-determinisc, but that provides a challenge now when translating from LLVM to WebAssembly. Indeed when both directions should work, the two languages must be pretty much equivalent in what they do. It’s just that LLVM does not spell out very precisely what its floating-point semantics are, we can just infer that indirectly from the kind of optimizations it performs.
That, too, is curious. That would call for a deterministic specification, because any non-determinism in the spec permits “figner-printing” information to leak. Just think of a spec that says that newly allocated memory has non-deterministic content; this allows the allocator to just use whatever bytes the newly allocated region happens to store, which might well be secrets that have not been properly cleared. In contrast, a deterministic spec that says that everything is zero-initialized does not permit such leakage.
So I really do not see how permitting non-determinism can help thwart fingerprinting.
I am more worried about LLVM making a sequence of optimization choices that lead to different results. Like first for some reason it duplicates the multiplication, then it distributes them somehow so they end up in slightly different contexts, then it applies commutativity to one of them – and at that point now we truly do have two different results if fmul is not commutative on the underlying hardware.
Does reproducibility include NaN bits?
So basically, multiplication of NaNs is not commutative. I had not thought about this but it is obvious in retrospect (and the wasm document @thomcc linked above also says this).
So how does LLVM justify commuting the operands of a non-commutative operation? It can’t just say “inspecting the NaN bits is UB”, or can it? Non-determinism seems to be the only other option, and that would preclude duplication.
I think the idea is that one of the valid ways to implement the WASM spec is such that the implementation resists fingerprinting, such as having all FP ops (except sign manipulation), when returning a NaN, always return the NaN with all spec-permitted bits cleared (like RISC-V) even if extra operations are required to accomplish that on the underlying hardware.
Just because the spec permits nondeterminism doesn’t mean the implementation is required to be nondeterminstic.
But this also means that things compiling to wasm have no way to resist fingerprinting, since they do not get any kind of guarantee from the implementation about which behavior will be picked. And it means, the way the spec is phrased, that things compiling to wasm cannot even rely on getting the same result when using the same inputs for a NaN-producing FP operation.
They could have picked a different spec, where the NaN payload is pick in some way that is deterministic, depends only on the input operands, but is not otherwise restricted by the spec – so different implementations could make different choices, but within an implementation the choice must be consistent. I wonder if there is a reason that they did not pick this more restrictive specification.