Should constant folding of NaNs be disabled?

According to the IEEE-754 standard, every floating-point datum belongs to one of two disjoint classes:

  • Numbers and
  • Errors.

Numbers are floating-point numbers in the mathematical sense. They can be used as arguments in arithmetic operations and function calls. For this class, constant folding is a natural and applicable optimization.

In contrast, Errors, which the Standard calls “not-a-number” (NaN), are created when a floating-point operation is invalid and no meaningful numeric result exists. A special kind of NaN, the signaling NaN (SNaN), is often used to mark uninitialized values. Since NaNs are essentially error codes, arithmetic operations and mathematical function calls are meaningless for them; these operations just propagate the error from operand to result.

Constant folding of expressions that have a NaN as an operand or result can be meaningless or even unsafe, for the following reasons:

First, NaNs represent runtime errors. The compiler can deduce that an operation is invalid and will produce a NaN. In most cases this indicates the presence of an error that the frontend did not detect but which was revealed at a low level, for example due to LTO optimizations. The right behavior would probably be to emit a warning, but at a low level it may be difficult to report where the error arises.

Replacing an instruction that performs an invalid operation with its expected value does not improve performance, as it is likely optimizing invalid code. A user may use the invalid operation intentionally to get NaN. In this case, constant folding is undesirable because the resulting NaN may depend on the target hardware.

If an instruction that produces a NaN is executed, it raises an invalid exception. This exception can be caught by a debugger or by the running software (even if the default FP environment is used). However, if the instruction is evaluated at compile time, the exception is not raised, and the produced NaN result silently propagates through subsequent calculations.

Therefore, constant folding in this case hides errors and may create security vulnerabilities.
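For illustration, a minimal C sketch of this scenario (whether the fold actually happens depends on the compiler and optimization level):

    void example(void) {
        /* If the compiler folds 0.0 / 0.0 to a NaN constant, the divide never
         * executes, so a trap enabled on the invalid exception would never fire,
         * and the NaN flows silently into later arithmetic. */
        double d = 0.0 / 0.0;   /* may be folded at compile time: no exception is raised */
        double e = d + 1.0;     /* the NaN silently propagates through subsequent uses */
        (void)e;
    }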

Second, the exact representation of a NaN is target-dependent. Some targets support SNaN, while others do not. Canonical NaNs can be different for different platforms. Rules of payload propagation also differ. This variability creates many problems due to mismatches between actual and expected behavior.

This discrepancy has already been discussed, for instance, here: Semantics of NaN. The attempt to formalize NaN behavior in IR, as documented in LLVM Language Reference Manual — LLVM 22.0.0git documentation, does not appear entirely successful, because the differences attributed to hardware still remain, as acknowledged in that document.

Disallowing constant folding of instructions that produce or propagate NaNs defers questions about NaN format and behavior to the hardware. That makes the IR design more consistent and also makes the produced code safer, because potential error sources are not hidden.

A possible implementation is provided in: [ConstantFolding] Stop folding NaNs by spavloff · Pull Request #167475 · llvm/llvm-project · GitHub.

Short answer: no. We should continue to constant fold nans.

This only applies in functions with floating-point exceptions enabled, which is not the normal case. Folding should only be disabled in cases that would raise an exception.

A NaN does not indicate invalid code. NaNs have a known, defined behavior, and you can very well write code depending on those behaviors. The optimizer already relies on the expectation of nans folding out later in many cases. E.g., you pass some constant argument to a function which produces a nan in some sub-path in the callee, and the folded nan-case shows the inlining will be profitable.
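A hypothetical C sketch of that situation (the functions here are made up for illustration):

    #include <math.h>

    /* With both arguments constant at the call site, the 0.0f / 0.0f sub-path
     * folds to a NaN constant, the isnan() branch folds away, and the callee
     * collapses to a trivial body, which makes inlining it clearly profitable. */
    static float safe_ratio(float x, float y) {
        float r = x / y;            /* 0.0f / 0.0f produces a quiet NaN */
        return isnan(r) ? 0.0f : r; /* the NaN case folds to 0.0f */
    }

    float caller(void) {
        return safe_ratio(0.0f, 0.0f);  /* the whole call can fold to 0.0f */
    }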

This is mostly not true. There is only the IEEE-754 2008 layout, where the high bit of the significand indicates quietness. The only exception I know of is the legacy MIPS scheme, which LLVM will never support. The correct way to handle alternative platform signaling-nan bit patterns would be to add this information to the datalayout, but given this is only relevant on one dead platform, this work will never happen. For the unsupported signaling-nan cases, that should mean treated as quiet, which is not quite the same thing as a representational difference. For the IR, we’ve also largely hand-waved away signaling-nan support into the canonicalize intrinsic, which you can always implement with software quieting.
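For reference, software quieting is simple; a rough sketch for binary64 (this is not necessarily how the canonicalize intrinsic is lowered on any target):

    #include <stdint.h>

    /* If the bit pattern encodes a NaN, set the IEEE-754 (2008) quiet bit, the
     * most significant bit of the significand, preserving the rest of the
     * payload.  Working on raw bits avoids FP operations that could themselves
     * trap on a signaling NaN. */
    static uint64_t quiet_nan_bits(uint64_t bits) {
        const uint64_t exp_mask  = 0x7ff0000000000000ULL;
        const uint64_t frac_mask = 0x000fffffffffffffULL;
        const uint64_t quiet_bit = 0x0008000000000000ULL;
        if ((bits & exp_mask) == exp_mask && (bits & frac_mask) != 0)
            bits |= quiet_bit;   /* any NaN becomes a quiet NaN */
        return bits;
    }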

The rules don’t differ, only the platform implementation details which we should not be concerned about. The IR follows the approximate nan propagation rules of the IEEE abstract machine, which in principle the hardware is also following. We should not be trying to ensure a match to the concrete behavior of any hardware. If code is relying that specifically on nan payload bits, it is broken and never had that guarantee. Users should have no expectation of bit-identical nan handling across different situations.

This is just a bug for these specific intrinsics. The mode-dependent cases would still only follow the expected behavior for the strictfp case, and the non-strict should follow the expectations for the default fp environment.

This is mostly user error. If the nan is coming out of a non-bitwise operation, there’s no expectation on the sign bit of the nan. We could validly change the constant folding logic to try to preserve the nan bit, but this is still user error.

This looks like it’s working as intended?

This does the opposite; it makes the IR design much less consistent. The IR has abstract semantics that aren’t bound by arbitrary platform decisions, and the backend matches target constructs that conform to those semantics. Each operation has a defined range of permissible outputs for given inputs. Arbitrarily restricting these operations based on potential later codegen behavior is antithetical to the fundamentals of the IR.


Replacing an instruction that performs an invalid operation with its expected value does not improve performance, as it is likely optimizing invalid code. A user may use the invalid operation intentionally to get NaN. In this case, constant folding is undesirable because the resulting NaN may depend on the target hardware.

I frequently write 0.0 / 0.0 to intentionally get a NaN value because it’s the easiest way to get a NaN value.

Semantically, there’s just one NaN value, or sometimes a single qNaN value and a distinct single sNaN value, but this single value ends up having multiple representations. Which representation you get doesn’t matter for almost all users.

If an instruction that produces NaN is executed, it raises an invalid exception. This exception can be caught by a debugger or by the running software (even if the default FP environment is used). However if the instruction is evaluated at compile-time, the exception is not raised, and the produced NaN result silently propagates through subsequent calculations.

If you’re not in strict-math mode (FENV_ACCESS=ON or similar), exception state is unspecified. Doing the invalid operation at compile-time instead of at runtime isn’t the only way we could screw up the observation of exceptions–we can reorder the instructions with your environment-modification instructions.

If we’re in strict-math mode, i.e., the operation is a constrained intrinsic, then we shouldn’t be constant-folding the operation anyways.

I’ll also note that my experience from reading code is that most users, if they want to guarantee that an operation happens at runtime instead of compile time, simply use volatile to prevent the compiler from optimizing it away. I don’t think there are any users that expect all FP expressions to happen at runtime if not in strict-math mode.
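A minimal sketch of that pattern:

    /* Routing the operand through a volatile object keeps the division from
     * being constant-folded, so the invalid operation actually executes at run
     * time and raises the invalid exception there. */
    volatile double zero = 0.0;

    double runtime_nan(void) {
        return zero / zero;
    }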

Second, the exact representation of a NaN is target-dependent. Some targets support SNaN, while others do not. Canonical NaNs can be different for different platforms. Rules of payload propagation also differ. This variability creates many problems due to mismatch between actual and expected behavior.

Almost every architecture already uses the same preferred NaN [1] output, up to sign bit, for which there is pretty wide variance. We already have a weak NaN propagation guarantee that we won’t generate any other NaN payloads than the common preferred NaN [up to sign bit], which is generally sufficient for software. There’s no universal propagation algorithm either: some architectures propagate the first operand, some propagate the second operand, some propagate neither.

If you care about payload propagation, then fadd and fmul become noncommutative, which is a much bigger impact on optimization. So there’s already a situation where there’s a mismatch between the hardware rules and the compiler’s rules here, even without invoking constant folding.
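A small C experiment illustrating the point (the payloads produced by nan("1") and nan("2") are implementation-defined, and the compiler may of course fold the additions itself):

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        double a = nan("1"), b = nan("2");   /* two NaNs with distinct payloads, if the libc provides them */
        double r1 = a + b, r2 = b + a;
        uint64_t x, y;
        memcpy(&x, &r1, sizeof x);
        memcpy(&y, &r2, sizeof y);
        /* On hardware that propagates the first operand the two bit patterns
         * differ; on hardware that always returns its preferred NaN they match.
         * If the compiler folds the additions, the result may match neither. */
        printf("%016llx %016llx\n", (unsigned long long)x, (unsigned long long)y);
        return 0;
    }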

This discrepancy has already been discussed, for instance, here: Semantics of NaN (https://discourse.llvm.org/t/semantics-of-nan/66729). The attempt to formalize NaN behavior in IR, as documented in LLVM Language Reference Manual — LLVM 22.0.0git documentation (https://llvm.org/docs/LangRef.html#behavior-of-floating-point-nan-values), does not appear entirely successful, because the difference attributed to hardware anyway remains, as confessed in that document.

Our preference is to say nothing about NaN payloads at all. But there are some applications that do NaN stuffing where, if you know that the computation is only ever going to produce a single NaN representation, you can save on a call to an if (isnan(val)) { val = PREFERRED_NAN; } modification routine. (The JS engine is one such application, see Value.h - mozsearch.) This isn’t possible for every architecture–Sparc doesn’t produce the correct representation, and WASM doesn’t give any guarantees here. So the semantics that we have are a compromise which give pretty close to the minimum possible guarantee we can give–if the hardware gives you some particular guarantees on NaN behavior, we won’t interfere with those guarantees.
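A rough illustration of the NaN-stuffing idea (this is a made-up layout, not SpiderMonkey’s actual one):

    #include <stdint.h>

    /* Hypothetical boxing scheme: bit patterns that look like a positive quiet
     * NaN with a non-zero low 32 bits are treated as boxed integer payloads;
     * everything else is an ordinary double.  This is only sound if ordinary FP
     * computation never manufactures a NaN whose payload collides with a box,
     * which is exactly the guarantee such engines rely on. */
    #define BOX_TAG 0x7ff8000000000000ULL

    static uint64_t box_u32(uint32_t v)      { return BOX_TAG | (uint64_t)v; }
    static uint32_t unbox_u32(uint64_t bits) { return (uint32_t)bits; }

    static int is_boxed(uint64_t bits) {
        return (bits & 0xfff8000000000000ULL) == BOX_TAG && (uint32_t)bits != 0;
    }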

I can see an argument for adding preferred NaN encoding to the datalayout and using that in constant folding, but I think the benefits don’t really justify the effort it takes to make that work.

[1] I prefer the term “preferred NaN” over “canonical NaN” to avoid confusion with the canonicalize operation, which doesn’t affect the NaN payload.

Floating-point exceptions are raised unconditionally, regardless of the FENV_ACCESS setting. An operation that produces a NaN always raises the invalid exception. A user can install a signal handler for FP signals and enable traps for the invalid exception. In this case, any occurrence of a NaN would be caught, and the location of the offending instruction would be known. This method does not rely on any programming language guarantee, but is based on processor features.
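On Linux/glibc that approach looks roughly like this (feenableexcept is a glibc extension, not standard C):

    #define _GNU_SOURCE
    #include <fenv.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Enable a trap on the invalid exception and catch SIGFPE, so the first
     * instruction that produces a NaN stops the program at the fault site,
     * where a debugger can inspect it. */
    static void on_fpe(int sig) {
        (void)sig;
        fprintf(stderr, "invalid floating-point operation\n");
        abort();   /* returning from a SIGFPE handler is undefined behavior */
    }

    int main(void) {
        signal(SIGFPE, on_fpe);
        feenableexcept(FE_INVALID);   /* glibc extension */
        volatile double zero = 0.0;
        double d = zero / zero;       /* traps here instead of producing a NaN */
        printf("%f\n", d);
        return 0;
    }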

This method is not perfect: speculative execution may cause false positives. However, it provides a practically usable way to locate where an invalid operation occurs. With constant folding, the invalid instruction disappears and the resulting NaN is spread over its uses, which complicates finding the root of the problem.

Without FENV_ACCESS ON the compiler can reorder instructions, but for the purpose of detecting invalid operations this does not matter; only the fact that an invalid instruction was executed is of interest.

Sure, a NaN can be used as a sentinel value, for example; that does not make the code invalid. But such a NaN is used only in comparisons, which do not propagate the NaN. There is an analogy with nullptr, which is a pointer but is not intended to be dereferenced.

By design, NaN is intended to represent errors; see IEEE-754:

6.2 Operations with NaNs

Two different kinds of NaN, signaling and quiet, shall be supported in all floating-point operations. Signaling NaNs afford representations for uninitialized variables and arithmetic-like enhancements (such as complex-affine infinities or extremely wide range) that are not in the scope of this standard. Quiet NaNs should, by means left to the implementer’s discretion, afford retrospective diagnostic information inherited from invalid or unavailable data and results. To facilitate propagation of diagnostic information contained in NaNs, as much of that information as possible should be preserved in NaN results of operations.

It does not mean that NaN cannot be used in other ways, but its properties are specifically designed to support just this use case. Therefore, use cases beyond error representation can be considered atypical, and a lack of optimization in these cases can be considered acceptable.

Intel follows the IEEE-754 recommendations and outputs a NaN with the same payload as the input NaN, while RISC-V outputs the default NaN. This means constant folding must be aware of the target platform, and even then it does not have full information about the payload.

IIUC, none of these problems would exist if constant-folding of NaNs were disabled?

The documentation contains a complaint:

Unfortunately, due to hard-or-impossible-to-fix issues, LLVM violates its own specification on some architectures:

First, if this happens in source code, the compiler can evaluate such an expression. Even if the expression is evaluated at runtime, there is no problem; it is the intended behavior. There is no problem finding the place where the NaN appears. This contrasts with an invalid expression that is produced as a result of LTO optimization.

Typically that is true. In rare cases where payloads are actually used, constant folding may break the intended behavior.

The proposal is about excluding dependence on any such guarantee, weak or not. All these things like preferred NaNs, payloads, and sign bits are meaningful only in the context of constant evaluation. If the latter is disabled and NaN handling is performed at runtime, the IR gets rid of these target-dependent problems.

As a NaN represents an error (by design), and fadd got both operands as errors, it does not matter which error is propagated. Error handling does not need to be deterministic; it is not the normal workflow.

It is an interesting example. In theory all NaNs are representations of semantically the same value, but in this case that is not so. This is, however, a virtual machine; it is “runtime”, so it cannot pass the problem further down the stack. In contrast, the compiler has that possibility, which we could use.

This becomes ambiguous when there are multiple inputs to choose from. And it’s merely a recommendation, not mandated behavior, meaning you cannot write portable code that depends on this property, and I do not think this is a problem that needs solving.

I believe this is specifically about x87, which is unimplementable.

We can avoid a lot of optimization bugs if we just start deleting optimizations, but that’s generally not a good bug fixing strategy. I would rank the subnormal handling issue mentioned there as more important than the nan handling, and it is a separate problem.

Floating-point exceptions are raised unconditionally, regardless of the FENV_ACCESS setting. An operation that produces a NaN always raises the invalid exception. A user can install a signal handler for FP signals and enable traps for the invalid exception. In this case, any occurrence of a NaN would be caught, and the location of the offending instruction would be known. This method does not rely on any programming language guarantee, but is based on processor features.

If we’re being pedantic here, traps are UB in C anyways. And FENV_ACCESS OFF is UB if the environment is ever non-default. Enabling FP traps is therefore UB without FENV_ACCESS ON in C.

The documentation contains a complaint:

Unfortunately, due to hard-or-impossible-to-fix issues, LLVM violates its own specification on some architectures:

As Matt mentions, this is in reference to the x87 FPU. (Which isn’t impossible to fix–the right code sequences we need to generate are known–just extremely involved for the benefit we’d get, especially as LLVM defaults to SSE math even on 32-bit x86.)

We can avoid a lot of optimization bugs if we just start deleting optimizations, but that’s generally not a good bug fixing strategy. I would rank the subnormal handling issue mentioned there as more important than the nan handling, and it is a separate problem.

Agreed.

I would not be opposed to encoding preferred NaN in the data layout and using that for constant-folding (which would fix pretty much all of the most user-visible issues around NaN payload nondeterminism). But I’m not signing up to do that work myself, and it ranks pretty near dead last on my list of FP issues to tackle, probably even below fixing x87 FPU code generation. LLVM’s denormal flushing story is much more problematic and the source of much more vocal user complaints than NaN handling, and that is much more deserving of our attention than NaN payloads.

I agree with the comments from arsenm and jcranmer. The required semantics of NaN for LLVM IR was extensively discussed, and I continue to agree with what we decided and wrote in the LangRef.

I did mention on a previous thread that there is a minor bug in the current definition: the spec states that constrained FP functions must not return an sNaN even if an sNaN is provided as input (in contrast to the non-constrained operations, which may). That requirement, when combined with the description of the quiet/signaling bit as being always present, is not reasonably implementable on a platform where the hardware’s instruction set does not implement sNaN semantics (treating all NaN payloads as quiet).

It was not the intent behind the wording to prohibit architectures without sNaN support. Rather, the intent was to prohibit the compiler from eliminating the NaN-quieting part of an operation where it exists. But we should clarify that detail, potentially by just stating that certain platforms do not have sNaN semantics, and treat all NaNs effectively as-if quiet.