RFC: Optional support for signaling NaNs

Motivation

The current implementation of strictfp functions uses the same unified model for almost all cases that deviate from the default mode. These include, in particular:

• Code that executes with a non-default rounding mode,
• Code that inspects the FP exception status,
• Cases where FP exception trapping is enabled,
• Situations where a user wants to distinguish between quiet and signaling NaNs.

These cases impose different limitations on code transformations, so a single model suitable for all use cases is possible only at the cost of severe performance loss. This is a challenge, since many (if not most) practical applications of a non-default FP environment also require maximal performance.

The problem could be mitigated if a user specified which behavior is needed. This possibility already exists in principle: several options, such as -frounding-math and -ftrapping-math, do exactly this. So if -ftrapping-math is not specified, the compiler could assume that FP trapping does not happen and generate more efficient code. Currently, this is not possible because these options in Clang do not select any specific behavior; they are all implemented as setting up strict exception behavior.

A more flexible implementation would use information about the user’s intentions, deduced from options like -ftrapping-math, to generate more efficient code. To get there, the command-line options must be implemented as separate, independent features, each determining a particular aspect of the FP model, so that a user can combine them to get the desired behavior.

A first step in this direction could be implementing support for signaling NaNs as a separate feature. This is the simplest case, as it does not involve interaction between instructions. Currently, this support is tightly coupled with the strictfp implementation, which does not look like the best decision.

Problems

This code represents a correct function, provided that the compiler and hardware are IEEE 754 compliant:

void quiet_SNAN(float *ptr, unsigned num) {
    for (unsigned i = 0; i < num; ++i) {
        ptr[i] *= 1.0;
    }
}

Thus, a user might expect that this code would compile and work as expected on X86. Indeed, Clang can compile this code, but only with an option like -ffp-model=strict. This is strange because the code does not use non-default control modes or read FP status.
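To make the quiet/signaling distinction concrete, here is a minimal sketch that classifies binary32 bit patterns using integer operations only, so the result does not depend on how the compiler treats FP code. It assumes IEEE 754 binary32 and the convention that a set most significant significand bit means "quiet" (used by x86, Arm and RISC-V). The names f32_is_nan, f32_is_signaling and f32_quiet are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: IEEE 754 binary32, quiet bit = top significand bit. */
#define F32_EXP_MASK  0x7f800000u
#define F32_FRAC_MASK 0x007fffffu
#define F32_QUIET_BIT 0x00400000u

/* NaN: exponent field all ones and a non-zero significand field. */
bool f32_is_nan(uint32_t bits) {
    return (bits & F32_EXP_MASK) == F32_EXP_MASK &&
           (bits & F32_FRAC_MASK) != 0;
}

/* Signaling NaN: a NaN whose quiet bit is clear. */
bool f32_is_signaling(uint32_t bits) {
    return f32_is_nan(bits) && (bits & F32_QUIET_BIT) == 0;
}

/* What "quieting" does to the bit pattern: set the quiet bit and
   keep the rest of the payload. Only meaningful for NaN inputs. */
uint32_t f32_quiet(uint32_t bits) {
    assert(f32_is_nan(bits));
    return bits | F32_QUIET_BIT;
}
```

Under these assumptions, an IEEE 754 conformant `x * 1.0` turns the pattern 0x7fa00000 into 0x7fe00000 and raises Invalid, while a compiler that folds the multiplication away leaves the signaling pattern in memory.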

Because signaling NaN support in Clang is tightly coupled with the strictfp function attribute, it is difficult to document how to use it. The strictfp attribute is an internal detail; end users need not be aware of it. So how could a user turn this support on? There are several pragmas and half a dozen command-line options that enable sNaN support, but none of them has a clear connection to sNaNs. Unsurprisingly, the Clang User Manual does not mention signaling NaN support, making it an undocumented feature.

Coupling two unrelated things, sNaN support and the strictfp attribute, creates problems in other places too. For example, #pragma STDC FENV_ROUND in general does not require access to the FP environment and can be used in the default mode, where sNaNs are unsupported. On a target that does not have static rounding support, this pragma is implemented using dynamic rounding, which requires the strictfp attribute on the function. As a result, using the pragma unexpectedly turns on sNaN support.

Another problem arises on targets that do not support sNaNs. On such a target, the compiler would still try to honor sNaNs, producing less efficient code. This is especially bad because such targets are often ML or graphics cores, where performance is particularly important.

Control over sNaN support has a strong influence on the optimization of strictfp code. If the user does not use sNaNs, or if they are unsupported by the hardware, this knowledge can assist in producing efficient code in strictfp functions. Many operations in this case do not raise exceptions or are even pure functions.

Proposal

Support for signaling NaNs should be untied from the strictfp attribute. A new special attribute would represent this support, and it could be used in both strictfp and default mode functions. A dedicated command-line option should be available to users to manage this support. It could be -fsignaling-nans, an option available in GCC for this purpose.

An attempt to introduce dedicated control over sNaN support is made in https://github.com/llvm/llvm-project/pull/193055.

Any feedback is appreciated.

Most people I spoke to seem to agree that signaling NaNs were a mistake and should never have been introduced. So based on that, there’s one section that’s missing from your RFC – the motivation. Why do you care about signaling NaNs? What problem are you actually trying to solve? (You have a motivation section but it doesn’t actually contain a motivation, i.e., an explanation for why you care about this.)

What do you mean by “correct” here?

Personally, what clang does aligns very well with my expectations. Clearly yours are different. Please don’t assume everyone has the same expectations and instead explain your expectations. :slight_smile:

I’ll repeat the essence of the comment I left on the PR:

The status quo is that distinguishing sNaN from qNaN is supported only for the constrained intrinsics – and for those, sNaN is always supported.

I don’t think it’s useful to add support for sNaN in non-strictfp functions. While it’s theoretically feasible, I can’t see any value in it – it just seems unnecessary work. What we certainly do require is the ability to say you don’t care about sNaN, when using a constrained intrinsic (in a strictfp function).

As you point out, it is common to not need sNaN handling, even if you care about other FP environment features like dynamic rounding modes. The constrained intrinsics we have today already allow specifying – independently – whether you want to support dynamic rounding modes or floating-point exceptions. We just need to add the missing option which allows users to specify whether they need sNaN handling or not – in the same manner as the rounding and exception-handling options are specified.

Maybe signaling NaNs are a design mistake, or maybe not; opinions differ. Some users would like to use sNaNs to make their software more robust. Others would like to use the ability to catch accesses. sNaNs are part of the standards, and users may invent any applications of this feature. What matters is that sNaNs exist in hardware and the compiler cannot ignore their existence.

As for the problem I am trying to solve, it is the performance of strictfp code, as I tried to describe in the motivation section. The many limitations imposed by the different aspects of interaction with the FP environment make the optimization of strictfp code inefficient. On the other hand, in real life it is unlikely that the same code manipulates the rounding mode, reads status flags and generates FP traps at the same time. If we cannot optimize code in the general form, let’s provide users with a set of switches that allow the compiler to ignore unneeded restrictions.

Signaling NaNs are important for the optimization of strictfp code because most operations raise the Invalid exception when they encounter an sNaN. For example, floor raises an exception only if its argument is an sNaN. If the hardware does not support sNaNs, or they are absent from the processed data, floor is a pure function and can be placed anywhere, for example between a producer and a consumer of FP exceptions:

%a = call float @llvm.sin.f32(float %x)
%b = call float @llvm.floor.f32(float %y)
%c = call i32 @fetestexcept(i32 %fe_invalid)

If sNaNs can raise the Invalid exception, this placement becomes invalid. In this case more restrictions are imposed on the placement of floor, which may hurt performance.
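The claim about floor can be illustrated with a small software model rather than real hardware flags. The sketch below works on binary32 bit patterns and records exceptions in a plain integer; model_floor and MODEL_FE_INVALID are hypothetical names, and the model assumes the quiet-bit-set NaN encoding and a floor that raises nothing except Invalid on an sNaN input.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_FE_INVALID 0x1u  /* flag bit of this model, not <fenv.h> */

/* Software model of floor on a binary32 bit pattern. The only way it
   touches *flags is by raising Invalid for a signaling NaN input. */
uint32_t model_floor(uint32_t bits, unsigned *flags) {
    assert(flags != NULL);
    uint32_t exp_field = bits & 0x7f800000u;
    uint32_t frac = bits & 0x007fffffu;
    if (exp_field == 0x7f800000u && frac != 0) {   /* NaN input */
        if ((bits & 0x00400000u) == 0)             /* signaling */
            *flags |= MODEL_FE_INVALID;
        return bits | 0x00400000u;                 /* quiet NaN result */
    }
    float x;
    memcpy(&x, &bits, sizeof x);
    /* Infinities and values with |x| >= 2^23 are already integral. */
    if (x >= 8388608.0f || x <= -8388608.0f)
        return bits;
    float t = (float)(int32_t)x;   /* truncate toward zero (exact here) */
    if (t == x)
        return bits;               /* integral; keeps the sign of zero */
    if (t > x)
        t -= 1.0f;                 /* negative non-integers round down */
    uint32_t out;
    memcpy(&out, &t, sizeof out);
    return out;
}
```

In this model, if sNaN inputs are excluded, model_floor never writes to the flags, which is exactly the property that would let a compiler move it freely between a flag producer and a flag consumer.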

Yes, my fault, it should have been “standard conformant”. Signaling NaNs are a mandatory feature of IEEE 754 and an optional feature of the C standard.

The expectations of a user are based on the standards and on what the hardware makes feasible :slight_smile:

I provided several reasons why this tight coupling of sNaN support and the exception handling is not the best idea.

There is a discussion about the support of trapping in Fortran: https://discourse.llvm.org/t/rfc-flang-add-floating-point-trap-handling-support. If the existing solution based on the constrained functions does not provide acceptable performance, enabling trapping in the default mode could be a solution. There was also a discussion on a similar topic, also inspired by user feedback: Support of trapping math. If trapping support is modified, signaling NaN control might be useful.

Where would this knowledge come from?

If sNaN support is absent in the hardware, such an option is redundant, as it could only ever have one value. If the hardware supports sNaNs but a user wants to ignore their existence, it is a user promise, similar to the one -ffast-math makes about NaNs and infinities. If code with such a promise is mixed with code without it, the promise is no longer valid, because there are data in the function that may be sNaN. So the absence of sNaNs looks like a function attribute rather than a per-instruction property.

Agree! This is a problem we need to solve.

But definitely not that. Our default mode assumes that traps cannot happen, and optimizes based on that. It makes no sense to allow enabling traps in the default mode – we have the fpexcept.maytrap mode for that; and we ought to fix its performance.

  1. There is a desire to unify constrained and non-constrained intrinsics. In order to do that, we need to ensure that all the “extra features” the constrained intrinsics give you today are optional. You almost have that today, since you can specify round.tonearest fpexcept.ignore on a constrained intrinsic, which is close to the semantics of the non-constrained operations. But what you cannot do is specify that you don’t require strict sNaN handling – that’s always included.
  2. There are few users who need full sNaN support. There are very, very few who need full sNaN support but without support for FP exceptions/traps. There are many more users who care about performance of code which enables dynamic rounding modes or FP exception support. We should therefore prioritize optimizing the latter, over the former.

Every one of the float options is a user promise: “User promises that the dynamic rounding mode is currently tonearest”; “User promises they don’t care which FP exception flags are raised by this operation”, etc.

But, yes, the (current) default sNaN behavior can be considered somewhat similar to fast-math-flag nsz (not nnan/ninf); you get a restricted non-determinism. Documented in “LangRef Behavior of Floating-Point NaN values”.

That conclusion doesn’t follow. Value flow is not related to IR function boundaries. And, as mentioned above, it’s not UB to pass an sNaN to a non-“strict-sNaN” operation.

It is not as straightforward as it initially appeared. Consider the transformation:

sin(X) / cos(X) -> tan(X)

This transformation is performed by InstCombine: llvm-project/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp at d24638888a16010c79bf7c45174b6afcc5b51671 · llvm/llvm-project · GitHub. It replaces three FP operations with one. While it is somewhat complicated by fast-math flags, it is essentially simple.

If these were constrained functions, things would be more complicated. First, consider the effects of the rounding mode; these are independent of exceptions.

  • The transformation cannot be performed if any of the three operations has an assumed rounding mode different from the others.
  • If all three operations have dynamic rounding, it is not sufficient to combine them. We must ensure that on the path from sin and cos there are no operations that could change the rounding mode. Since the compiler can reliably determine this only for intrinsic functions, any other function would be considered as potentially changing the rounding mode, and the transformation would be blocked.

Similar considerations should be applied to any control mode, such as denormal behavior.

Now consider exception handling.

  • If trapping is possible, this transformation is not valid because the FP status bits can be observed at the points after each FP operation if they trigger exceptions.

Currently, a strictfp function always implies trapping, with no way to disable it. Even if we could disable trapping, limitations would still exist:

  • If between the division and sin or cos there is a function that reads the status flags, the transformation becomes pointless: the call to sin (or cos) cannot be removed because the status flags could be distorted at the point of reading.
  • If X==pi/2, the original expression would raise DivideByZero, but the transformed expression would raise Invalid.

Therefore, in the strictfp environment, this transformation should not be applied. The same holds for many other transformations; the restrictions are too numerous. This can lead to significant performance loss.
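The conditions listed above can be collected into one conservative legality check. The sketch below is only a model of the reasoning, not LLVM code: ConstrainedOp, RoundingArg, ExceptArg and may_fold_sin_div_cos are hypothetical names standing in for the rounding and exception metadata of constrained intrinsics, and fast-math flag requirements are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the "round.*" and "fpexcept.*" metadata arguments. */
typedef enum { RM_DYNAMIC, RM_TONEAREST, RM_UPWARD,
               RM_DOWNWARD, RM_TOWARDZERO } RoundingArg;
typedef enum { EB_IGNORE, EB_MAYTRAP, EB_STRICT } ExceptArg;

typedef struct {
    RoundingArg rounding;
    ExceptArg except;
} ConstrainedOp;

/* Conservative check for sin(X) / cos(X) -> tan(X).
   rounding_may_change_between: true if some instruction between the
   sin/cos calls and the division might change the rounding mode. */
bool may_fold_sin_div_cos(const ConstrainedOp *sin_op,
                          const ConstrainedOp *cos_op,
                          const ConstrainedOp *div_op,
                          bool rounding_may_change_between) {
    assert(sin_op != NULL && cos_op != NULL && div_op != NULL);
    /* All three operations must assume the same rounding mode. */
    if (sin_op->rounding != div_op->rounding ||
        cos_op->rounding != div_op->rounding)
        return false;
    /* With dynamic rounding, the mode must be stable across the span. */
    if (div_op->rounding == RM_DYNAMIC && rounding_may_change_between)
        return false;
    /* The fold changes which flags are raised (DivideByZero versus
       Invalid at pi/2), so nobody may be observing them. */
    return sin_op->except == EB_IGNORE &&
           cos_op->except == EB_IGNORE &&
           div_op->except == EB_IGNORE;
}
```

A less conservative version could accept observable exceptions when no flag read or trap can occur between the three operations, at the cost of an additional scan.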

However, we could improve the performance of strictfp code by loosening the restrictions. If the compiler knew that trapping is disabled, or that the rounding mode may be changed only by a limited set of functions, or that sNaNs cannot appear in the data, or that one error exception may be replaced by another, and so on, the chances for optimization become higher. The strict restrictions are too rigorous for many users.

The initial hope, that we can implement transformations for constrained functions and thus solve the performance problem, seems to be false. Other solutions have to be found, perhaps by providing users with a set of options to tune optimizations in strictfp functions.

Users who don’t care about sNaNs would benefit from -fsignaling-nans because it allows them to inform the compiler that it does not need to honor sNaNs, which might improve performance.

Support for sNaNs in the default mode is required because we need to convert a strictfp function into a non-strictfp one and vice versa. This support is necessary for implementing #pragma STDC FENV_ROUND and for some optimizations. Semantics must not change when the internal representation changes.

Hardly anyone cares about sNaNs. There’s a reason the C standard made support optional. It would be very strange to require basically everyone to set such a flag. The default should be aligned with the need of the majority of users, and that is fairly clear in this case – sNaNs should be ignored by default.

It is fair to ask for the sNaN usecase to be better supported on an opt-in basis. But if you insist that it should be the default (i.e., if you put your needs over those of the majority) you are unlikely to have much success.

The initial hope, that we can implement transformations for
constrained functions and thus solve the performance problem, seems to
be false. Other solutions have to be found, perhaps by providing users
with a set of options to tune optimizations in strictfp functions.

I don’t think it’s been found to be false. By and large, we just haven’t
attempted to actually implement the needed transformations for
constrained intrinsics. There are a couple of reasons for this, but I
don’t think there is any dispute that the constrained intrinsics should
be optimizable; we just haven’t put in the effort to make the
optimizations happen.

If all three operations have dynamic rounding, it is not sufficient
to combine them. We must ensure that on the path from sin and cos there
are no operations that could change the rounding mode. Since the
compiler can reliably determine this only for intrinsic functions, any
other function would be considered as potentially changing the rounding
mode, and the transformation would be blocked.

We can model FP control word behavior as a particular kind of memory
location (indeed, there’s a PR for this somewhere), and that handles
most of the tracking automatically. Obviously, we don’t know
the effects of calls to unknown external functions, but I would hazard a
guess that most numerical code that is amenable to compiler optimization
isn’t going to have a lot of unknown function calls in the middle of an
optimizable expression that makes an unknowable barrier. And in any
case, strictfp + “assume default rounding mode” is definitely a
combination we want to support anyways.

As James says, it may make sense to have sNaNs optionally part of the
“things you don’t care about” in strictfp mode. But caring about sNaN in
non-strictfp mode is a lot more headache for very little gain.

We don’t know this for sure. There are users who would like to use them, but the available support does not allow it. Yes, this feature is not needed for everyone, but it is part of a standard that has existed for decades. Almost all big cores support sNaNs.

I came across a statement that this was a political decision. Maybe the poor support for sNaNs blessed by the C standard has led to the lack of interest in this feature. Anyway, the C standard has also supported this feature (as optional) for a long time, and it would be better to align Clang’s support with the requirements of the standard.

Absolutely agree.

This is the goal of this proposal (and PR193055). It proposes that sNaN support be controlled by a dedicated option, which is off by default. Currently, sNaN support is always enabled in strictfp functions, and the primary purpose of the proposal is to provide a way to disable it in such functions.

The implementation in PR193055 enables sNaN support in strictfp functions by default to maintain compatibility with Clang’s current behavior. We could go further and always require -fsignaling-nans to enable sNaN support. It would be a breaking change, but it would make sNaN support a truly orthogonal feature. If only a few users rely on sNaN support, this change would be painless.

I would like the opposite: to have the ability to disable sNaN support in strictfp functions. As for supporting sNaNs on an opt-in basis, I absolutely agree; this is what Clang lacks now.

I have spent some time trying to adapt InstCombiner to support non-constrained intrinsics within strictfp functions, as proposed in https://github.com/llvm/llvm-project/pull/188297. The transformation sin(X) / cos(X) -> tan(X) shown above is quite typical. A strictfp function imposes additional restrictions on FP operations, but not all of them are required in every case.

Anyway, we need to develop ways to help optimize strictfp functions. Obviously, optimizing constrained functions is a more difficult task. Better accounting for users’ intentions can lead to improved performance.

As I understand it, this is already implemented in this way. It helps prevent disallowed reordering, but cannot help if the order is already unfavorable. For example, consider the code:

%a = call float @llvm.experimental.constrained.sin.f32(float %x, metadata !"round.dynamic", ...)
%c = call i32 @fesetround(i32 %rm)
%b = call float @llvm.experimental.constrained.cos.f32(float %y, metadata !"round.dynamic", ...)
%d = call float @llvm.experimental.constrained.fdiv.f32(float %a, float %b, metadata !"round.dynamic", ...)

This cannot be folded. However, to realize that, we would need to scan all instructions between sin and fdiv. Currently, no tool exists for performing such a scan; it would need to be created.
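As a rough illustration of what such a scan involves, here is a sketch over a straight-line instruction sequence. It is not an LLVM API: InstKind and rounding_may_change are hypothetical, every instruction is reduced to one of four kinds, and control flow is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    INST_FP_ARITH,      /* constrained FP op: reads the rounding mode only */
    INST_READ_FLAGS,    /* e.g. fetestexcept: reads status flags only */
    INST_SET_ROUNDING,  /* e.g. fesetround: writes the rounding mode */
    INST_UNKNOWN_CALL   /* call with unknown effect on the FP environment */
} InstKind;

/* Returns true if any instruction strictly between positions first and
   last may change the rounding mode. Unknown calls are treated
   pessimistically, as the compiler cannot see through them. */
bool rounding_may_change(const InstKind *insts, size_t first, size_t last) {
    assert(insts != NULL && first <= last);
    for (size_t i = first + 1; i < last; ++i) {
        if (insts[i] == INST_SET_ROUNDING || insts[i] == INST_UNKNOWN_CALL)
            return true;
    }
    return false;
}
```

For the sequence above (sin, fesetround, cos, fdiv), scanning from sin to fdiv reports a possible change, so the fold is blocked. A real implementation would also have to follow control flow, or rely on modeling the FP environment as a memory location.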

The compiler cannot assume that, because users may use wrappers around functions like fesetround or fetestexcept. However, if there were an option stating that only standard functions may access the FP environment, then such optimizations would become possible, as it would be the user’s responsibility to guarantee that.

As we currently interpret the strictfp property, it does not assume any default rounding mode. If you mean that the code does not change the rounding mode, yes, this is an important special case, which should be supported but is not supported yet.

The main reason why sNaNs should be supported in the default mode as well is the need to support optimizations that change the strictfp-ness of a function. For example, consider the case you mentioned above, when a function does not contain instructions that may change the rounding mode. In this case, the function could be converted to a function without the strictfp attribute. This would be profitable for optimizations. The reverse example is the support of the pragma FENV_ROUND on targets without static rounding. It requires assigning the strictfp attribute to a function that initially does not have it. Changing the internal representation should not change user-visible semantics.

Supporting sNaNs in all modes would make sNaN support an orthogonal feature, enable some optimizations, improve standard conformance and free us from the need to describe complex and vague rules about when sNaNs are supported and when they are not. It does not require enormous effort, and there are users who would benefit from such support.

Ah okay, I misunderstood then – sorry for that.

I’d also note that this transform is only permitted when the fast-math-flag ‘reassoc’ is enabled, because it’s value-changing. So…I don’t think this is the best one to start a discussion with, because it’s not at all clear to me how FMF ought to interact with FP exceptions modes.

One possible answer is that “fpexcept.strict” effectively overrules “reassoc”, and prohibits transforms which might change the FP exceptions raised (even when the change in the flags is consistent with the change in the value permitted by reassoc)! But, that’s probably unnecessarily strict, it’s probably more reasonable to say that if reassoc permits a transform to change the value computation, changing the exceptions which are raised by the computation is also permitted if (and only if) the new flags which are raised are consistent with the new value computation.

But that’s really a whole other unrelated discussion…

The real point you’re trying to make in this section is that we would need to implement additional checks in order to correctly optimize strictfp constrained intrinsics. And I agree with that: such checks are required, in order to correctly optimize strictfp code. But: such optimizations are still possible, and we can and should do them.

This isn’t correct. We have the three options “fpexcept.ignore”, “fpexcept.maytrap”, and “fpexcept.strict” for the constrained intrinsics. Specifying “fpexcept.ignore”, “round.tonearest”, and (the not-yet-existing flag) “non-strict sNaNs” on a constrained operation in a strictfp function can theoretically permit many of the optimizations permitted in a default non-strictfp function.

In particular, for the “ignore” mode, traps are assumed to be disabled, and flags are assumed to not be read per docs.

The primary restriction remaining (compared to a non-constrained equivalent in a non-strictfp function), is that we must ensure that we don’t move such an operation into a region of code where traps may be enabled, or which might set status flags deterministically and subsequently read them. But that still permits many correct transformations (which we don’t implement today).

I don’t think this is correct, because…

…we already have that, in the “fpexcept.ignore” option. It’s just not yet used to enable the relevant optimizations.

Not needing to honor sNaNs is already the status quo in the default mode. I agree we should implement such a flag to permit strictfp functions to also not honor sNaNs.

While we do currently change sNaN semantics when we convert non-strictfp instructions into their strictfp equivalent, it’s not a problem. The non-strictfp semantics specify limited non-determinism for sNaN values, and the semantics of strictfp constrained operations are a subset of that permitted non-determinism.

We don’t need to add support for sNaN in the default mode in order to support such conversions correctly.

We do know this for sure. This is a feature which was specified by IEEE754 and implemented in CPUs decades ago. Yet, next to nobody uses it, or comes to toolchain or language spec authors with use-cases where it’d be important.

Really, the singular situation sNaNs have been found useful in the past is as a limited form of memory sanitizer instrumentation. Some Fortran compilers have the option to initialize floating-point locals to sNaN and enable FP invalid traps, in order to detect use of uninitialized variables of floating-point type (but not variables of other types!).

Detecting uninitialized variables is certainly a valuable debugging feature, but in modern times this feature is typically implemented for all datatypes via compiler instrumentation (e.g. msan) or binary instrumentation (e.g. valgrind), rather than with sNaN. I’d additionally note that this use-case requires enabling both sNaN and traps, so it isn’t evidence for the need to support sNaN in the default mode, which doesn’t support traps.

Yes, this is not a good example. Another one could be minimum(X, Y) * maximum(X, Y) => X * Y. It cannot be folded in a strictfp function because, in the original code, either minimum or maximum would signal if X or Y is a signaling NaN, whereas after the transformation the multiplication would signal. It is not as impressive as the sin/cos/tan transformation, however.

It actually depends on the exception model being used. Currently, the model implemented by constrained functions is very strict. It assumes exceptions are raised in the same sequence and in the same places as if the program were executed on the abstract C machine. This model is ok if traps are enabled and the trap handler inspects the code that threw the exception, but for many other applications, it is too rigid. I think we need some way to specify or deduce the required exception model, but this is a separate problem.

Perhaps fast-math could be allowed in strictfp functions. Observing exceptions or setting the rounding mode can, it seems, be combined with imprecise math, at least in some cases. This would reduce the performance gap between strictfp and non-strictfp code. But you are right, this is an unrelated topic.

Sure, constrained intrinsics can and should be optimized. However, there are fewer opportunities for such optimizations than in the default mode, and this is a concern.

“fpexcept.ignore” does not solve the problem. Consider the code (which I think is a typical use of exceptions):

feclearexcept(FE_ALL_EXCEPT);
...
// do calculations
...
except = fetestexcept(FE_ALL_EXCEPT & ~FE_INEXACT);

Even LLVM has similar code: llvm-project/llvm/lib/Analysis/ConstantFolding.cpp at fc4aad7b5db3fff421df9a9637605b9ca5667881 · llvm/llvm-project · GitHub.

In this case trapping can be disabled (and usually is). However, exceptions are not ignored. This is an important case because, in the absence of trapping, exceptions can be observed only at specific points, such as calls to llvm.get_fpstate or fetestexcept. Between these calls and feclearexcept, FP operations can be reordered with fewer restrictions.

It is not a problem, because code that enables and disables traps must have side effects, since the trap mask is part of the control mode set. The real problem is that traps are often enabled for an entire application, because in hardware, running with traps enabled has zero cost. This does not agree with the current implementation in LLVM, which requires strictfp mode to enable trapping, but that is a separate problem.

Actually there is no indeterminism in sNaN behavior. It was introduced into the LLVM documentation because there was no way to determine whether sNaNs are supported. With the proposed command-line option and function attribute, the indeterminism is not needed anymore. Or, more precisely, the only remaining indeterminism is the payload value, which I believe is a purely runtime aspect. The absence of guarantees associated with this artificial indeterminism actually makes sNaN support useless: users who would like to use sNaNs cannot expect any definite behavior.

Sure. How could they appear if sNaNs are actually unsupported in both GCC and Clang? If a compiler supports sNaNs, it either defines the macro __SUPPORT_SNAN__, which means support according to the standard, or documents the support in its user documentation. GCC defines __SUPPORT_SNAN__, but only when -fsignaling-nans is specified, which for a quarter of a century has remained an experimental option. Clang does not define the macro and does not document the support. In both compilers, sNaN support is, strictly speaking, experimental. How can a user rely on an experimental feature for a serious purpose, like making a program more robust?

Searching for signaling nan in LLVM issues finds quite a few issues, some of which were closed as won’t fix. This may indicate user interest in this feature.

Also, LLVM is a framework developed not only for C. Fortran, another important language for numerical calculation, also declares support of IEEE 754, but has no statement about optionality of sNaN support, at least in Fortran 2008.

The only objection to implementing sNaN support as an independent facility I have encountered so far is the claim that supporting it in the default mode requires extra effort. To support sNaNs we need to fix optimizations that replace a floating-point expression with one of its arguments, as these replacements behave differently for sNaNs than for qNaNs. In all other cases, the difference between the NaNs is not visible to the compiler; please correct me if I am wrong. There are few such transformations: add/sub 0.0, multiply/divide by 1.0 and min/max. These transformations must be updated anyway to account for the optional nature of sNaN support. We could implement the same update for default-mode intrinsics as well (and we should, when using operand bundles). In LLVM this code is common to regular instructions and constrained intrinsics, except for min/max.

Support of sNaN as an independent feature (irrespective of strictfp) has the following advantages:
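The difference these folds make can be shown on bit patterns. The sketch below models only the NaN-related part of `x * 1.0` for binary32 and pairs it with a guard for the fold; ieee_mul_one and may_fold_mul_one are hypothetical names, and the model assumes the quiet-bit-set encoding, no flush-to-zero, and that the payload of a quiet NaN is passed through unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of x * 1.0 as far as NaN handling is concerned: a signaling NaN
   is quieted and raises Invalid, everything else is returned as is. */
uint32_t ieee_mul_one(uint32_t x, bool *invalid) {
    assert(invalid != NULL);
    bool is_nan = (x & 0x7f800000u) == 0x7f800000u &&
                  (x & 0x007fffffu) != 0;
    bool is_snan = is_nan && (x & 0x00400000u) == 0;
    *invalid = is_snan;
    return is_snan ? (x | 0x00400000u) : x;
}

/* Guard for the fold x * 1.0 -> x: it is unobservable unless sNaNs are
   honored and the operand might be one. */
bool may_fold_mul_one(bool honor_snans, bool operand_known_not_snan) {
    return !honor_snans || operand_known_not_snan;
}
```

The folded form returns x unchanged, so it differs from the model only for an sNaN operand, where both the result bits and the Invalid flag change. The same guard shape would apply to the add/sub 0.0, divide by 1.0 and min/max cases.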

  • It is standard-conformant, no additional documentation required.
  • It is simple as a concept, no complex rules when it is enabled and when it is not.
  • It is simpler to implement and does not require artificial indeterminism.

If we choose to support sNaNs only within the strictfp functions, we would need to do at least:

  • Document the behavior in the User Manual. References to the internal implementation details, intended for compiler developers, are not suitable.
  • Implement a mechanism that preserves sNaN semantics throughout transformations that change the strictfp-ness of functions. We know that such transformations will appear, and relying on indeterministic behavior is not a suitable option.

That’s incorrect. It was introduced in LLVM so that we can apply optimizations like x * 1.0 -> x without having to know whether x is an sNaN or not. More people care about such optimizations than about sNaN, so the choice was made to prioritize the optimizations.

This makes sense for default-mode programs. If the program does not call issignaling on FP values, its behavior will not change. However, if FP exceptions are observed, the indeterminism described in LLVM Language Reference Manual — LLVM 23.0.0git documentation would mean that the Invalid exception could be raised or dropped in an unpredictable way, or raised in the wrong places. This would make exception reading useless.

If exceptions are observed, sNaN behavior must be deterministic.