Should isnan be optimized out in fast-math mode?

Let me describe a real life example.

There is a realtime program that processes float values from a huge array. The calculations do not produce NaNs and do not expect them. Using -ffinite-math-only substantially speeds up the program, so it is highly desirable to use it. The problem is that the array contains NaNs, which mark elements that should not be processed.

An obvious solution is to check each element for NaN and, if it is not one, process it. Currently there is no clean way to do so, only workarounds such as falling back to integer arithmetic: the function ‘isnan’ has become useless. And there are many cases of users complaining about this optimization.
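For illustration, here is a minimal sketch of the integer-arithmetic workaround (the helper name `bits_isnan` is made up): it tests the IEEE-754 bit pattern directly, so it keeps working even if `isnan()` is folded to false.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: detect NaN from the bit pattern instead of using
// isnan(), which -ffinite-math-only may fold to false. A float is NaN
// when its exponent bits are all ones and its mantissa is non-zero.
bool bits_isnan(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined type pun
    return (bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0u;
}
```

No floating-point operations are involved, so the check is not subject to the finite-math assumption.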

I personally would separate the “pre-processing” of the input in a compilation unit that isn’t compiled with -ffinite-math-only and isolate the perf-critical routines to be compiled with this flag if needed (I’d also like a sanitizer to have a build mode that validate that no NaNs are ever seen in this routines).
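A single-file sketch of that split (the function names are made up; in a real build the two functions would live in separately compiled translation units):

```cpp
#include <cmath>
#include <cstddef>

// Sketch: validate_inputs would live in a translation unit compiled
// *without* -ffinite-math-only, so isnan keeps its meaning...
bool validate_inputs(const float* data, std::size_t n) {
    for (std::size_t i = 0; i != n; ++i)
        if (std::isnan(data[i])) return false;  // reject NaNs up front
    return true;
}

// ...while the perf-critical kernel would live in a unit compiled *with*
// -ffinite-math-only and may assume its inputs are finite.
float process_sum(const float* data, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i != n; ++i)
        sum += data[i];
    return sum;
}
```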

That could serve as a workaround. GCC supports ‘#pragma GCC optimize’, which can be used to turn -ffinite-math-only on and off. In clang this pragma does not work, which leaves only separate translation units with subsequent linking, and that is not possible in some cases, such as ML kernels.
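A GCC-specific sketch of that pragma (the function name is made up, and I have not verified this against every GCC version):

```cpp
#include <cmath>

// The surrounding file may be compiled with -ffast-math, but this region
// turns the finite-math assumption back off so isnan() is not folded to
// false (GCC-only; clang ignores these pragmas).
#pragma GCC push_options
#pragma GCC optimize ("no-finite-math-only")
bool is_sentinel(float x) { return std::isnan(x); }
#pragma GCC pop_options
```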

In general, Krzysztof’s reasoning in this thread makes sense to me, in particular in terms of being consistent with how we treat isnan(x) vs isnan(x+0) for example.

The key point here is what guarantees the user provides to the compiler when they specify -ffinite-math-only. If “NaN can never be seen”, then indeed, isnan may be optimized out. If “NaNs do not occur in arithmetic operations”, then ‘isnan’ must be kept unless we know for sure that its argument cannot be a NaN. The choice should be based on practical needs IMHO. The second approach is more flexible and enables more use cases.

(Speaking only for myself here, and mostly as someone who doesn’t typically write floating-point-heavy code).

The root issue we have here is that, as with many compiler extensions, fast-math flags end up creating a vaguely-defined variant of the C specification governed by the “obvious” semantics, and as is always the case with “obvious” semantics, there are several different “obvious” results.

Given the standard C taste for undefined behavior, it would seem to me that the most natural definition of -ffinite-math-only would be to say that any operation that produces NaN or infinity results is undefined behavior, or produces a poison value using LLVM’s somewhat tighter definition here [1]. This notably doesn’t give a clear answer on what to do with floating-point operations that don’t produce floating-point results (e.g., casts, comparison operators), and the volume of discussions on this point is I think indicative that there are multiple reasonable options here. Personally, I find the extension of the UB to cases that consume but do not produce floating-point values to be the most natural option.

It’s also the case that many users don’t like undefined behavior as a concept, in large part because it can be very difficult to work around in a few cases where it is desired to explicitly override the undefined behavior. For some of the more basic integer UB, clang already provides builtin overflow checking macros to handle the I-want-to-check-if-it-overflowed-without-UB case, for example. And if fast math flags are to create UB, then similar functionality to override the floating-point UB ought to be provided. Already, C provides a mechanism to twiddle floating-point behavior on a per-scope basis (e.g., #pragma STDC FENV_ACCESS, CX_LIMITED_RANGE, FP_CONTRACT). LLVM already supports these flags on a per-instruction basis, so it really shouldn’t be very difficult to have Clang support pragmas to twiddle fast-math flags like the existing C pragmas. And in this model, the -ffast-math and related flags are doing nothing more than setting the default values of these pragmas.

In that vein, I can imagine a user writing a program that would look something like this:

int some_hard_math_kernel(float *inputs, float *outputs, int N) {
  {
#pragma clang fast_math off
    for (int i = 0; i < N; i++) {
      if (isinf(inputs[i]) || isnan(inputs[i]))
        return ILLEGAL_ARGUMENT;
    }
  }

#pragma clang fast_math on
  // Do fancy math here…
  // and if we see isnan(x) here, even if it’s in a library routine [compiled with -ffast-math],
  // or maybe implied by some operation the compiler understands [say, complex multiplication]
  // it is optimized to false.
  return SUCCESS;
}

I can clearly see use cases where the programmer might wish to have the optimizer eliminate any isnan calls that are generated when -ffast-math is used, but like other UB, I think it is extremely beneficial to provide some way to explicitly opt-out of UB on a case-by-case basis.

I would even go so far as to suggest that maybe the C standards committee should discuss how to handle at least the nsz/nnan/ninf parts of fast-math flags, given that very similar concepts seem to exist in all of the major C/C++ compilers.

[1] I fully expect any user who is knowledgeable about poison in LLVM—which admittedly is a fairly expert user—would expect poison to kick in most of the time C or C++ provides for undefined behavior, and potentially to rely on that expectation.

The point I was trying to make regarding the C++ standard is that fast-math is a non-standard language extension. If you enable it, you should expect the compiler to diverge from the language standard. I’m sure there’s precedent for this. If I write #pragma once at the top of my header, and include it twice back to back, the preprocessor won’t paste my header twice. Should #pragma once be removed because it breaks #include?

Now, you have a real-world example that uses NaN as a sentinel value. In your case, it would be nice if the compiler worked as you suggest. But suppose I have a “safe matrix multiply”:


std::optional<MyMatrixT> safeMul(const MyMatrixT &lhs, const MyMatrixT &rhs) {
  for (int i = 0; i < lhs.rows; ++i) {
    for (int j = 0; j < lhs.cols; ++j) {
      if (isnan(lhs[i][j])) {
        return {};
      }
    }
  }

  for (int i = 0; i < rhs.rows; ++i) {
    for (int j = 0; j < rhs.cols; ++j) {
      if (isnan(rhs[i][j])) {
        return {};
      }
    }
  }

  // do the multiply
}

In this case, if isnan(x) can be constant folded to false with fast-math enabled, then these two loops can be completely eliminated since they are empty and do nothing. If MyMatrixT is a 100 x 100 matrix, and/or safeMul is called in a hot loop, this could be huge. What should I do instead here?

Really, it would be much more consistent if we applied the clang documentation for fast-math, “Operands to floating-point operations are not equal to NaN and Inf”, literally, rather than actually implementing “Operands to floating-point operations are not equal to NaN and Inf, except in the case of isnan(), but only if the argument to isnan() is a value stored in a variable and not an expression”. As for using isnan from a standard library compiled without fast-math vs. a compiler builtin, I don’t think this is an issue. Really, enabling fast-math is basically telling the compiler: “My code has no NaNs. I won’t try to do anything with them, and you should optimize assuming they aren’t there”. If a developer does their part, why should it matter to them whether isnan() works?

Thanks,

Chris Tetreault

Not sure which way to go, but I agree that we need to improve the docs/user experience either way.
Let’s try to iron this out with an example (this is based on https://llvm.org/PR51775 ):

#include <math.h>
#include <stdlib.h>

int main() {
  const double d = strtod("1E+1000000", NULL);

This should be covered by the “general function call” rule, is therefore unaffected by -ffinite-math-only, and may validly return inf.

return d == HUGE_VAL;

For this comparison, however, the compiler can assume its operands are always finite. Thus, this comparison results in a poison value (in LLVM IR terminology).

What should this program return when compiled with -ffinite-math-only? Should this trigger a clang warning?

https://godbolt.org/z/MY73Tf3ee

We could indeed emit a diagnostic (when -ffinite-math-only is in effect) to let you know that you are doing something guaranteed to be incorrect, by using a manifest constant INF, where you promised that you would not.

The proposed documentation text isn’t clear to me. Should clang apply “nnan ninf” to the IR call for “strtod”?

“strtod” is not in the enumerated list of functions where we would block fast-math-flags, but it is a standard lib call, so “nnan ninf” would seem to apply…but we also don’t want “-ffinite-math-only” to alter the ability to return an INF from a “general function call”?

The strtod function should be allowed to return inf/nan. There are two ways we could accomplish that:

  1. We could specify in LLVM that nnan/ninf are meaningless to most function calls. In this case, Clang may continue emitting it everywhere, as is done today, including on strtod, but it would have no impact.
  2. We could specify that clang should not emit nnan/ninf except on certain calls. In this case, Clang would not emit it on strtod.

I haven’t thought about which option would be better. I’ve been trying to discuss the desired C-facing semantics first.

The point I was trying to make regarding the C++ standard is that fast-math is a non-standard language extension.

-ffinite-math-only does not need to be a non-standard language extension. Neither C nor C++ requires that floating-point types can represent infinity or NaN, and we could define this flag as meaning that there are (notionally) simply no such values in the relevant types. Of course, that’s not actually consistent with what we currently do, nor with what GCC does.

Would it be reasonable to treat operations on Inf and NaN values as UB in this mode only if the same operation on a signaling NaN might signal? (Approximately, that’d mean we imagine these non-finite value encodings all encode sNaNs that are UB if they would signal.) That means the operations that ISO 60559 defines as non-computational or quiet-computational would be permitted to receive NaN and Inf as input and produce them as output, but that other computational operations would not.

Per ISO 60559, the quiet-computational operations that I think are relevant to us are: copy, negate, abs, copySign, and conversions between encoding (eg, bitcast). The non-computational operations that I think are relevant to us are classification functions (including isNaN).

I’m in favor. (Perhaps unsurprisingly, as this is precisely the proposal I made earlier, worded slightly differently. :slight_smile:)

I’m not super knowledgeable about the actual implementation of floating-point math in clang, but on the surface this seems fine. My position is that we should provide no guarantees as to the behavior of code with NaN or infinity if fast-math is enabled. We can go with this behavior, but we shouldn’t tell users that they can rely on it. Clang should have maximal freedom to optimize floating-point math with fast-math, and any constraint we place potentially results in missed opportunities. Similarly, we should feel free to change this implementation in the future; the goal is not stability for users who chose to rely on our implementation details. If users value reproducibility, they should not be using fast math.

The only thing I think we should guarantee is that casts work. I should be able to load some bytes from disk, cast the char array to a float array, and any NaNs that I loaded from disk should not be clobbered. After that, I should be able to cast an element of my float array back to another type and inspect the bit pattern (assuming I did not transform that element in any other way after casting it from char), to support use cases like Serge’s. Any other operation should be fair game.

Thanks,

Chris Tetreault

If clang does not remove __builtin_isnan in -ffinite-math-only mode and a user wants calls to isnan to be optimized out, they can do it in literally a couple of lines:

#undef isnan
#define isnan(x) false

If clang optimizes out __builtin_isnan and a user wants to check whether some float is NaN, they have no appropriate way to do that, only hacks and kludges.

The approach where -ffinite-math-only means “there are no NaNs” is too rigid: it prevents several coding techniques, provides no additional optimization possibilities, and provokes user complaints.

I would argue that #undef’ing a macro provided by the compiler is a much worse kludge than static casting your float to an unsigned int. Additionally, you have to redefine isnan back to whatever it was after your function (lest it pollute unrelated code that possibly isn’t even being compiled with fast-math), which can’t be done portably as far as I know. Additionally, this requires you to be the author of safeMul. What if it’s in a dependency for which you don’t have the source? At that point, your only recourse is to open an issue with libProprietaryMatrixMath and hope your org is paying them enough to fast track a fix.

Thanks,

Chris Tetreault

It should not be done in headers, of course. Redefining this macro in a source file that is compiled with -ffinite-math-only is free of the described drawbacks. Besides, the macro isnan is defined by libc, not the compiler, and IIRC it is defined as a macro precisely to allow such manipulations.

The influence of libc on the behavior of isnan under -ffinite-math-only is also an argument against “there are no NaNs”: it causes inconsistency in the behavior. Libc can provide its own implementation that does not rely on the compiler’s __builtin_isnan, and user code that uses isnan would work. But at some point a configuration script changes, or libc changes the macro, and your code breaks, as happened after commit 767eadd78 in the llvm libcxx project. Keeping isnan would make changes in libc less harmful.

Without trying to be too harsh, this is the bad justification GCC has
used for years for exploiting all kinds of UB and implementation-defined
behavior in the name of performance. As has been shown over and over
again, the breakage is rarely matched by equivalent performance gains.
So once more, do we even have proof that significant code exists where
isnan and friends are used in a performance critical code path? I would
find that quite surprising and more an argument for throwing a compile
error...

Joerg

The problem is that math code is often templated, so template <typename T> MyMatrixT<T> safeMul(const MyMatrixT<T> & lhs … is going to be in a header.

Regardless, my position isn’t “there is no NaN”. My position is “you cannot count on operations on NaN working”. Just like you can sometimes dereference a pointer after it is freed, but you should not count on this working. If the compiler I’m using emits a call to a library function instead of providing a macro, and this results in isnan actually computing whether x is NaN, then so be it. But if the compiler provides a macro that evaluates to false under fast-math, then the two loops in safeMul can be optimized. Either way, as a developer, I know that I turned on fast-math, and I write code accordingly.

I think working around these sorts of issues is something that C and C++ developers are used to. These sorts of between-compiler “inconsistencies” are something we accept because we know they come with improved performance. In this case, the fix is easy, so I don’t think this corner case is worth supporting. Especially when the fix is also just one line:


#define myIsNan(x) (reinterpret_cast<uint32_t>(x) == THE_BIT_PATTERN_OF_MY_SENTINEL_NAN)

I would probably call the macro something else like shouldProcessElement.

Thanks,

Chris Tetreault

The problem is that math code is often templated, so template <typename T> MyMatrixT<T> safeMul(const MyMatrixT<T> & lhs … is going to be in a header.

No problem, the user can write:

#ifdef __FAST_MATH__
#undef isnan
#define isnan(x) false
#endif

and put it somewhere in the headers.

Regardless, my position isn’t “there is no NaN”. My position is “you cannot count on operations on NaN working”.

Exactly. Attempts to express the condition of -ffast-math as restrictions on types are not fruitful. I think it is the reason why GCC documentation does not use simple and clear “there is no NaN” but prefers more complicated wording about arithmetic.

I was also wrong about reinterpret_cast, sorry. reinterpret_cast<uint32_t>(float) is an invalid construct. The working construct is reinterpret_cast<uint32_t&>(x). It does, however, have the same drawback: it requires x to be in memory.

The working construct is reinterpret_cast<uint32_t&>(x). It does, however, have the same drawback: it requires x to be in memory.

We’re getting rather far afield of the thread topic here, but … that is UB, don’t do that.

Instead, always memcpy, e.g.
uint32_t y;
memcpy(&y, &flo, sizeof(uint32_t));

Or use a wrapper like std::bit_cast or absl::bit_cast (https://github.com/abseil/abseil-cpp/blob/cfbf5bf948a2656bda7ddab59d3bcb29595c144c/absl/base/casts.h#L106).

This has effectively no runtime overhead; the compiler is extremely good at deleting calls to memcpy when it has a constant smallish size. And remember that every local variable starts out in memory; only through optimization do the memory location and the loads/stores for each access get eliminated.
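A minimal sketch of that pattern (the helper name `float_bits` is made up):

```cpp
#include <cstdint>
#include <cstring>

// Read a float's bit pattern in a well-defined way; compilers typically
// lower this memcpy to a single register move. In C++20,
// std::bit_cast<std::uint32_t>(x) is an equivalent one-liner.
std::uint32_t float_bits(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}
```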

Let’s weigh the alternatives.

We are discussing two approaches for handling isnan and similar functions in -ffinite-math-only mode:

  1. “Old” behavior: “with -ffinite-math-only you are telling that there are no NaNs”, so isnan may be optimized to false.
  2. “New” behavior: with -ffinite-math-only you are telling that the operands of arithmetic operations are not NaNs but otherwise NaN may be used. As isnan is not an arithmetic operation, it should be preserved.

Advantages of the “old” behavior are:

  • “It’s intuitively clear”.
  • It is close to the GCC current behavior.

Advantages of the “new” behavior are:

  • isnan is still available to the user, which allows, for instance, validation of working data or selection between fast and slow path.
  • NaN is available and may be used, for instance, as sentinel.
  • Consistency between compiler and library implementations, both would behave similarly.
  • In most real cases the “old” behavior can be easily obtained by redefinition of isnan.
  • It is free from issues like “what should numeric_limits::has_quiet_NaN() return?”.

It is unlikely that “old” behavior gives noticeable performance gain. Anyway, isnan may be redefined to false if it actually does.

The intuitive clarity of the “old” way is questionable for users, because it is not clear why functions like isnan silently disappear, or what body specializations of numeric_limits methods should have.

There are cases when checking for NaN is needed even in -ffinite-math-only mode. To do it, users have to resort to workarounds like doing integer arithmetic on float values, which reduces the clarity of the code and makes it unportable and slower.

Are there any other advantages/disadvantages of these approaches?

If the compiler provides “isnan”, the user can’t redefine it. Redefining/undefining any function or a macro provided by a compiler is UB.

The “old” behavior can be tuned with #pragmas to restore the functionality of NaNs where needed.

The “old” behavior doesn’t have a problem with “has_nan”—it returns “true”. What other issues are there?

isnan does not begin with an underscore, so it is not a reserved identifier. Why is its redefinition UB?

The standard says so, but I can’t find the corresponding passage in the draft…

From: Serge Pavlov <sepavloff@gmail.com>
isnan does not begin with an underscore, so it is not a reserved identifier. Why is its redefinition UB?

The standard says so, but I can’t find the corresponding passage in the draft…

I don’t know about C, but in C++ redefining any library name as a macro is forbidden by
https://eel.is/c++draft/reserved.names#macro.names-1

Btw, I don’t think this thread has paid enough attention to Richard Smith’s suggestion: that in fast-math mode, the implementation should

  • treat all quiet NaNs as if they are signaling NaNs
  • treat all “signals” as if they produce an unspecified value (rather than UB)
    So, any floating-point operations that IEEE754 guarantees will work silently even on signaling NaNs, must continue to work on any kind of NaN in fast-math mode. But any operation that is allowed to signal, is therefore allowed to give wrong results if you feed it any kind of NaN in fast-math mode. In this model, we don’t talk about specific mathematical identities like “x+0 == x”. Instead, we say “If !isnan(x), then computationally x+0 == x; and if isnan(x), then x+0 is allowed to signal and therefore in fast-math mode we can make its result come out to any value we like. Therefore, if the optimizer sometimes wants to pretend that QNAN + 0 == QNAN, that’s perfectly acceptable.”
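As a concrete reading of that model (my own sketch; the function names are illustrative, and no compiler currently guarantees exactly this), the operations IEEE 754 defines as never signaling would keep working on NaNs:

```cpp
#include <cmath>

// Sketch of the proposed model: these operations never signal per
// IEEE 754, so they would keep working on NaN even in fast-math mode.
float quiet_negate(float x)   { return -x; }                     // negate
float quiet_abs(float x)      { return std::fabs(x); }           // abs
float quiet_copysign(float x) { return std::copysign(1.0f, x); } // copySign
bool  quiet_classify(float x) { return std::isnan(x); }          // isNaN
// By contrast, x + 0.0f may signal on a signaling NaN, so under this
// model it could yield an unspecified value when given any NaN.
```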

Notice that you cannot make “signaling” into actual UB; you must make it produce an unspecified value. If you make it UB, then the compiler will happily optimize

{
  if (!isnan(someGlobal)) puts("it's not nan"); // #1
  double x = someGlobal;
  x += 1; // This is a signaling operation
}

into

{
  puts("it's not nan"); // because if it were NaN on line #1, then either we'd hit that signaling operation, or we'd have a data race
}

But if you just make “signaling” operations produce unspecified values when given NaN, then I think everything works fine and you end up with behavior that’s pretty darn close to what Serge is advocating for with his “New” behavior.

my $.02,
–Arthur