[RFC] A consistent set of semantics for the floating-point minimum and maximum operations

(To keep things simpler, this post is fully under the assumption of non-constrained FP.)

Let me start with the facts. Please correct me if any of these are incorrect:

  • LLVM semantics allow treating sNaN as qNaN and allow omission of canonicalizing operations.
  • Prior to these changes LLVM documented that sNaN in minnum is guaranteed to behave the same as qNaN.
  • The LLVM middle-end relied on these semantics in optimizations.
  • The LLVM back-end did not respect the documented semantics on many targets, including:
    • On most targets that have a native instruction with 2008 sNaN semantics, it was used without forced canonicalization.
    • If libcall legalization was used (which can happen even on X86 when targeting -Oz) the underlying libc implementation might use 2008 sNaN semantics.
  • libc implementations may or may not use 2008 sNaN semantics. Recent versions of glibc do, musl doesn’t.
  • The C standard (including annex F) does not mandate 2008 sNaN semantics.
  • There are some IR producers who want the semantics LLVM previously documented (e.g. Rust).
  • There are some IR producers who want to avoid the overhead of sNaN quieting that is required to achieve the previous semantics on targets that support the 2008 sNaN semantics in hardware.

I think the conclusion from these constraints is that:

  • We need an operation that matches the previously documented semantics (i.e. sNaN behaves like qNaN). This matches minimumnum semantics modulo the signed zero handling.
  • We need an operation that may or may not have the 2008 sNaN semantics. This matches the semantics of fmin in C.
  • We should not and cannot have an operation that is guaranteed to have the 2008 sNaN semantics.

I think this roughly matches where LangRef is right now, though it does not match the implementation:

  • minimumnum matches our old documented semantics, with the nsz flag controlling the ordering of signed zeros.
  • minnum non-deterministically either has the 2008 sNaN semantics or behaves like minimumnum. This behavior is currently an emergent property of the overall spec, but needs to be documented explicitly.

I think that, assuming that these semantics are actually consistently implemented, this is a reasonable outcome. In the following are some thoughts on specific issues.


Signed zero handling: fmin does not specify specific signed zero ordering, while (after the changes that have already been made) minnum does specify ordering, with nsz as the opt-out. Some people have argued that minnum should continue to not have ordered signed zero even without the nsz flag, matching the 2008 semantics.

Overall, I do not agree with that position for two reasons: First, exposing the choice via nsz allows us to provide more options. “I don’t care about sNaN, but care about signed zero” is a pretty reasonable point in the floating point min/max spectrum that seems worth supporting. Second, this makes all the different FP min/max intrinsics (minnum, minimumnum, minimum) consistent, which eases lowering across them.

Naming and auto-upgrade: One of the primary concerns in the discussion is that the semantics of minnum were silently changed, modifying the behavior of existing IR. An option here would be to rename the minnum with the new semantics to fmin (which is somewhat more accurate I think) and auto-upgrade existing minnum to minimumnum + nsz. Overall, I’m somewhat doubtful that this is really worth it, given that these behavior changes have already leaked in released LLVM versions.

Refinement to minimumnum: A core problem with the 2008 sNaN semantics is that they make minnum non-associative. However, the non-determinism based semantics allow refinement of minnum to minimumnum if associativity is necessary to perform an optimization.

Vector reductions: We have llvm.vector.reduce.fmin reductions which are specified in terms of minnum. This is problematic because minnum is now non-associative, which means that the reduction is no longer well-defined without a specified reduction order. A vector like <sNaN, 0, 0> could reduce either to minnum(minnum(sNaN, 0), 0) -> minnum(qNaN, 0) -> 0 or to minnum(sNaN, minnum(0, 0)) -> minnum(sNaN, 0) -> NaN.

I think given that the sNaN behavior is non-deterministic in the first place, this is fine in principle – the reduction order just adds an extra source of non-determinism. However, this does mean that we can’t really vectorize a chain of minnums to vector.reduce.fmin, because it is more non-deterministic. (Unless we can exclude the existence of sNaN of course).

So I think in this area we’d need two things:

  • Similar to the minnum semantics itself, we should explicitly specify that llvm.vector.reduce.fmin where any element is sNaN may non-deterministically either return NaN or treat it as qNaN.
  • We should also introduce llvm.vector.reduce.fminimumnum to guarantee sNaN as qNaN treatment. This variant is suitable for vectorization.

I have not checked what the actual semantics of hardware vector reductions are. If anyone has that information handy, that would be an interesting data point.

Omission of canonicalizing operations: An idea floated above is to change LLVM’s general NaN semantics to no longer permit the omission of canonicalizing operations (e.g. x * 1.0 can’t fold to x). The motivation for this would be to allow the canonicalizations that have to be introduced on some hardware to achieve minimumnum semantics to be optimized away more easily, as the current semantics make it hard to rely on implicit canonicalization via existing FP operations. I think this is a topic that’s worth discussing (I’m not convinced it’s the right trade-off, but I can see the appeal), but I think that it’s largely orthogonal to the decision we need to reach here.

MINNUM vs MINNUM_IEEE ISD opcodes: It’s worth noting that while IR always specified the semantics without 2008 sNaN handling, the backend actually distinguished these via different ISD opcodes, and these opcodes still exist today. I think with the understanding that MINNUM may non-deterministically have 2008 sNaN semantics, we probably don’t need two opcodes anymore.


(Note: These items were edited in later.)

Frontends: In terms of the two frontends that came up in this discussion:

  • Rust would use llvm.minimumnum with the nsz flag, to get predictable “sNaN treated like qNaN” semantics.
  • Clang would use llvm.minnum with the nsz flag, to indicate that it does not care which sNaN semantics it gets. The backend will pick whatever is faster.

Canonical form: LLVM currently canonicalizes x < y ? x : y with nnan and nsz to minnum. It’s not clear whether we should keep that form or canonicalize to minimumnum instead. Both are equivalent under nnan. The argument for minnum would be that it’s the status quo and the base behavior is more liberal. The argument for minimumnum would be that its associativity makes it more amenable to further optimization (without having to perform an explicit minnum → minimumnum refinement). I think we’ll have to decide this based on which one works out better in practice.

Wording draft: [LangRef] Clarify specification for float min/max operations by nikic · Pull Request #172012 · llvm/llvm-project · GitHub has draft wording for the LangRef changes described above.


To make some progress here, I’ve put up [LangRef] Clarify specification for float min/max operations by nikic · Pull Request #172012 · llvm/llvm-project · GitHub for the wording I would propose.

I hope we can get a consensus on this, and then focus on aligning the implementation with the specification, without having to constantly second-guess what the semantics are going to be.

I think the proposal basically ignores the requests from both @Deviloper

and @RalfJung

So from a Rust perspective, that’s just as problematic as the current status quo. We’d like to rely on the guarantee that “If either operand is a NaN, returns the other non-NaN operand”, which used to be documented. This means maxnum(sNaN, x) where x is a number (not NaN) must return x

Will the solution be that all frontends change to emit minimumnum + nsz instead? Doesn’t that mean we abandon minnum in reality? If so, why do we need to take the effort to maintain minnum (in LangRef, backend support, auto-upgrade, etc.)?

The documentation change leaked, but the implementation did not (no nsz is emitted by the frontend, and no signed-zero handling changed in the backend). So auto-upgrade looks necessary to me.

In fact, I think defining a new `minnum_ieee` and keeping minnum’s old semantics looks better to me. Targets that have 2008 sNaN semantics can generate `minnum_ieee` in the FE. That guarantees performance and determinism within the same target, at least.

@nikic’s proposal is perfectly fine for Rust – we’ll just switch our lowering to minimumnum + nsz, expressing exactly the semantics we have already documented. It just may take some work until we can actually do that: ensuring the lowering on x86 (and other targets?) doesn’t regress compared to the assembly that rustc emits today, and ensuring that the fminimum_num libcall actually works (I don’t know how widely available that function is at this point, given how recently it was added to C).

Ideally, we can avoid updating minnum implementations to the new semantics until minimumnum is viable, to leave a migration path for frontends. This PR achieves that, albeit too late for LLVM 21.

LLVM 21 includes some optimizations that fold minnum(SNaN, x) to QNaN – so the implementation has also already leaked (as can be seen in Rust). That said, this has been reverted in the main branch so for that branch AFAIK it is indeed the case that the implementation is still effectively the same as before the docs change. But that implementation is messy, behavior differs between x86 and aarch64, and even within x86 depending on whether the libm function is called or not. So it’s not clear that it’s worth the effort of a migration – given the messy state of docs and implementation, we can’t be really sure about the intent of frontends that emitted minnum.

Yes, the use of libcalls is an unstable factor for determinism. But the library implementation is not under the compiler’s control. We can try to avoid generating libcalls, but the rest is in the user’s hands.

  • I don’t think consistency helps much in the lowering;
  • They are not consistent on targets that have 2008 sNaN semantics;

Floating point is unavoidably non-deterministic at the spec level. Our existing NaN propagation semantics are non-deterministic, and FP libcalls that are not required to be precisely rounded are also non-deterministic (again, at the IR spec level).

As for this specific non-determinism, it is only relevant if sNaN values are involved, so it should not be relevant for practical purposes.

I should have explicitly mentioned this:

  • Rust will use minimumnum + nsz to get predictable sNaN semantics.
  • Clang will use minnum + nsz, because it does not care which sNaN semantics it gets.

I think it’s unavoidably going to regress for most non-X86 targets, because they previously did not follow the specified semantics. But I assume Rust would want to pay that price…

This is a good point. I think for Rust this isn’t a problem as we can just provide fminimum_num in compiler-builtins.

The proposal explains the difference: minnum behaves non-deterministically with regards to sNaN, whereas minimumnum always treats a sNaN operand the same way as a qNaN operand.

As I understand it, those proposed semantics for minnum and maxnum do allow for maximum performance across all targets.

He’s specifically talking about consistency regarding signed-zero handling. I think it’s fine to have minnum/maxnum order signed zeroes by default, since we can still opt out of it with nsz.


The only problem with that is implementing it, right? It can’t use fmax as that doesn’t guarantee the zero ordering. So it’ll need some sort of direct lowering for all architectures.

This all sounds good to me. As you said, we already have the first operation, and the second operation is minnum/maxnum in your proposal. I believe the third operation is how minnum/maxnum is described in the language reference now, without your proposed changes.

Vector reductions shouldn’t be too big of a problem if we introduce llvm.vector.reduce.fminimumnum like you mentioned. For autovectorization, we can refine minnum to minimumnum, and implementing llvm.vector.reduce.fminimumnum itself just requires we canonicalize the input vector beforehand.

At least for AArch64, the pseudocode implies it recursively divides the vector in half.

I read @Deviloper’s post again, so yeah, they expressed the concern for signed-zero consistency specifically. With @RalfJung agreeing to the minimumnum + nsz solution, I think consistency is not a problem anymore.

I think so. x86’s minps/maxps behaves like a compare+select, so whatever lowering we use for x86 can be used for any architecture that implements floating-point comparisons. I wonder if there’s a way to abstract out most of the x86 backend’s LowerFMINIMUM_FMAXIMUM code; a lot of it is about applying known-value optimizations and skipping steps if we know that some operands will never be NaN or zero.

Yeah, I think I am fine with the semantics suggested by @nikic as a step forward. They seem to pretty closely mirror the status quo (besides the addition of nsz to minnum/maxnum).

What does concern me a bit, though I don’t consider it a blocker at the moment, is that from a formal standpoint using minnum anywhere renders the entire program non-deterministic if there is also an sNaN anywhere. According to the specified rules it is completely fine for an sNaN to spread through 10 Fourier transforms and affect every variable, where it could make minnum behave weirdly 3 days later. There is no documented way to stop the spread, is there?

Everyone knows this shouldn’t happen, because every architecture quiets the signaling bit in the very first operation all by itself.

Practically I am not too worried, but I consider this to be a hole in our current specification. We can also avoid this intrinsic, but I consider the non-locality of its non-determinism unfortunate.

I disagree with this statement. Usually we work under the assumption that all NaN values are equivalent. If you consider all NaN values to be interchangeable and don’t encode information in them, then everything is perfectly deterministic in theory and practice.

If we need something known to be a lib call, we use our own (vectorized) implementation instead.

Note that you have to also avoid looking at the sign of NaNs to be sure.

But yeah, it is very unfortunate that some operations treat sNaN so differently from qNaN. minnum is not the only such case though; there’s also pow and powi. It might be worth looking into what the overhead would be of guaranteeing “sNaN inputs produce a NaN result” for these operations, but this would also affect non-NaN codepaths (since some InstCombine folds would have to be removed or restricted), and it is hard to achieve when not even all libc implementations guarantee this… which ultimately boils down to the fact that the C committee decided not to standardize sNaN behavior; it’s very hard for downstreams to compensate for that.

OTOH, the “obvious” reply to your problem is to ensure there’s never an SNaN anywhere. No LLVM operation introduces new SNaN. So they can only come about from explicit actions of your code. If you consider SNaN impossible and all other NaN values interchangeable, even minnum and pow are “deterministic”.

llvm.canonicalize should do it. But I’m not sure if clang exposes that in any way – Rust does not.


In clang, you can use __builtin_canonicalize to do canonicalization (or __builtin_canonicalizef for float and __builtin_canonicalizel for long double).

Interesting, it seems to be defined to work with non-strict floating point. Yeah, this would do.

Unfortunately it seems to generate awful code, opting to use a multiplication, and it is not optimized out around other operations. I doubt we will use this for now if it’s just for theoretical safety.

The sign of NaN is a valid concern; it is true that copysign and common bit manipulation tricks might make it observable. We might need to investigate whether this can become a problem for us. I wonder if LLVM could/should guarantee some more things here to make it more reliable. I think this is off-topic for this discussion though.

It may be worth starting a new thread about the “canonicalize” operation; I think Rust wanted to use it at some point but it wasn’t implemented on all backends.

AFAIK, LLVM can optimize out canonicalize ops in some cases. I remember testing minimumnum/maximumnum on AArch64, and it doesn’t need to canonicalize its inputs if they come from other floating-point operations. It seems like “does this value come from a floating-point operation” should be a fairly easy thing to track.

The majority of nominally maintained targets are using the libcalls. So maybe not the targets most anyone cares about, but the majority of LLVM backends.

It’s not just AArch64, it’s all of the hardware implementations of these operations.

Practically speaking I don’t think we’re going to do any better than this.

The backend definitely needs deterministically known 2008 sNaN behavior to match the hardware instructions. The legalization of everything else will be built on top of it (relatedly, we probably do need rules against dropping canonicalizes in codegen).

You can use llvm.canonicalize for this.

We have the full set of __builtin_canonicalize, __builtin_canonicalizef, __builtin_canonicalizel and __builtin_elementwise_canonicalize

Yes, that’s the price of LLVM’s policy of permitting dropping canonicalizes. If we didn’t have that, we could equivalently use fadd %x, -0.0 with less optimization impact. We also need to improve canonicalize elimination in codegen, and we would definitely benefit from making the SDAG operation rules stricter.

This is not something IEEE-754 gives you. Implementations are free to flip the sign bit or set payload bits of NaN outputs for canonicalizing operations (which is everything except fabs, fneg, and copysign).

@Deviloper was proposing to treat all NaNs as one equivalence class. So their sign and payload don’t matter. In that sense, most floating-point operations are 100% deterministic. (Well, they should be. Then x87 comes along, or hardware that doesn’t compute subnormals, or whatever.)

One thing that’s causing me second thoughts on the “should minnum respect signed zero by default” question is that signed zero ordering isn’t mandated by IEEE 754-2008, nor by Annex F of the C standard (it’s only an “if possible” footnote), and we want minnum to be lowerable to an fmin libcall. We can’t guarantee that the fmin lowering will follow the stricter signed zero semantics based on just the standard. Is it safe to assume that in practice, all libcs do implement signed zero ordering?