(To keep things simpler, this post is fully under the assumption of non-constrained FP.)
Let me start with the facts. Please correct me if any of these are incorrect:
- LLVM semantics allow treating sNaN as qNaN and allow omission of canonicalizing operations.
- Prior to these changes LLVM documented that sNaN in minnum is guaranteed to behave the same as qNaN.
- The LLVM middle-end relied on these semantics in optimizations.
- The LLVM back-end did not respect the documented semantics on many targets, including:
  - On most targets that have a native instruction with 2008 sNaN semantics, it was used without forced canonicalization.
  - If libcall legalization was used (which can happen even on X86 when targeting -Oz), the underlying libc implementation might use 2008 sNaN semantics.
- libc implementations may or may not use 2008 sNaN semantics. Recent versions of glibc do, musl doesn’t.
- The C standard (including annex F) does not mandate 2008 sNaN semantics.
- There are some IR producers who want the semantics LLVM previously documented (e.g. Rust).
- There are some IR producers who want to avoid the overhead of sNaN quieting that is required to achieve the previous semantics on targets that support the 2008 sNaN semantics in hardware.
I think the conclusion from these constraints is that:
- We need an operation that matches the previously documented semantics (i.e. sNaN behaves like qNaN). This matches minimumnum semantics modulo the signed zero handling.
- We need an operation that may or may not have the 2008 sNaN semantics. This matches the semantics of fmin in C.
- We should not and cannot have an operation that is guaranteed to have the 2008 sNaN semantics.
I think this roughly matches where LangRef is right now, though it does not match the implementation:
- minimumnum matches our old documented semantics, with the nsz flag controlling the ordering of signed zeros.
- minnum non-deterministically either has the 2008 sNaN semantics or behaves like minimumnum. This behavior is currently an emergent property of the overall spec, but needs to be documented explicitly.
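To make the two behaviors concrete, here is a minimal Python sketch of the semantics described above. It is only a model, not LLVM code: the NaN kinds are represented by the symbolic placeholders `SNAN` and `QNAN` (names made up for this sketch, since Python floats do not let you reliably distinguish the two), signed zero ordering is ignored, and the non-determinism of minnum is modeled as the set of results it is allowed to produce.

```python
# Symbolic stand-ins for the two NaN kinds (hypothetical names for this sketch).
SNAN = "sNaN"
QNAN = "qNaN"


def is_nan(v):
    return v in (SNAN, QNAN)


def minimumnum(a, b):
    """sNaN behaves like qNaN: a NaN operand is ignored if the other is a number."""
    if is_nan(a) and is_nan(b):
        return QNAN
    if is_nan(a):
        return b
    if is_nan(b):
        return a
    # Signed zero ordering is deliberately left out of this model.
    return a if a < b else b


def minnum_2008(a, b):
    """IEEE 754-2008 minNum behavior: any sNaN operand produces a qNaN."""
    if a == SNAN or b == SNAN:
        return QNAN
    return minimumnum(a, b)


def minnum_outcomes(a, b):
    """All results the non-deterministic minnum is allowed to return."""
    return {minnum_2008(a, b), minimumnum(a, b)}
```

In this model `minnum_outcomes(SNAN, 1.0)` contains both `QNAN` and `1.0`, while for every input without an sNaN the set has exactly one element, which is the `minimumnum` result. That is what makes refining minnum to minimumnum legal.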
I think that, assuming these semantics are actually implemented consistently, this is a reasonable outcome. Below are some thoughts on specific issues.
Signed zero handling: fmin does not specify specific signed zero ordering, while (after the changes that have already been made) minnum does specify ordering, with nsz as the opt-out. Some people have argued that minnum should continue to not have ordered signed zero even without the nsz flag, matching the 2008 semantics.
Overall, I do not agree with that position for two reasons: First, exposing the choice via nsz allows us to provide more options. “I don’t care about sNaN, but care about signed zero” is a pretty reasonable point in the floating point min/max spectrum that seems worth supporting. Second, this makes all the different FP min/max intrinsics (minnum, minimumnum, minimum) consistent, which eases lowering across them.
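As a rough illustration of what the nsz opt-out means, the following Python sketch contrasts a min that orders -0.0 before +0.0 with one that only uses a plain comparison. The function names are made up for this example, and NaN handling is left out entirely.

```python
import math


def min_ordered_zero(a, b):
    """Model of the behavior without nsz: -0.0 is treated as less than +0.0."""
    if a == 0.0 and b == 0.0:
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


def min_compare_only(a, b):
    """Plain comparison: -0.0 == +0.0, so the operand order decides the sign.

    This is one of the results a min with nsz would be allowed to produce.
    """
    return a if a < b else b


def sign_of(x):
    """Return -1.0 or 1.0 depending on the sign bit, including for zeros."""
    return math.copysign(1.0, x)
```

With these definitions, `min_ordered_zero` returns -0.0 for both `(-0.0, 0.0)` and `(0.0, -0.0)`, whereas `min_compare_only` returns whichever zero happens to be the second operand.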
Naming and auto-upgrade: One of the primary concerns in the discussion is that the semantics of minnum were silently changed, modifying the behavior of existing IR. An option here would be to rename the minnum with the new semantics to fmin (which is somewhat more accurate I think) and auto-upgrade existing minnum to minimumnum + nsz. Overall, I’m somewhat doubtful that this is really worth it, given that these behavior changes have already leaked into released LLVM versions.
Refinement to minimumnum: A core problem with the 2008 sNaN semantics is that they make minnum non-associative. However, the non-determinism based semantics allow refinement of minnum to minimumnum if associativity is necessary to perform an optimization.
Vector reductions: We have llvm.vector.reduce.fmin reductions which are specified in terms of minnum. This is problematic because minnum is now non-associative, which means that the reduction is no longer well-defined without a specified reduction order. A vector like <sNaN, 0, 0> could reduce either to minnum(minnum(sNaN, 0), 0) -> minnum(qNaN, 0) -> 0 or to minnum(sNaN, minnum(0, 0)) -> minnum(sNaN, 0) -> NaN.
I think given that the sNaN behavior is non-deterministic in the first place, this is fine in principle – the reduction order just adds an extra source of non-determinism. However, this does mean that we can’t really vectorize a chain of minnums to vector.reduce.fmin, because it is more non-deterministic. (Unless we can exclude the existence of sNaN of course).
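The <sNaN, 0, 0> example can be replayed with a small Python model. This is only a sketch under the same assumptions as the prose: `SNAN` and `QNAN` are symbolic placeholders invented for this example, signed zeros are ignored, and `minnum_2008` models the case where every individual minnum happens to pick the 2008 sNaN behavior.

```python
from functools import reduce

SNAN, QNAN = "sNaN", "qNaN"  # symbolic NaN kinds (hypothetical names)


def minimumnum(a, b):
    """sNaN behaves like qNaN: NaN is ignored if the other operand is a number."""
    nan_a, nan_b = a in (SNAN, QNAN), b in (SNAN, QNAN)
    if nan_a and nan_b:
        return QNAN
    if nan_a:
        return b
    if nan_b:
        return a
    return a if a < b else b


def minnum_2008(a, b):
    """2008 behavior: any sNaN operand produces a qNaN."""
    if SNAN in (a, b):
        return QNAN
    return minimumnum(a, b)


def fold_left(op, xs):
    """op(op(x0, x1), x2)"""
    return reduce(op, xs)


def fold_right(op, xs):
    """op(x0, op(x1, x2))"""
    return reduce(lambda acc, x: op(x, acc), reversed(xs))


vec = [SNAN, 0.0, 0.0]
left_2008 = fold_left(minnum_2008, vec)    # minnum(minnum(sNaN, 0), 0)
right_2008 = fold_right(minnum_2008, vec)  # minnum(sNaN, minnum(0, 0))
left_num = fold_left(minimumnum, vec)
right_num = fold_right(minimumnum, vec)
```

Here `left_2008` is `0.0` and `right_2008` is `QNAN`, so the result depends on the reduction order, while `left_num` and `right_num` are both `0.0`.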
So I think in this area we’d need two things:
- Similar to the minnum semantics itself, we should explicitly specify that llvm.vector.reduce.fmin where any element is sNaN may non-deterministically either return NaN or treat it as qNaN.
- We should also introduce llvm.vector.reduce.fminimumnum to guarantee sNaN as qNaN treatment. This variant is suitable for vectorization.
I have not checked what the actual semantics of hardware vector reductions are. If anyone has that information handy, that would be an interesting data point.
Omission of canonicalizing operations: An idea floated above is to change LLVM’s general NaN semantics to no longer permit the omission of canonicalizing operations (e.g. x * 1.0 can’t fold to x). The motivation for this would be to allow the canonicalizations that have to be introduced on some hardware to achieve minimumnum semantics to be optimized away more easily, as the current semantics make it hard to rely on implicit canonicalization via existing FP operations. I think this is a topic that’s worth discussing (I’m not convinced it’s the right trade-off, but I can see the appeal), but I think that it’s largely orthogonal to the decision we need to reach here.
MINNUM vs MINNUM_IEEE ISD opcodes: It’s worth noting that while IR always specified the semantics without 2008 sNaN handling, the backend actually distinguished these via different ISD opcodes, and these opcodes still exist today. I think with the understanding that MINNUM may non-deterministically have 2008 sNaN semantics, we probably don’t need two opcodes anymore.
(Note: These items were edited in later.)
Frontends: In terms of the two frontends that came up in this discussion:
- Rust would use llvm.minimumnum with the nsz flag, to get predictable “sNaN treated like qNaN” semantics.
- Clang would use llvm.minnum with the nsz flag, to indicate that it does not care which sNaN semantics it gets. The backend will pick whatever is faster.
Canonical form: LLVM currently canonicalizes x < y ? x : y with nnan and nsz to minnum. It’s not clear whether we should keep that form or canonicalize to minimumnum instead. Both are equivalent under nnan. The argument for minnum would be that it’s the status quo and the base behavior is more liberal. The argument for minimumnum would be that its associativity makes it more amenable to further optimization (without having to perform an explicit minnum → minimumnum refinement). I think we’ll have to decide this based on which one works out better in practice.
Wording draft: Pull Request #172012 in llvm/llvm-project (“[LangRef] Clarify specification for float min/max operations”, by nikic) has draft wording for the LangRef changes described above.