Propagation of fpclass assumptions vis-à-vis fast-math flags

I’ve been putting together a proposal on “fixing” the fast-math flags (at least, the flags that aren’t easily formalizable the way nnan/ninf/nsz are), so I don’t want to steer the discussion too much in that direction yet.

That’s more-or-less what I want with ‘nnan’ and ‘ninf’ – I want a weak form of these flags that enables semantic-breaking transformations without introducing any other changes to the semantics of instructions on which they are present.

I understand, appreciate, and sympathize with this ask. However, I am extremely cautious about replacing semantics that are well-defined and can be incorporated into formal-semantics tools with semantics that are handwave-y and vague. Prior experience with handwave-y semantics (e.g., pointer provenance) suggests that the end state is a set of optimizations that individually make sense but combine to produce a clear error. Using reassoc as your example is, I think, an unintentional illustration of the problem: reassoc is one of the existing flags where the optimizations we keep adding to it end up being surprising to users.

That said, I am not an expert in formal semantics; I would appreciate feedback from those who are (e.g., RalfJung or Nuno Lopes). What I can find on searching for formalizations of FP semantics is that there isn’t much precedent in the academic literature for formalizing fast-math flags in the vein of reassoc (nnan-like flags are formalizable, because they specify the results the operation may give). The only example I see is one group that used a rewrite-based semantics for fast-math flags, and I’m not sure how reliable rewrite-based semantics are, especially because I doubt we can ever enumerate the permissible rewrites well enough.

This is going off in yet another direction, but I have a problem with ‘afn’ in that we don’t say anything about just how close the approximation needs to be. Can I replace ‘sin(x)’ with zero? It’s an approximation. I think we’d all agree that it’s not a sufficiently accurate approximation, but I don’t know that we have any rule for how much error is allowed.

Of the existing flags, afn is the one whose current semantics I propose should be scrapped entirely and replaced with a flag that generally allows algebraic expression transformations (which would cover most of its existing transformations and provide a better home for optimizations like sin(x)/cos(x) => tan(x)).

Going back to my original case with reciprocal approximation of division, we’re using Newton-Raphson refinement because just inserting the rcpps instruction isn’t close enough. The ‘arcp’ flag allows the use of a reciprocal, but an unwritten (I think) rule tells us that the answer needs refinement.
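
For concreteness, here’s a rough sketch (mine, not what any backend literally emits) of that refinement using the SSE rcpps estimate: the raw estimate is good to roughly 12 bits, and one Newton-Raphson step, x1 = x0 * (2 - b*x0), approximately doubles that.

#include <immintrin.h>

// Sketch only: approximate a/b via rcpps plus one Newton-Raphson step.
static __m128 div_via_rcp(__m128 a, __m128 b) {
  __m128 x0 = _mm_rcp_ps(b);                       // hardware estimate
  __m128 bx0 = _mm_mul_ps(b, x0);
  __m128 x1 = _mm_mul_ps(x0, _mm_sub_ps(_mm_set1_ps(2.0f), bx0));
  return _mm_mul_ps(a, x1);                        // a * (1/b)
}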

More broadly, I think there are a few issues we’re running into right now:

  • We have no general way of identifying acceptable precision loss. There is a “recip-estimates” function attribute (which looks to be designed to be compatible with gcc’s -mrecip flag) that can generate use of approximate-reciprocal instructions, but it suffers from all the problems of function attributes for fast math.
  • The “fast” flag is treated as equivalent to the conjunction of all the fast-math flags. This means any optimization you want to shove into the bucket of “trade FP precision for speed” kind of has to be defined as a conjunction of existing semantics, and there are several transformations which do not happily sit in existing flags.
  • We are out of fast-math bits in llvm::Value. FMF sits in SubclassOptionalData, which has but 7 bits, and all 7 bits are defined (see the sketch after this list).
  • Existing fast-math flags are probably deviating from user expectations in uncomfortable ways.
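
For reference, here is roughly how those seven bits are laid out today, paraphrased from llvm/IR/FMF.h (check the header for the authoritative definitions):

// One bit per flag; together they exhaust the SubclassOptionalData
// bits available to a floating-point instruction.
enum FastMathFlagBits {
  AllowReassoc    = 1 << 0,  // reassoc
  NoNaNs          = 1 << 1,  // nnan
  NoInfs          = 1 << 2,  // ninf
  NoSignedZeros   = 1 << 3,  // nsz
  AllowReciprocal = 1 << 4,  // arcp
  AllowContract   = 1 << 5,  // contract
  ApproxFunc      = 1 << 6   // afn
};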

I’m a bit surprised by this response. With the exception of requiring constant arguments to be respected, I don’t think I’m introducing any change to the rules that currently exist for determining whether a transformation is legal based on the nnan and ninf flags. The current definition for nnan, for example, says “Allow optimizations to assume the arguments and result are not NaN.” Change that to “Allow optimizations to assume that non-constant arguments and the result are not NaN” and you’ve got what I am suggesting.

I don’t think that’s even subjective. Suppose you have an optimization you would like to perform and you are trying to determine if it is legal. If the optimization would be legal if no operands or results were NaN, and there are no constant NaN operands, then the nnan flag makes the optimization legal. If the optimization is illegal for any other reason, it’s still illegal.
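
To make that concrete, here’s a toy predicate (names and structure entirely hypothetical, just to pin down the rule) for a two-operand FP instruction:

#include <math.h>
#include <stdbool.h>

// Toy model of the proposed rule: nnan licenses a transform only if it
// would be correct for all non-NaN inputs AND no operand is a literal NaN.
struct FBinOp {
  bool lhs_is_literal, rhs_is_literal;
  double lhs, rhs;   // values, meaningful only when the literal flag is set
};

static bool nnan_allows(const struct FBinOp *op, bool legal_if_no_nans) {
  if (op->lhs_is_literal && isnan(op->lhs)) return false;  // respect literals
  if (op->rhs_is_literal && isnan(op->rhs)) return false;
  return legal_if_no_nans;   // otherwise, the usual nnan reasoning applies
}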

Are you saying something more than this that I’m missing?

I think we just need to accept the fact that fast-math isn’t safe. I would say it’s a footgun, but it’s worse than that. It’s the user giving the compiler a gun with the instructions, “feel free to shoot me in the foot as you deem necessary.”

Ultimately, what I’m trying to work towards is finding a way to make fast-math settings generally useful. The only way I can think of to do that is to have some way to communicate to the user when and where the compiler used fast-math and provide a way for the user to disable it locally. So you’d end up with a process where the user enables fast-math everywhere, then runs their program and sees it fail, then looks at diagnostics to see which optimization broke it, then disables fast-math in that part of their code, and so on. And if we can make it possible to automate that process, fast-math will be useful.
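
Clang already has a local escape hatch that hints at what that workflow could look like: the float_control pragma can restore precise FP semantics for a region compiled under -ffast-math. For example, protecting a compensated summation that reassociation would silently destroy:

#pragma float_control(precise, on, push)
// Kahan summation depends on FP operations NOT being reassociated;
// precise semantics keep the correction term c from folding to zero.
double kahan_sum(const double *x, int n) {
  double sum = 0.0, c = 0.0;
  for (int i = 0; i < n; i++) {
    double y = x[i] - c;
    double t = sum + y;
    c = (t - sum) - y;   // recovered rounding error
    sum = t;
  }
  return sum;
}
#pragma float_control(pop)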

My general opinion is that in most cases fast-math-based transformations are relatively safe, but when they aren’t it can be disastrous, and generally the compiler cannot, even in theory, distinguish between the safe and the disastrous at compile time.

I was interpreting the “if such optimizations aren’t performed” part to mean that the operation has to follow IEEE semantics unless some particular optimization triggers. But any transform on an instruction could be an “optimization”, whether or not it’s obviously profitable on some particular target. So the whole thing only makes sense if “optimizations” refers to some narrower class of transforms that are subjectively chosen.

Trying to change the rules more specifically for instructions that have constant operands vs. non-constant operands doesn’t really work either. Certain parts of an instruction are constants: the opcode, the flags, arguments passed to immarg intrinsic parameters. The operands of an “fadd” aren’t constants in that sense: we can add arbitrary values. And various transforms exist that will turn an fadd with a “constant” argument into one with a non-constant argument; off the top of my head, constant hoisting and function merging work like this in practice. I mean, I guess it’s not impossible to redesign the IR so fadd can actually have an argument that’s “constant” in the sense of immarg, but it would be an unusual edge case that would require special handling in a lot of places.
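
A tiny hypothetical illustration of that last point: after function merging, what used to be a literal multiplier becomes an ordinary runtime parameter, so any rule keyed on “has a constant operand” silently stops applying.

// Before merging: each multiply has a literal operand.
double f(double x) { return x * 2.0; }
double g(double x) { return x * 3.0; }

// After merging: one shared body whose multiplier is a plain value.
static double fg_impl(double x, double k) { return x * k; }
double f_merged(double x) { return fg_impl(x, 2.0); }
double g_merged(double x) { return fg_impl(x, 3.0); }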

Maybe the problem here is my lack of precise terminology. When I said “such optimizations aren’t performed” I meant that if the instruction is there, it has the ordinary semantics with respect to what values it produces (and the fact that it has no side effects). Basically, I’m trying to decouple reasoning about the value that will be produced by an instruction from reasoning about what transformations are legal. Does that make sense?

And what I said with regard to constant operands was intended to refer to literal values. Basically, I’m trying to make a statement about constant folding: if you have enough information to perform constant folding, then the constant folding should respect the values seen, even if the fast-math flags say those values shouldn’t be there. Whether that should be the rule with the existing ‘nnan’ and ‘ninf’ flags is debatable, but it’s what I would want to see with the weaker forms of the flags I am suggesting.

Okay, say you have fmul, and say there’s some small number of ways it can be emitted under “nnan”. This is basically equivalent to:

switch (ChooseFMulMode()) {
  case IEEE:
    result = IEEEMul(a,b);
    break;
  case Fast1:
    result = FastMul1(a,b);
    break;
  case Fast2:
    result = FastMul2(a,b);
    break;
}

Okay, so how does ChooseFMulMode() work? Say that it’s fully non-deterministic, so each time you run an fmul, it returns one of the possible modes. So the fmul itself becomes non-deterministic: its value is one of IEEEMul(a,b), FastMul1(a,b), or FastMul2(a,b). And given that flexibility, the compiler will prefer specific variants depending on how the value is used. So the value and its users are tied together: the compiler will choose the variation that allows the simplest overall code.

Presumably, you don’t want to list out all the possible modes, so you could just say any mode you can construct is legal as long as it’s consistent with IEEE for non-NaN input/output values. Put it together, and this is my “freeze poison” proposal from near the beginning of the thread.

You could consider variations where ChooseFMulMode() is not fully non-deterministic. Say, for example, each translation unit picks a single mode at startup. So every fmul has to do exactly the same computation, but the compiler can choose what that computation is. So if your instruction set has a “fast fmul” instruction, you can use it, but all fmuls are consistent with each other. I’m not sure it really allows the optimizations you want, though.
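
In terms of the earlier pseudocode, that variant looks something like this (reusing the hypothetical IEEEMul/FastMul helpers from above):

enum FMulMode { IEEE, Fast1, Fast2 };
static enum FMulMode Mode;   // fixed once, before any fmul executes

float do_fmul(float a, float b) {
  switch (Mode) {            // the same choice on every call, so all
    case IEEE:  return IEEEMul(a, b);    // fmuls agree with each other
    case Fast1: return FastMul1(a, b);
    default:    return FastMul2(a, b);
  }
}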

I think this general framework of “come up with a set of modes, then define how ChooseFMulMode() works” completely encompasses all possible ways to define nnan that actually make sense. If your proposal doesn’t fit into this framework, it’s unlikely we can implement it in any reasonable way.

What you’re saying here is clear; it just isn’t something we can reasonably do. Making the semantics of an instruction depend on how its operands are written in the IR completely breaks the whole idea of LLVM IR, which is that it’s an abstract machine that sequentially executes LLVM IR instructions. (“reassoc” does break this, but that’s why we consider it broken…)

We could define “fmul by literal” to be a different IR instruction from “fmul by variable”, but that’s complicated, and I don’t think it really does what you want.

As an alternative to all of this, I guess you could say “I want a compiler that just lowers mathematical expressions to the right thing”. Something like Herbie (https://herbie.uwplse.org/). But at that point, you’re outside the realm of what we can reasonably do with LLVM IR.

Not sure how feasible this idea would be in practice, but I guess something like an “if there is a NaN/Inf constant in the IR, strip the nnan/ninf flag from the instructions that use it” mode could work for some of the cases described:

// Evaluate the result under every mode; the transformation is allowed
// only when all of them agree:
resultIEEE  = IEEEMul(a,b);
resultFast1 = FastMul1(a,b);
resultFast2 = FastMul2(a,b);

if (resultIEEE == resultFast1 && resultIEEE == resultFast2) {
  // optimization is allowed
}

That is basically what I want, but I’m still not happy with the implication that we can do arbitrary constant folding in the presence of NaNs. I’m mostly worried about how this works with fcmp instructions. For the compilation mode I am imagining, I think the front end would omit the ‘nnan’ and ‘ninf’ flags from any fcmp instruction, but I think your definition leaves a path for the comparisons to still be eliminated.

The basic pattern I have in mind is something like this:

z = x * y;
if (isnan(z)) {
  // Do special handling for the NaN case
}

Now suppose through a series of optimizations, we’ve reduced x and y to 1.0 and NaN. The compiler looks at “fmul nnan double 1.0, NaN” and decides that if it folds that to 1.0 it will be able to eliminate the branch for the NaN handling. That’s what I’m trying to avoid.

This could potentially come up, as an example, with complex multiplication (shown here without fast-math because fast-math optimizes away the special handling for NaNs in the intermediate calculations): https://godbolt.org/z/Y3c93aa7h

Complex multiplication is an important example because the NaNs can appear in the intermediate calculation as a result of infinities in the input. So if I want to honor infinities but not NaNs, I still need to keep the NaN comparisons to get the correct handling of the infinities.
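
For reference, the shape of the recovery step (abbreviated from the C Annex G approach, which is what compiler-rt’s __muldc3 implements) is roughly:

#include <complex.h>
#include <math.h>

double _Complex cmul(double a, double b, double c, double d) {
  double re = a * c - b * d;
  double im = a * d + b * c;
  if (isnan(re) && isnan(im)) {
    // Annex G recovery (details elided): if any of a, b, c, d is
    // infinite, rescale and recompute so the result is a correctly
    // signed infinity rather than NaN. Deleting this isnan check
    // under nnan therefore also deletes the infinity handling.
  }
  return CMPLX(re, im);
}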