[RFC] ArithFastMathInterface support for arith.select

I would like to continue the discussion about adding ArithFastMathInterface support for the arith.select operation here.

The discussion started in llvm/llvm-project Pull Request #125620, "[RFC][mlir] Conditional support for fast-math attributes" by vzakhari, and there are certain arguments against modifying arith.select this way.

That PR is not directly about modifying arith.select, so discussing the change there may be confusing.

I would like to reiterate the use case that was discussed in LLVM when fast-math flag support was added to the select instruction. The support may have been added earlier than that discussion, with flag propagation not working as expected at the time, but the case seems to be a good starting point for the arith.select discussion.

For this example,

double floatingAbs(double x) {
  return (x < 0) ? -x : x;
}

One may want to represent it as math.absf, but that is not equivalent to the straightforward arith.cmpf/arith.select representation unless x is assumed never to be a signed zero. When x is -0.0, the comparison x < 0 is false, so the select returns x unchanged, and the results are:
arith.cmpf/arith.select => -0.0
math.absf => 0.0

So in this case, nsz attached to arith.select allows the transformation, and the absence of the flag disallows it.
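To make the signed-zero difference concrete, here is a minimal C++ sketch. The names selectAbs and libAbs are made up for illustration, and std::fabs only stands in for math.absf. It assumes compilation without -ffast-math or -fno-signed-zeros, since those options allow the compiler to ignore the sign of zero.

```cpp
#include <cmath>

// Select-based absolute value, mirroring the arith.cmpf/arith.select form.
// For x == -0.0 the comparison (x < 0) is false, so x is returned unchanged.
double selectAbs(double x) {
  return (x < 0) ? -x : x;
}

// Library absolute value, used here as a stand-in for math.absf.
// It clears the sign bit, so -0.0 becomes +0.0.
double libAbs(double x) {
  return std::fabs(x);
}

// Helper (hypothetical name): true if the two forms agree on both the value
// and the sign bit for a given non-NaN input.
bool sameResult(double x) {
  double a = selectAbs(x);
  double b = libAbs(x);
  return a == b && std::signbit(a) == std::signbit(b);
}
```

With these definitions, sameResult(-0.0) is false while sameResult(-1.5) and sameResult(2.0) are true, which is why the rewrite is only valid when signed zeros may be ignored.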

I am not sure whether the fast-math flags can be deduced from the operands of arith.select in general. For example, all three operations involved in this pattern may be compiled with different fast-math settings, e.g.:

bool cmp(double x) { return x < 0; }
double neg(double x) { return -x; }
double floatingAbs(double x) { return cmp(x) ? neg(x) : x; }

Each of the three functions may be compiled with different fast-math options, and I think that for the arith.cmpf/arith.select conversion into math.absf to work after function inlining, it is the select itself that has to carry the nsz attribute.

@andykaylor please correct me if I am wrong. I think this discussion may also be relevant to ClangIR-based Clang, which will need to represent fast-math (presumably with the arith dialect) for C/C++.

@kuhar, @benvanik can we please continue the arith.select discussion here?

Hi @szakharin,

Sorry for the delayed reply, I’m traveling this week.

Thank you for pointing out the prior art here and coming up with the example. I think I understand how llvm ended up with this design, but I’m not convinced we should mirror it in MLIR’s arith dialect. It may make sense for some front-end dialects from flang/clang and for the llvm dialect that mirrors llvm ir, but I’d be much more cautious with arith, which is much more general and tied to neither clang nor llvm. Wouldn’t it be better to introduce the fast math rewrites you care about over the higher-level IR, where you have much more freedom and the high-level information (including your own fast math bits) is still present?

If that’s not possible, I wonder if one way to unblock you would be to have your compiler attach the interface you need to arith.select so that it’s entirely opt-in and not on the default compilation path. This is something that @ftynse suggested when I described the problem to him a couple of days ago.

To me it doesn’t make sense to have fast math flags on ops or other things that don’t know anything about floating point semantics and merely return existing floating point values, like select, block arguments, and loads. This is very heavyweight: none of the existing transformations that deal with arith.select, other such ops, or block arguments know how to preserve fast math flags, and dropping an inherent attribute is not fine. If what you need is fast math flag information about each fp value, I’d think that this is something that you’d either want to calculate with a dataflow analysis or encode in the type system (say f16<nnan>).

The latter is similar to the encoding attribute you can attach to tensors, but it is very heavyweight and I think it would break almost any code that deals with float types.

For the dataflow analysis approach, you could come up with a dedicated op (say arith.assumef that adds fast math assumptions to the operand and returns a new value) – this way you could inject some higher-level knowledge that fp values coming from things like block arguments or load instructions are constrained.