Modelling strict floating point behavior in math-like dialects

Hello,

Is there a way to model strict floating point behavior in the math and arith dialects? I am almost sure there is not, but it is worth asking.

I wonder what the right way would be to model the floating point control and status words in MLIR. For example, with an option like clang’s -ffp-exception-behavior=strict, a user may expect that an operation writing the floating point control word is not reordered with respect to operations that read it, and that two operations writing the floating point status word are not reordered if a later operation may read the status word.

It seems that generic optimization passes will have to take this behavior into account to preserve floating point exception semantics. I wonder whether using SideEffectsInterface for this purpose sounds appropriate: math, arith, etc. operations would have to implement SideEffectsInterface and report a read/write of a global floating point control/status resource, e.g. based on some sort of FastMathFlags attribute or the absence of any FastMathFlags, and report no side effects otherwise (under -ffp-exception-behavior=ignore).
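A minimal C++ sketch of this proposal, deliberately independent of MLIR's real side effect classes: every name below (`Resource`, `Effect`, `ToyOp`, `effectsOf`, `mayReorder`) is hypothetical. An op carrying a "strict" flag (standing in for an attribute) reports a read of the FP control resource and a write of the FP status resource; without it, the op is effect-free. Two ops may be swapped only if no pair of their effects touches the same resource with at least one write, which is a conservative version of the rule described above.

```cpp
#include <cassert>  // assert, for checking the examples
#include <string>
#include <vector>

// Hypothetical resources; in MLIR each would be a separate side effect resource.
enum class Resource { Memory, FPControl, FPStatus };
enum class EffectKind { Read, Write };

struct Effect {
  EffectKind kind;
  Resource resource;
};

// Hypothetical op: a name, a flag standing in for a "strict FP" attribute,
// and the effects the op has regardless of that flag.
struct ToyOp {
  std::string name;
  bool strictFP = false;
  std::vector<Effect> baseEffects;
};

// FP effects are reported only when the strict attribute is present;
// otherwise (the -ffp-exception-behavior=ignore case) none are added.
std::vector<Effect> effectsOf(const ToyOp &op) {
  std::vector<Effect> effects = op.baseEffects;
  if (op.strictFP) {
    effects.push_back({EffectKind::Read, Resource::FPControl});
    effects.push_back({EffectKind::Write, Resource::FPStatus});
  }
  return effects;
}

// Conservative check: two ops conflict if they access the same resource and
// at least one of the two accesses is a write.
bool mayReorder(const ToyOp &a, const ToyOp &b) {
  for (const Effect &ea : effectsOf(a))
    for (const Effect &eb : effectsOf(b))
      if (ea.resource == eb.resource &&
          (ea.kind == EffectKind::Write || eb.kind == EffectKind::Write))
        return false;
  return true;
}
```

With these definitions, a strict divide conflicts with a call that reads the FP status (such as `_statusfp`), but it can still be reordered freely with a plain memory load, so memory optimizations around it are not blocked.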

I guess one alternative to implementing this for math-like dialects is to lower a language’s math operations into generic library calls, which would naturally have unknown side effects and would not be optimized in any way, but this sounds like too big a hammer to me. For example, even though a math operation may have effects on the floating point control/status words, it may have no side effects at all with respect to memory reads/writes; exposing this to the optimizations allows producing better code, e.g. by getting rid of redundant memory accesses. In addition, representing the math operations with math-like dialect operations enables math-aware optimizations, such as constant folding and simplification, some of which may still be done in strict mode.
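As an illustration of a folding that could remain legal in strict mode, here is a sketch of one conservative rule (an assumption of this example, not something the math or arith dialects implement): fold `a + b` only if evaluating it raises no FP exception flags. An addition that raises no flags is exact, so its result does not depend on the dynamic rounding mode, and folding it removes no observable status update. The helper name `tryFoldStrictAdd` is hypothetical, and evaluating on the host assumes host and target share IEEE 754 binary64 semantics.

```cpp
#include <cassert>
#include <cfenv>
#include <optional>

// Honoured by some compilers (e.g. Clang); others may ignore it, which is
// why the operands below are volatile.
#pragma STDC FENV_ACCESS ON

// Hypothetical folder: returns the sum only if computing it raised no FP
// exception flags, i.e. the result is exact and rounding-mode independent.
std::optional<double> tryFoldStrictAdd(double a, double b) {
  std::fenv_t saved;
  std::feholdexcept(&saved);          // save the environment and clear flags
  volatile double va = a, vb = b;     // prevent compile-time evaluation
  volatile double sum = va + vb;
  bool raised = std::fetestexcept(FE_ALL_EXCEPT) != 0;
  std::fesetenv(&saved);              // restore the caller's flags and modes
  if (raised)
    return std::nullopt;              // inexact/overflow/invalid: do not fold
  double result = sum;
  return result;
}
```

For example, `1.0 + 2.0` folds to `3.0`, while `0.1 + 0.2` raises the inexact flag and is left alone.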

Any comments are appreciated. This is my first message here, so please feel free to let me know if I am missing some basic stuff.

Thank you,
Slava

Side effects sound like the right mechanism to model this from the generic passes’ perspective. We can indeed introduce a new resource for this purpose, which should effectively prevent undesired reorderings by generic passes. It also sounds better to represent this property as an attribute on the op, likely absent by default, because making side effects (which are part of the op’s semantics) depend on some pass flag sounds wrong layering-wise. Whether this attribute is a part of FastMathFlags or a separate unit attribute is debatable.

Transforming operations into opaque uninspectable function calls sounds like the exact opposite of what MLIR usually does, i.e., exposing more of the semantics to the compiler.

Side note: LLVM IR models these as intrinsics (see the constrained floating-point intrinsics section of the LLVM Language Reference Manual), but I’d rather not duplicate the ops in MLIR.

LLVM does what it does largely for legacy reasons, but as the link Alex cites notes, it is unsound. If you inline an “fadd” into a function using the constrained FP representation, then miscompilations will occur - because the compiler will reorder the fadd into a region with non-default dynamic rounding mode.

In MLIR, given a clean slate (i.e. your own dialect), you have a choice: model these as generic side-effecting operations and then use SideEffectAnalysis to disambiguate and optimize them, or model the flags as explicit dataflow edges in the SSA graph. The former is effectively what LLVM’s constrained intrinsics do, but it doesn’t compose very well with general compiler infra; for example, you can’t constant fold them without heroics.
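The dataflow option can be mimicked in plain C++ by threading the FP status through values instead of leaving it in a global register. In this sketch (all names hypothetical), `fdivWithStatus` takes the incoming status as an argument and returns the quotient together with the updated status, so a later flag query is ordered after the divide by a data dependence rather than by a side effect annotation. Internally it still uses `<cfenv>` to observe which flags the hardware raised, and it restores the global environment before returning.

```cpp
#include <cassert>
#include <cfenv>

#pragma STDC FENV_ACCESS ON

// Hypothetical SSA-style value standing in for the FP status word.
struct FPStatus {
  int flags = 0;  // accumulated FE_* exception bits
};

struct DivResult {
  double value;
  FPStatus status;  // the "renamed" status produced by this op
};

// Pure from the caller's point of view: the global FP environment is
// restored, and the raised flags are returned as data.
DivResult fdivWithStatus(double a, double b, FPStatus in) {
  std::fenv_t saved;
  std::feholdexcept(&saved);  // save the environment and clear flags
  volatile double va = a, vb = b;
  volatile double q = va / vb;
  int raised = std::fetestexcept(FE_ALL_EXCEPT);
  std::fesetenv(&saved);
  double value = q;
  return {value, FPStatus{in.flags | raised}};
}

// A status read that depends on a status value, not on global state.
bool testFlag(FPStatus s, int flag) { return (s.flags & flag) != 0; }
```

Because a second divide consumes the status produced by the first, any reordering has to respect that value dependence, while two calls with identical operands and identical incoming status remain CSE candidates.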

The other way to handle this is to do SSA renaming of the physical flags resource, e.g., focusing just on the rounding mode, something like:

```mlir
// Specific rounding modes mix
%c = yourdialect.fadd %a, %b, #yourdialect.roundToZero
%d = yourdialect.fadd %a, %b, #yourdialect.roundDown

// Use whatever dynamic rounding mode exists (equiv of llvm's fadd instr) at your own peril:
%e = yourdialect.fadd %a, %b, #yourdialect.currentDynamicRM
```

The advantage of this approach is that it allows you to do general SSA optimizations: CSE, constant folding, etc. all just work naturally. You can then lower to a representation like the one Alex describes (e.g. analogous to the LLVM intrinsics), produce a schedule, insert changes of the rounding mode, etc.
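The lowering step, where an explicit rounding attribute turns into rounding mode changes around the operation, can be sketched with the standard `<cfenv>` API. `RoundingAttr` is a hypothetical C++ mirror of the `#yourdialect.*` attributes above, and `lowerFAdd` is a hypothetical name; `CurrentDynamic` leaves the dynamic mode untouched. A real lowering would schedule the ops and merge redundant mode switches rather than saving and restoring around every single operation.

```cpp
#include <cassert>
#include <cfenv>

#pragma STDC FENV_ACCESS ON

// Hypothetical C++ mirror of the #yourdialect rounding attributes.
enum class RoundingAttr { RoundToNearest, RoundToZero, RoundDown, RoundUp, CurrentDynamic };

// Map a static attribute to its <cfenv> macro; -1 means "use the current mode".
int toFenvMode(RoundingAttr rm) {
  switch (rm) {
    case RoundingAttr::RoundToNearest: return FE_TONEAREST;
    case RoundingAttr::RoundToZero:    return FE_TOWARDZERO;
    case RoundingAttr::RoundDown:      return FE_DOWNWARD;
    case RoundingAttr::RoundUp:        return FE_UPWARD;
    case RoundingAttr::CurrentDynamic: return -1;
  }
  return -1;
}

// Lowered form of `yourdialect.fadd %a, %b, rm`: switch the dynamic mode if
// the attribute is static, do the add, then restore the previous mode.
double lowerFAdd(double a, double b, RoundingAttr rm) {
  int mode = toFenvMode(rm);
  int previous = std::fegetround();
  if (mode != -1)
    std::fesetround(mode);
  volatile double va = a, vb = b;  // keep the add after the mode switch
  volatile double sum = va + vb;
  if (mode != -1)
    std::fesetround(previous);
  return sum;
}
```

Before lowering, two `fadd`s with the same operands and the same static attribute are ordinary CSE candidates; the mode switches only appear once the attribute is turned into explicit control word writes.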

FWIW, this is how the LLVM code generator handles physical resources (e.g. fixed physregs) internally. It benefits from having SSA virtual registers for analysis and optimization until sched/regalloc. It would be nice for this to be plumbed all the way down through LLVM so its code generator did this, but I’m not sure of the current state of the intrinsics mentioned above.

-Chris

Thank you for the replies, Alex and Chris!

Chris, I agree that making the rounding mode a part of the operation is a good approach for making optimizations (like constant folding and CSE) easier, but I cannot think of a similar approach for FP exception support. I think the SideEffectInterface still needs to be properly defined for operations that may signal an FP exception, so that they are not reordered incorrectly. On the other hand, it looks like even LLVM’s constrained FP intrinsics do not guarantee the original number and order of FP exceptions, so I am not sure it makes sense to try to support stricter behavior in MLIR if it is going to be discarded later in LLVM.

Can you please clarify what you meant by “#yourdialect.roundToZero”? Is this an operation’s explicit operand or something else?

If this is an explicit operand, then can it be optional and applied to existing operations like ones from arith dialect?

I agree with you re: trapping, depending on the semantics you’re going for. If you want an fdiv to potentially trap (e.g. SIGFPE on that instruction), then it has side effects for sure and there is no way around that.

That said, I thought that FP exceptions were “not that”. I thought they were deferred? I’m not at all an expert on this, though. How does it work?

I mean an attribute on the operation instead of an SSA value.

I mean it could be, but then all clients of the operations on the arith dialect would have to be aware of it and honor it. This is the right thing IMO given a clean slate design, I don’t know what owners/users of the arith dialect will have to say about that though.

Regarding FP exceptions, I was thinking about an example like the one in the Microsoft Docs page for _status87, _statusfp, _statusfp2.

For example, hoisting the divide operation above the preceding _statusfp call would be incorrect. By modelling it such that FP operations may write the status resource and any generic call operation may read/write it, we would constrain the reordering of the divide and any calls (including the _statusfp call). So I am talking more about “FP status” than about “FP exceptions”.

Unmasking an FP exception in the control word turns it into an immediate SIGFPE, but maybe exception handling representation in MLIR is not a relevant discussion at this point in time.

Again, I’m not familiar with the details here, but are these even correct w.r.t. the IEEE 754 standard in the first place?

I don’t think the C standard has anything to say about this. What are you steering towards?

I’m also not super up to date on the intricacies of FP exception modeling. But I was wondering: could there be an attribute on ops where IEEE 754 indicates trap behavior can be enabled?

```mlir
// Sets the (hypothetical) unit `trapping` attribute
%2 = arith.divf trapping %0, %1
```

I believe the side effect infra is capable of modeling conditional effects (i.e. only if trapping is enabled)?

I quoted the MSVC example just because it was handily available. I mostly care about Fortran, which must support IEEE_GET_FLAG, a way to read the FP exception flags from the FP status word. I believe that on the C++ side, #pragma STDC FENV_ACCESS ON is also supposed to enable strict FP behavior, so that the placement of std::fetestexcept calls relative to FP operations is preserved.
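A minimal C++ analogue of the `_statusfp` / `IEEE_GET_FLAG` pattern under `#pragma STDC FENV_ACCESS ON`: clear the flags, divide, then test the flags. The result is only meaningful if the compiler neither hoists the division above `std::feclearexcept` nor sinks it below `std::fetestexcept`, which is exactly the ordering the proposed control/status resources would have to encode. The function name `divideRaisesDivByZero` is hypothetical, and the `volatile` operands only guard against compile-time folding on compilers that do not honour the pragma.

```cpp
#include <cassert>
#include <cfenv>

#pragma STDC FENV_ACCESS ON

// Returns true if computing a / b raised FE_DIVBYZERO.
bool divideRaisesDivByZero(double a, double b) {
  std::feclearexcept(FE_ALL_EXCEPT);            // writes the status word
  volatile double va = a, vb = b;
  volatile double q = va / vb;                  // may write the status word
  return std::fetestexcept(FE_DIVBYZERO) != 0;  // reads the status word
}
```

For instance, dividing `1.0` by `0.0` sets `FE_DIVBYZERO`, while dividing `1.0` by `2.0` does not.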

Stella, yes, this is what I was initially thinking about: having two resources for the control and status words and setting up proper read/write effects for FP operations based on an attribute attached to them. Chris pointed out that the part of the control word defining the rounding mode can be handled differently, e.g. via explicit rounding mode attributes such as #yourdialect.roundToZero, #yourdialect.roundDown, #yourdialect.currentDynamicRM, etc. This may simplify optimizations for the FP ops, given that the rounding mode is explicit in the operation, but I think that to guarantee strict FP behavior the same operations still have to declare that they may read the control word, so that they are not reordered with respect to operations that may mask/unmask FP exceptions in the control word.

I would prefer modelling both control and status resources rather than using a single resource that is always read and written by an FP op. For example, having control-read and status-write effects on an FP op may enable LICM, which would not be possible with a single read-write resource.
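A sketch of why the split may help LICM, using a hypothetical hoisting rule (not MLIR's actual LICM implementation; all names are illustrative): an op may leave the loop if no op in the loop writes a resource it reads and no op in the loop reads a resource it writes. With split effects, an FP op that reads control and writes status passes the rule; once both effects collapse onto one read-write resource, the op reads what it itself writes and is rejected. The rule ignores zero-trip loops, where hoisting would raise flags the original program never raised, and it relies on status flags being sticky, so repeated writes are equivalent to one.

```cpp
#include <cassert>
#include <vector>

// Hypothetical resources: split control/status, or one combined FP resource.
enum class Resource { FPControl, FPStatus, FPCombined };
enum class EffectKind { Read, Write };

struct Effect {
  EffectKind kind;
  Resource resource;
};

using EffectList = std::vector<Effect>;

// Does this effect list contain an effect of the given kind on resource r?
bool touches(const EffectList &effects, EffectKind kind, Resource r) {
  for (const Effect &e : effects)
    if (e.kind == kind && e.resource == r)
      return true;
  return false;
}

// Hypothetical rule: `op` (itself one of the loop's ops) may be hoisted if
// nothing in the loop writes what it reads and nothing in the loop reads
// what it writes.
bool mayHoist(const EffectList &op, const std::vector<EffectList> &loopBody) {
  for (const Effect &e : op)
    for (const EffectList &other : loopBody) {
      if (e.kind == EffectKind::Read &&
          touches(other, EffectKind::Write, e.resource))
        return false;
      if (e.kind == EffectKind::Write &&
          touches(other, EffectKind::Read, e.resource))
        return false;
    }
  return true;
}
```

A loop containing only such FP ops allows hoisting under the split model but not under the combined one; adding an op that reads the status word (e.g. an `IEEE_GET_FLAG` call) inside the loop blocks hoisting in both cases.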