Is there a way to model strict floating point behavior in the math and arith dialects? I am almost sure there is not, but it is worth asking.
I wonder what the right way would be to model the floating point control and status words in MLIR. For example, with an option like clang’s -ffp-exception-behavior=strict, a user may expect that an operation writing the floating point control word is not reordered with respect to operations that read it, and that two operations writing the floating point status word are not reordered if a following operation may read the status word.
It seems that generic optimization passes will have to take this behavior into account to preserve floating point exception semantics. I wonder whether using SideEffectsInterface for this purpose sounds appropriate: math, arith, etc. operations would have to support SideEffectsInterface and report reads/writes of a global floating point control/status resource, e.g. based on some sort of FastMathFlags attribute or on the absence of any FastMathFlags, and report no side effects otherwise (under -ffp-exception-behavior=ignore).
I guess one of the alternatives to implementing this for math-like dialects is to lower a language’s math operations into generic library calls that would naturally have unknown side effects and would not be optimized in any way, but this sounds like too big a hammer to me. For example, even though a math operation may have effects on the floating point control/status words, it may have no side effects at all with regard to memory reads/writes. Exposing this behavior to the optimizations will allow producing better code, e.g. getting rid of redundant memory accesses. In addition, having the math operations represented with math-like dialect operations enables math-aware optimizations, such as constant folding and simplification, some of which may still be done in strict mode.
Any comments are appreciated. This is my first message here, so please feel free to let me know if I am missing some basic stuff.
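To make the effect-based idea concrete, here is a minimal standalone sketch. It does not use MLIR's actual side-effect interfaces: `Resource`, `Effect`, `FPOp`, `getEffects` and `conflicts` are hypothetical names, and the `strict` flag stands in for whatever attribute would end up carrying this information.

```cpp
#include <cassert>
#include <vector>

// Toy model only: hypothetical types, not MLIR's side-effect API.
enum class Resource { FPControl, FPStatus, Memory };
enum class Access { Read, Write };

struct Effect {
  Resource resource;
  Access access;
};

struct FPOp {
  bool strict; // stands in for a hypothetical "strict FP" attribute on the op
};

// A strict FP op reads the control word (rounding mode, exception masks)
// and writes the status word (exception flags). A non-strict op reports
// no effects, which corresponds to -ffp-exception-behavior=ignore.
std::vector<Effect> getEffects(const FPOp &op) {
  if (!op.strict)
    return {};
  return {{Resource::FPControl, Access::Read},
          {Resource::FPStatus, Access::Write}};
}

// Two effect lists conflict if they touch the same resource and at least
// one of the two accesses is a write.
bool conflicts(const std::vector<Effect> &a, const std::vector<Effect> &b) {
  for (const Effect &x : a)
    for (const Effect &y : b)
      if (x.resource == y.resource &&
          (x.access == Access::Write || y.access == Access::Write))
        return true;
  return false;
}
```

In this model a strict FP op conflicts with an op that writes the control word, but not with a plain memory load, which is the extra freedom compared to an opaque library call. Treating two status writes as conflicting is the conservative choice; it could be relaxed when no later op may read the status word.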
Side effects sound like the right mechanism to model this from a generic pass perspective. We can indeed introduce a new resource for this purpose, which should effectively prevent undesired reorderings by generic passes. It also sounds better to have this property represented as an attribute on the op, likely absent by default, because having side effects (which are part of the op’s semantics) depend on some pass flag sounds wrong layering-wise. Whether this attribute is a part of FastMathFlags or a separate unit attribute is debatable.
Transforming operations into opaque uninspectable function calls sounds like the exact opposite of what MLIR usually does, i.e., exposing more of the semantics to the compiler.
LLVM does what it does largely for legacy reasons, but as the link Alex cites notes, it is unsound. If you inline an “fadd” into a function using the constrained FP representation, then miscompilations will occur - because the compiler will reorder the fadd into a region with non-default dynamic rounding mode.
In MLIR, given a clean slate (i.e. your own dialect), you have a choice: you can model these as generic side-effecting operations and then use SideEffectAnalysis to disambiguate and optimize them, or you can model the flags as explicit dataflow edges in the SSA graph. The former is effectively what LLVM’s constrained intrinsics do, but it doesn’t compose very well with general compiler infra - for example you can’t constant fold them without heroics.
The other way to handle this is to do SSA renaming of the physical flags resource, e.g. just focusing on rounding mode, something like:
// Specific rounding modes mix
%c = yourdialect.fadd %a, %b, #yourdialect.roundToZero
%d = yourdialect.fadd %a, %b, #yourdialect.roundDown
// Use whatever dynamic rounding mode exists (equiv of llvm's fadd instr) at your own peril:
%e = yourdialect.fadd %a, %b, #yourdialect.currentDynamicRM
The advantage of this approach is that general SSA optimizations (CSE, constant folding, etc.) all just naturally work. You can then lower to a representation like the one Alex describes (e.g. analogous to the LLVM intrinsics), produce a schedule, insert changes of the rounding mode, etc.
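As a rough illustration of why an explicit rounding-mode attribute makes folding straightforward, here is a small sketch. `RoundingMode` and `foldFAdd` are hypothetical names that mirror the attributes in the snippet above. The sketch evaluates the add on the host via `<cfenv>` and assumes the host defines all four `FE_*` rounding macros; a real folder would more likely use a software float implementation such as LLVM's APFloat.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical mirror of the rounding-mode attributes on the op.
enum class RoundingMode { ToNearest, ToZero, Down, Up, CurrentDynamic };

// Tries to fold `a + b`. Returns false when the op uses the dynamic
// rounding mode, whose value is unknown at compile time.
bool foldFAdd(float a, float b, RoundingMode rm, float &result) {
  int mode = FE_TONEAREST;
  switch (rm) {
  case RoundingMode::ToNearest:      mode = FE_TONEAREST;  break;
  case RoundingMode::ToZero:         mode = FE_TOWARDZERO; break;
  case RoundingMode::Down:           mode = FE_DOWNWARD;   break;
  case RoundingMode::Up:             mode = FE_UPWARD;     break;
  case RoundingMode::CurrentDynamic: return false;
  }
  const int saved = std::fegetround();
  if (std::fesetround(mode) != 0)
    return false; // host does not support this mode
  // volatile keeps the host compiler from folding or moving the add
  // across the fesetround calls.
  volatile float x = a, y = b;
  volatile float sum = x + y;
  std::fesetround(saved);
  assert(std::fegetround() == saved);
  result = sum;
  return true;
}
```

For example, folding `1.0f + 1e-10f` with `RoundingMode::Up` yields a value just above `1.0f`, while `RoundingMode::Down` yields exactly `1.0f`; with `CurrentDynamic` the fold is refused.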
FWIW, this is how the LLVM code generator handles physical resources (e.g. fixed physregs) internally. It benefits from having SSA virtual registers for analysis and optimization until sched/regalloc. It would be nice for this to be plumbed all the way down through LLVM so its code generator did this, but I’m not sure about the current state of the intrinsics mentioned above.
Chris, I agree that making the rounding mode a part of the operation is a good approach for making optimizations (like constant folding and CSE) easier, but I cannot think of a similar approach for FP exceptions support. I think the SideEffectInterface still needs to be properly defined for operations that may signal an FP exception so that they are not reordered incorrectly. On the other hand, it looks like even LLVM’s constrained FP intrinsics do not guarantee the original number and order of FP exceptions, so I am not sure if it makes sense to try to support stricter behavior in MLIR if it is going to be discarded later in LLVM.
Can you please clarify what you meant by “#yourdialect.roundToZero”? Is this an operation’s explicit operand or something else?
If this is an explicit operand, then can it be optional and applied to existing operations like ones from arith dialect?
I agree with you re: trapping, depending on the semantics you’re going for. If you want an fdiv to potentially trap (e.g. SIGFPE on that instruction) then it has side effects for sure and there is no way around that.
That said, I thought that FP exceptions were “not that”. I thought they were deferred? I’m not at all an expert on this though, how does it work?
I mean an attribute on the operation instead of an SSA value.
I mean it could be, but then all clients of the operations on the arith dialect would have to be aware of it and honor it. This is the right thing IMO given a clean slate design, I don’t know what owners/users of the arith dialect will have to say about that though.
For example, hoisting the divide operation before the preceding _statusfp call would be incorrect. By modelling it such that the FP operations may write the status resource and any generic call operation may read/write it, we would constrain the reordering of the divide and any calls (including the _statusfp call). So I am talking more about “FP status” rather than “FP exceptions”.
Unmasking an FP exception in the control word will turn it into an immediate SIGFPE, but maybe exception handling representation in MLIR is not a relevant discussion at this point in time.
I quoted the MSVC example just because it was handily available. I mostly care about Fortran, which must support IEEE_GET_FLAG, a way to read FP exception flags from the FP status word. I believe on the C++ side #pragma STDC FENV_ACCESS ON is also supposed to enable strict FP behavior so that the placement of std::fetestexcept calls relative to FP operations is preserved.
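For reference, here is a small portable sketch of the status-word ordering problem using `<cfenv>`. The helper name `divideRaisesDivByZero` is made up for illustration. Compilers differ in how well they honor `#pragma STDC FENV_ACCESS ON`, so the sketch uses `volatile` to pin the division between the two environment calls; that is a workaround for the example, not a recommendation.

```cpp
#include <cassert>
#include <cfenv>

// Returns true if computing `num / den` raised the divide-by-zero flag.
// The answer is only meaningful if the division stays between the
// feclearexcept and fetestexcept calls, which is exactly the ordering a
// strict FP mode has to preserve.
bool divideRaisesDivByZero(double num, double den) {
  const int cleared = std::feclearexcept(FE_ALL_EXCEPT);
  assert(cleared == 0);
  (void)cleared;
  volatile double n = num, d = den;
  volatile double q = n / d; // may set FE_DIVBYZERO in the status word
  (void)q;
  return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

If the division were hoisted above `feclearexcept`, or sunk below `fetestexcept`, the function would report no exception for `1.0 / 0.0`.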
Stella, yes, this is what I was initially thinking about: having two resources for the control and status words and set up proper read/write effects for FP operations based on an attribute attached to them. Chris pointed out that a part of the control word defining the rounding mode can be handled in a different way, e.g. via explicit rounding mode attributes such as #yourdialect.roundToZero, #yourdialect.roundDown, #yourdialect.currentDynamicRM, etc. This may simplify optimizations for the FP ops given that the rounding mode is explicit in the operation, but I think to guarantee strict FP behavior the same operations still have to manifest that they may be reading the control word so that they are not reordered with regards to operations that may mask/unmask FP exceptions in the control word.
I would prefer modelling both control and status resources rather than use a single resource that is always read/written by an FP op. For example, having control-read and status-write effects on an FP op may enable LICM, whereas it would not be possible with a single read-write resource.
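A toy check of the LICM argument, under some loud assumptions: the names below are hypothetical, this is not how MLIR's LICM is implemented, status flags are treated as sticky so that writes to the status word commute, and the loop is assumed to run at least once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model with hypothetical names; not MLIR's LICM or effects API.
enum Resource : unsigned { Control = 1u << 0, Status = 1u << 1, Env = 1u << 2 };

struct Op {
  unsigned reads;  // bitmask of resources read
  unsigned writes; // bitmask of resources written
};

// Simplified hoisting rule for body[candidate]:
//  1. nothing in the loop, the candidate included, writes what it reads;
//  2. no other op in the loop reads what it writes.
// Rule 2 lets writes to the same resource commute (sticky-flag assumption).
bool canHoist(const std::vector<Op> &body, std::size_t candidate) {
  assert(candidate < body.size());
  const Op &op = body[candidate];
  for (std::size_t i = 0; i < body.size(); ++i) {
    if (body[i].writes & op.reads)
      return false;
    if (i != candidate && (body[i].reads & op.writes))
      return false;
  }
  return true;
}
```

With split resources, an op modelled as `Op{Control, Status}` passes the check as long as nothing in the loop writes the control word or reads the status word. With a single resource, `Op{Env, Env}` fails rule 1 against itself, so it can never be hoisted in this model.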
Not yet perhaps, but the upcoming C23 (last working draft) integrates with IEC 60559, which is the rebranding of IEEE 754 (see Change History in Annex M):
harmonization with floating point standard IEC 60559:
• integration of binary floating-point technical specification TS 18661-1
• integration of decimal floating-point technical specification TS 18661-2
• integration of decimal floating-point technical specification TS 18661-4a
The paper trail for this TS is pretty spread out, so I don’t have a good summary of what exactly changed since C17, but it looks like it’s mostly confined to Annex F.
A quick search in that Annex for status flags yields for example [omitting footnotes & formatting]:
F.8 Floating-point environment
The floating-point environment defined in <fenv.h> includes the IEC 60559 floating-point exception
status flags and rounding-direction control modes. It may also include other floating-point status or
modes that the implementation provides as extensions.
F.8.1 Environment management
IEC 60559 requires that floating-point operations implicitly raise floating-point exception status
flags, and that rounding control modes can be set explicitly to affect result values of floating-point
operations. These changes to the floating-point state are treated as side effects which respect sequence points.
During translation, constant rounding direction modes (7.6.2) are in effect where specified. Else-
where, during translation the IEC 60559 default modes are in effect:
— The rounding direction mode is rounding to nearest.
— The rounding precision mode (if supported) is set so that results are not shortened.
— Trapping or stopping (if supported) is disabled on all floating-point exceptions.
The implementation should produce a diagnostic message for each translation-time floating-point
exception, other than “inexact”; the implementation should then proceed with the translation of
At program startup the dynamic floating-point environment is initialized as prescribed by IEC 60559:
— All floating-point exception status flags are cleared.
— The dynamic rounding direction mode is rounding to nearest.
— The dynamic rounding precision mode (if supported) is set so that results are not shortened.
— Trapping or stopping (if supported) is disabled on all floating-point exceptions.
The international standard ISO/IEC 60559:2020 (with content identical to IEEE 754-2019) has been approved for adoption through ISO/IEC JTC1/SC 25 and published.