[RFC] FastMath flags support in MLIR (arith dialect)

I don’t quite grasp the difference between “fast-math is allowed” and “fast-math is unspecified”, which you interpret as “it may be turned on”.

This is precisely what clang/LLVM were doing, right? fastmath flags had the value none, and the implementation was using the attribute from “somewhere else” (which in this case was TargetOptions). Having another possibly conflicting source of flags wasn’t good for implementing the back end, but there may be some other uses for the idea.

To be clear, I am not suggesting that there should be any ambiguity from a backend/codegen perspective. If an Operation has no fastmath attribute, this is interpreted the same as if it had fastmath = none. Any transformations that would fold/canonicalize/replace would have well-defined behavior.

I think that allowing an unspecified fastmath attribute may have value in manipulating an IR, before the transformations that would use fastmath.

Consider an MLIR module with 10,000 float instructions, 1,000 of which are known to be numerically sensitive. On these, fastmath would be set to none, prohibiting traditional fastmath optimizations. What about the remaining 9,000 floating point instructions? The impact of applying fastmath optimizations to these remaining instructions on overall accuracy is unknown at the time of MLIR generation. If the fastmath attribute were to be unspecified when generating the initial MLIR, then a pass could do something like:

if (fastmath.not_specified()) fastmath.set_fast();

After lowering and compilation, the output is compared to reference for accuracy. In this way, whether or not the attribute was specified provides an additional op filter, similar to finding all ops of a given type. In this case, the optional fastmath attribute provides a way to defer setting the fastmath flags.
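As a minimal sketch (the “fastmath” attribute key and the arith::FastMathFlagsAttr class are just assumptions for concreteness here, not a settled API), such a pass could look something like this:

#include "mlir/Dialect/Arith/IR/Arith.h" // assumed header for the new attribute
#include "mlir/IR/BuiltinOps.h"

// Hypothetical sketch: set fastmath = fast on every arith op that does not
// already carry a "fastmath" attribute, leaving explicitly set ops alone.
void setUnspecifiedFastMathToFast(mlir::ModuleOp module) {
  module.walk([](mlir::Operation *op) {
    if (!op->getDialect() || op->getDialect()->getNamespace() != "arith")
      return; // only consider arith ops in this sketch
    if (op->hasAttr("fastmath"))
      return; // flags were specified at generation time: keep them
    op->setAttr("fastmath",
                mlir::arith::FastMathFlagsAttr::get(
                    op->getContext(), mlir::arith::FastMathFlags::fast));
  });
}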

I’m not attached to making fastmath an optional attribute - I just thought there might be a use for it. Maybe there is a way to do the same thing with discardable attributes or something.

Fast math flags are probably useful for the math dialect as well. When a math dialect operation is lowered, the fast-math attribute can potentially be present on the llvm intrinsic that corresponds to the math function, or on the call instruction to the math routine, or in some cases can even be encoded into the mangled name of the math function.

Can this be viewed as a use-case for Fast Math flags/attributes outside the arith dialect?

In Clang there is now a kind of umbrella flag for modelling floating-point behaviour, ffp-model=<strict|precise|fast>. Would it make sense to model this in MLIR?
https://clang.llvm.org/docs/UsersManual.html#cmdoption-ffp-model

Fast math flags are probably useful for the math dialect as well.

You are right! I should have seen that before now - I was focused on a different use case. I think this means that FastMathFlags would have to be a builtin attribute so that it can be used in both arith and math dialects (and maybe others).

In Clang there is now a kind of umbrella flag for modelling floating-point behaviour,
ffp-model=<strict|precise|fast> . Would it make sense to model this in MLIR?

Yes - I think that we could adopt those umbrella flags as well. The BitEnumAttr recently changed to allow enum cases that are effectively aliases for a group of individual bits. Setting one of the “umbrella” values would set the corresponding individual bits (assuming we define all those in the clang table), but we would also retain the ability to use a subset of the individual bits.
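For illustration, a plain C++ sketch of what those grouped cases expand to (flag names follow LLVM’s and are assumptions for the eventual MLIR attribute):

#include <cstdint>

// Sketch only: individual fast-math bits plus "umbrella" aliases, mirroring
// what a BitEnumAttr with grouped enum cases would generate.
enum class FastMathFlags : uint32_t {
  none     = 0,
  reassoc  = 1u << 0,
  nnan     = 1u << 1,
  ninf     = 1u << 2,
  nsz      = 1u << 3,
  arcp     = 1u << 4,
  contract = 1u << 5,
  afn      = 1u << 6,
  // Umbrella value: setting it sets the whole group of individual bits.
  fast = reassoc | nnan | ninf | nsz | arcp | contract | afn,
};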

Not really: “unknown” and “false” are the same thing in LLVM (unless it changed?)
What LLVM has been doing is allowing a global flag to override the per-instruction flags (when it comes to the backend at least), mostly because it took longer for the backend to support per-instruction flags than it did for LLVM IR.

Then you can mark them as “fast-math is allowed”.
Your flow still seems possible: you can perform lowering and compilation, run, check for accuracy, and then remove fast math flags if the accuracy isn’t satisfactory.

Basically I think for your distinction to make sense you would need to justify a distinction between “fast-math is allowed” and “fast-math shouldn’t be disallowed”.

Another issue you have is that the user may want to express that “fp-contraction” is allowed but nothing else. Once they do that, however, it no longer fits your “if (fastmath.not_specified()) fastmath.set_fast();” scheme, so it seems that what you want is an orthogonal attribute.
Maybe even a tri-state for each FMF: “allowed”, “disallowed”, “flip_a_coin” (or “unset” as you prefer :wink: ).

These are a convenient way for frontend users to control the individual flags that get added. It seems that precise is entirely equivalent to the contract fast-math flag, isn’t it?

Yes, precise is equivalent to contract=ON in the Clang modelling.
WRT math intrinsic functions, there can be different implementations provided by a vendor for each of the options (fast, precise, strict). Retaining the model information can help in choosing which implementation to pick.

+1. This is a long, long-standing TODO.

Some of the operations in the math dialect could make use of the fast-math flags. math and arith are closely related dialects.

I guess I still haven’t understood at this point the fundamental difference, or rather why this difference is useful. This “I don’t know” behavior is weird to me, and I’m still grappling with it.

To be clear: I understand that part fine. I see that you want to model “I-have-not-yet-determined-the-behavior”, I am just not sure why and how it differs from “fast-math is allowed” in practice.

If I really try to imagine some use-cases for this (like the one you describe), I’m not convinced I would include them in the way FMF is modeled and instead handle them separately with a domain-specific flag.

This is mostly because any analysis or transformation will need to have a binary answer: “allowed”/“disallowed”. The “maybe” state in between has no interpretation here for any FMF client.

It seems then there is something I clearly don’t understand in your motivating use-case, or what you really want here.
You haven’t really elaborated on this with respect to my example: you may know that fp-contract is always safe, so you want to add it everywhere. However you don’t know about all the more “aggressive” options: how do you handle it now? Your use-case seems to be that you want to leave open the possibility of adding them later. But because you added fp-contract you can’t express this optionality anymore.

I couldn’t make sense of your distinction and optionality so far: that does not mean you’re not right though, so it is worth discussing it more :slight_smile:

I’m enjoying the discussion, and I am happy to continue it. I’m also happy to defer to your expertise here, and I’m not 100% sure that my thoughts are well-formed enough to insist on something…

I see the traditional clang model as one that specifies fastmath flags at compilation time, but I think that MLIR should strive to adopt something more “fluid”. In other words, with C/C++ and clang, if you want to change fastmath flags, you recompile the C/C++ source code. With MLIR, maybe there is a situation in which the fastmath flags are varied and iterated on, without leaving MLIR.

It sounds as if you are suggesting that in order to allow subsequent fast-math optimizations (or potential exploration of them), “fast-math is allowed” should be the default in MLIR. This doesn’t seem right to me, since that is the opposite of the default for everything else. (Maybe I’m misunderstanding the suggestion.) At the very least, it seems like having this be the “best practice” for front ends would generate confusion.

I was thinking, perhaps too narrowly for my use cases, that the workflow for FastMathFlags would work something like this:

  1. If the FMF flags are fixed/known for all ops, then front ends should set them to the desired value. (Perhaps this is the most common case.) Lowering to LLVMIR is a straightforward mapping of MLIR attributes. Otherwise…
  2. For operations with “fixed/known” FMF behavior, set the FMF flags to these known values. This might include setting numerically sensitive computations to none (or whatever combination is known to be acceptable for these operations), and others to fast. The presence of these FMF attributes will be honored and not modified (unless some sort of override is forced).
  3. For non-fixed FMF operations, leave the FMF flags unset in the generated IR (by not providing an attribute). These non-fixed ops might include those that will inherit a “global” default value, as well as those that will be determined via experimentation. Passes can be used to selectively apply FMF flags to all FMF ops that have unset FMF attributes. (There could be other selection criteria as well, but unset attributes is a particularly convenient one.)

This essentially partitions all FP nodes into “fixed FMF” and “variable FMF” groups. The exploration process would gradually move nodes from the “variable” to the “fixed” group. Any node without an attribute would be treated as none for FMF purposes by optimization passes. This really only provides a “single bit” of flexibility - but I would argue that in some cases that is enough.

This assumes a very narrow definition of “FMF client”. Clearly, any optimization that rearranges FP operations, or folds constants needs concrete values for FMF flags, and the proposal addresses that - conservatively. (When the optional attribute is not present, it is assumed that no optimizations are safe, just as if there were a none FMF value present.)
But I would argue that a pass that, for example, modifies FMF flags (as part of some iterative exploration) is also a “client.” Going back to my original example, which modifies FMF flags only if they are “unset” - doesn’t that qualify as an “interpretation”? I can understand an argument that there may be other ways to do it, or that it isn’t how MLIR should approach it, but I don’t see how, using your words, there is “no interpretation here for any FMF client.” It is a very limited (on or off) interpretation, but one with very low storage and computation overhead.

My objection to the “tri-state” approach (which is something that I had considered) was based mostly on practical, software engineering considerations. I agree that this approach handles the general case of per-flag “deferral/unknown,” but it seems to me like it does so at a drastic cost in both size and runtime overhead. (If, for example, I wanted to use MLIR for a JIT compiler I might care very much about this.) Perhaps a custom attribute type or data structures would limit the impact, but I’m not sure it is worth the effort. In contrast, optional attributes exist already in MLIR, and the overhead seems minimal. So if it were true that the optional attribute handles 90% of the use cases, to me it seems like a reasonable approach. (And maybe it doesn’t handle 90%, or maybe there are no practical use cases, but that was the context of my “not what I want” statement.)

That is correct - once the FMF attribute is set to any valid value, the ability to interpret it as “unknown” is lost.

But in my experience, the “90%” case is choosing between fastmath=fast and fastmath=none. Other cases are certainly valid and should be supported. In this scenario, FastMathFlags aren’t “8 separate 1-bit variables” that are optimized independently; rather, they are often treated as a single “atomic” value.

I can imagine cases where some sort of dynamic attribute would be better suited to solve the “exploration of independent fastmath flags” scenario. But it seems to me that adding multiple optional boolean attributes to every FP instruction (to handle the general case) will bloat the IR, and that is a bell that would be difficult to un-ring.

I’m primarily interested in having a FastMathFlags attribute attached to FP operations. On top of that, I think the optional aspect adds a potentially useful way to add some flexibility to modifying FastMathFlags, via the MLIR pass infrastructure, for the most common “changing FMF as a single value” use case, with (almost) zero time and space overhead. (It seems like any in-memory space overhead of FastMathFlags will be incurred by every FP operation.) More sophisticated usage scenarios could rely on dynamic attributes - and for workflows that aren’t yet defined, or that may be used infrequently, IMO dynamic attributes would be preferable to adding features to all MLIR FP operations.

It seems that you are anchored to the runtime cost of the feature: it seems negligible to me, or at least clearly not a bloat. We’re talking about doubling the number of bits of a very small bitfield that will be stored uniquely in the Context. Just the pointer used to point to this struct alone will cost more storage space than the FMF themselves :wink:

It seems like any in-memory space overhead of FastMathFlags will be incurred by every FP operation.

Because of how we handle attributes: the cost of storage on a per-op basis is null. Each operation has a pointer to a structure allocated in the context, pooled amongst all the operations.
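For example (using StringAttr purely to illustrate the uniquing; the same holds for any attribute, including an FMF attribute):

#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/MLIRContext.h"
#include <cassert>

// Attributes with equal values share one storage object in the MLIRContext;
// each operation only holds a pointer to that shared storage.
void attributeUniquingDemo() {
  mlir::MLIRContext ctx;
  mlir::StringAttr a = mlir::StringAttr::get(&ctx, "fast");
  mlir::StringAttr b = mlir::StringAttr::get(&ctx, "fast");
  assert(a == b); // same uniqued object, comparison is a pointer compare
}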

Regardless, let’s leave the implementation cost for a separate discussion, I’m more interested in the usefulness of the feature.

Sure, but by the same logic the >90% of cases will also never need “optional” semantics.
In reality it is hard to anticipate, because when looking at LLVM we think of “clang” and other similar flows, while MLIR is applied to radically different use-cases.
Also I suspect a lot of people would want to allow contraction as a default behavior instead, which wouldn’t interact well with the kind of auto-tuner you are alluding to.
Another example I know of is a GPU compiler for which allowing reciprocal was also something always on by default (the HW actually had higher precision with reciprocal).

I see it as a “producer” instead, or as something entirely different from the compiler point of view. It seems like an “out-of-band” thing that does not look like any “regular” client (should I have used “consumer” instead maybe?).

Ultimately for your idea of an auto-tuner to work, you need to have the frontends create IR by making a choice (differentiating between “fast-math isn’t allowed” and “you may allow fast-math”, which is somehow different from “fast-math is allowed”): but if the frontend has to express intent to the auto-tuner, why not do it with another attribute entirely? The ops that are intended to be tuned could be tagged with an attribute enable.fmf.autotune for example. Doing so also allows setting, for example, the allow-contraction flag while still expressing the intent to allow an override (this should also address your runtime performance concern).
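A rough sketch of that idea, taking enable.fmf.autotune as the (hypothetical) discardable attribute name:

#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/Operation.h"

// The frontend tags the ops it is willing to have auto-tuned; the tag is
// orthogonal to whatever fast-math flags are already set on the op.
void markTunable(mlir::Operation *op) {
  op->setAttr("enable.fmf.autotune", mlir::UnitAttr::get(op->getContext()));
}

// The auto-tuner only touches tagged ops.
bool isTunable(mlir::Operation *op) {
  return op->hasAttr("enable.fmf.autotune");
}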

I’m saying that the very definition of “fast-math” is to allow the compiler to do fast-math optimization: it isn’t a guarantee about anything else. From this point of view I’m mostly saying that the distinction with the “unknown” state is not clear: you’re also just expressing that you “allow” the compiler/tooling to exploit fast-math.
From this point of view, if you intend to use such an auto-tuner, you could start by marking everything that is tunable as “fast” and then let the auto-tuner disable the “fast” mode on some operation. It seems to me that you get the exact same tuning process and the same search space: unless you also want to be able to limit the search space by expressing to the auto-tuner that it should never disallow fast-math when it has been allowed already.

I was comparing an optional BitEnumAttr to 32 different optional BoolAttrs to implement your tri-state suggestion. (I could be wrong, but I don’t think the overhead is as small in that case.) Certainly, the same could be implemented with a separate bitmap and some extra logic to combine bits from the two. I’m aware of the “unique” storage aspect for attributes - I was under the impression that the overhead per op would be a “pointer per attribute”, with multiple ops pointing to the same storage space when the attribute value is the same. Maybe there is something more complex going on.

And they are paying (approximately?) zero price for it.

This is incorrect - you could leave those values as unset in the source MLIR, and then always combine contract with something else when setting the flags via a pass. Regardless, I agree that the optional approach doesn’t handle all (or even most) potential scenarios.

The example pass both consumes (reads) and produces (writes) an FMF value. I’m not sure I understand this part of the thread. Are you saying that an “undefined” FMF value is “out of band” because it doesn’t make sense to certain parts of the workflow (i.e. the compiler)? If so, then I agree - an optional FMF is an “out of band” convenience that could be used for IR manipulation. But there are other parts of MLIR that don’t make sense to a compiler back end - like unrealized_conversion_cast. (That seems pretty “out of band” to me, but I’m OK with it. :wink: )

IMO, this flag seems a little obscure to add to the IR - if for no reason other than readability in text form. (How many more will there be?)

I feel like the unknown state means just that: unknown. It doesn’t mean anything in and of itself, other than that the generation process (or previous pass) did not assign a value - full stop.

That can be used, by different “clients,” to do any number of things. A pass in an autotuner might use this to allow the compiler/tooling to exploit fast math, as you suggest above. Another pass might do something else to replace that op entirely. It seems like you are projecting a meaning onto it based on just one “client” (lowering to LLVMIR), even though that client doesn’t see the unset value. (All FMF attributes would be set, or, if they aren’t set, interpreted as none.)

At a basic level, I see the optional aspect as a way to filter FP operations. Going back to my motivating example: suppose I have an MLIR file with a function containing 50 mulf operations, I want to treat 10 of them differently (from a fastmath perspective), and I know which ones are different at generation time. I can imagine a way to generate the .mlir file, and then provide a simple option to mlir-opt that will modify only those instructions (albeit supporting limited combinations), and then proceed through the lowering/compilation process. The runtime/storage/text-readability impact (over non-optional FastMathFlags) is (essentially) zero, and IMO it has a conceptual simplicity to it. It seems like you are advocating for the more general case, but even if the overhead is small, there is no guarantee that it will be used, or that it will be adequate for some currently unknown use case.

I’m not attached to the optional part - I was trying to anticipate what might be useful, to avoid having to later introduce a breaking change. I understand your points - maybe it is a case of “agree to disagree,” along with “perfection is the enemy of progress.” :wink:

What do you think about this plan?

1.) add FastMathFlags as a required attribute (not optional) with a default value, and implement fold transformations as appropriate
2.) later add enable.fmf.autotune (or an additional is_fixed bit enum, or something else, after further reflection) as needed to handle modification of FMF attributes within MLIR, after step 1 lands

1.) add FastMathFlags as a required attribute (not optional) with a default value, and implement fold transformations as appropriate

That seems fine, but we need to consider what flags it’ll include. If it is just what LLVM provides, then it is basically about promoting the existing llvm.fastmath attribute for general uses, otherwise we need to look at it more carefully.
Also by saying it’ll be a required attribute, instead of considering that the absence of the attribute implies that there is no FMF enabled, you mean that you’d prefer the op attribute dictionary to always have an entry for the FMF even when no flag is set? But in this case it’d point to one where all flags are unset? What makes you favor this model compared to the existing one where the absence of the attribute implies no flag set?

As a first pass, I was intending to just use LLVM FastMath values - this would definitely give some underlying structure to things like x * 0 = 0 transformations in MLIR, and would be straightforward to lower to the LLVMIR dialect. (There are also some small changes that need to happen to the FMF attribute that is currently in LLVMIR before being promoted to general use. I can elaborate here, or we can save it for the code review…)
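For example, here is a hedged sketch of the kind of flag check the x * 0 fold would need (bit names mirror LLVM’s and are an assumption, not the final attribute API):

#include <cstdint>

// Assumed bit positions, mirroring LLVM's fast-math flags.
constexpr uint32_t kNoNaNs        = 1u << 1;
constexpr uint32_t kNoInfs        = 1u << 2;
constexpr uint32_t kNoSignedZeros = 1u << 3;

// Folding x * 0.0 => 0.0 is only sound when NaNs, infinities, and signed
// zeros can be ignored: NaN * 0 and inf * 0 are NaN, and (-1.0) * 0.0 is -0.0.
bool canFoldMulByZero(uint32_t flags) {
  constexpr uint32_t required = kNoNaNs | kNoInfs | kNoSignedZeros;
  return (flags & required) == required;
}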

As for “FastMathFlags things that clang/LLVM might do differently if starting over”, I know that the behavior of isinf()/isnan() in the presence of fastmath flags has generated lots of discussion recently. It seems to me, at first glance, that MLIR can avoid some of the ambiguity by adding isinf()/isnan() operations with FastMathFlags attributes at some point. (I know that ComplexToStandard is using cmpf uno right now.) But I don’t see a different MLIR-specific fastmath flag solution to advocate here yet.

Yes.

Existing where? If I’m not mistaken, the existing FMF attribute in the LLVMIR dialect is not optional - it is a DefaultValuedAttr<>.

It seems to me that using “attribute absence” in place of an explicit fastmath=none enum value would be clumsy for code that enumerates different fastmath values. For example:

// Illustrative only: `FastMathFlags` and `op.setFastMathFlags(...)` are
// hypothetical stand-ins for the proposed attribute and its setter.
llvm::SmallVector<FastMathFlags, 3> fmf = {FastMathFlags::none,
                                           FastMathFlags::contract,
                                           FastMathFlags::fast};
for (FastMathFlags f : fmf) {
  // Retrieve the Operation to modify...
  op.setFastMathFlags(f);
  // lower, run tests, evaluate results, ...
}

Representing the none case in the loop above as, instead, the absence of an optional attribute would require (IIUC) removing the attribute.

It does seem like the “optional” attribute case would allow a more concise assembly format when the attribute isn’t present (compared to having fastmath="none"), but I don’t see any other advantage. (If I’m missing something let me know.)

I’m not sure what it’d look like, but we should leave this for another discussion specific to this topic :slight_smile:

ODS is a bit misleading here: the attribute may not be present, but the C++ API will construct one for you for convenience.
The verifier accounts for the attribute not being provided:

static ::mlir::LogicalResult __mlir_ods_local_attr_constraint_LLVMOps5(
    ::mlir::Operation *op, ::mlir::Attribute attr, ::llvm::StringRef attrName) {
  if (attr && !((attr.isa<::mlir::LLVM::FMFAttr>()))) {
    return op->emitOpError("attribute '") << attrName
        << "' failed to satisfy constraint: LLVM fastmath flags";
  }
  return ::mlir::success();
}

And the actual accessor is encapsulating the fact that the attribute may not be present:

::mlir::LLVM::FastmathFlags FAddOp::getFastmathFlags() {
  auto attr = getFastmathFlagsAttr();
    if (!attr)
      return ::mlir::LLVM::FMFAttr::get(::mlir::Builder((*this)->getContext()).getContext(), {}).getFlags();
  return attr.getFlags();
}

So unfortunately DefaultValuedAttr isn’t much different than Optional; it is just convenient “sugar” at the C++ API level and does not let you assume that the attribute is actually present.

This could also be handled in the setter itself. Your “none” would be replaced by FastMathFlags{} and the setter could detect it and delete the attribute entry.
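Something along these lines, assuming a “fastmath” attribute key and a FastMathFlagsAttr analogous to the LLVM dialect’s FMFAttr:

#include "mlir/Dialect/Arith/IR/Arith.h" // assumed home of the new attribute
#include "mlir/IR/Operation.h"

// Sketch of a setter that erases the dictionary entry when all flags are
// cleared (FastMathFlags{} plays the role of "none" here), so that
// fastmath = none and "attribute absent" remain interchangeable.
void setFastMathFlags(mlir::Operation *op, mlir::arith::FastMathFlags flags) {
  if (flags == mlir::arith::FastMathFlags{}) {
    op->removeAttr("fastmath");
    return;
  }
  op->setAttr("fastmath", mlir::arith::FastMathFlagsAttr::get(
                              op->getContext(), flags));
}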

Actually the textual format can always be tweaked to favor any default we’d like (like eliding the none case is trivial if it isn’t optional), but also we generally should optimize for the in-memory representation instead.

I don’t quite see a strong argument one way or another right now for this.

Something remaining to be defined is where to put this attribute if not in the builtin dialect; I don’t quite see an alternative right now.
@River707 ?

Thanks for the explanation - I wasn’t aware how that works…

I figured that is possible, but it seems strange (to me) to have an op that has a fastmath=none attribute present (in-memory), and then elide that from the output text format. The reason that seems strange is that upon the following read, the op presumably would then be created without the optional attribute (since there was no text for the attribute present). So the round trip of memory → MLIR file → memory changed the actual internal representation (if not the actual semantics in this case). I’m not sure that I understand the intended MLIR semantics of DefaultValued and Optional (compared to what those mean in C++ I guess), so maybe I need to think about this more.

Why builtin? (Or why would there be no alternative to builtin?) It would seem that an attribute in arith would be usable from other dialects (as others have recently helped me understand), and it seems likely that a dialect that uses FastMathFlags would already have a dependency on arith anyway.

Not necessarily: the parser for the operation can have whatever behavior. You could decide that an op with an integer attribute could elide it when the value is 42 (for example because it would be this value in 99% of the cases…).

Yeah no, that wouldn’t be OK, the printer and parser should be in sync to preserve the round-trip.

If we believe that any dialect that would need to use FastMathFlags can easily take a dependency on arith then that would be fine, is it though?

Here are the in-tree dialects that I would consider “likely” candidates for using FastMathFlags, and whether or not the Arithmetic dialect is already a dependent dialect:

Potential FMF Dialect    Currently Arithmetic-dependent?
complex                  Yes
gpu                      Yes
linalg                   Yes
math                     No
scf                      Yes
spv                      No
vector                   Yes
tosa                     No

I didn’t include the omp dialect in the list above - I’m not sure if omp.reduction would need FastMathFlags, given the OpenMP spec/requirements.

We’ve discussed using FastMathFlags in the math dialect earlier in this thread.

The SPIR-V specification has its own FastMath flags, so I can’t say whether that dialect should use MLIR flags if they were in the arith dialect (and thus incorporate that dialect as a dependency), or create its own FMF attribute that maps directly to the spec.

So, to recap, the choices would be:
a.) make FastMathFlags a builtin attribute so that it can be used by all dialects, or
b.) place FastMathFlags in the arithmetic dialect, and force dialects that wish to use the attribute to become dependent on arith

To me, option b feels “cleaner”, but I will defer to you and @River707 .


Thanks for pulling the data up! I’m fine either way, and I suspect @River707 will strongly prefer landing this in the arith dialect :wink:

(by the way I think that the math dialect already depends on arith because of some folding generating arith.const)

You are right - I see the use of arith::ConstantOp in materializeConstant().

I arrived at the table entries by looking at dependentDialects values in the .td files for all of the dialects (as opposed to the same field in the entries for the passes). I don’t see arith as a listed dependency in the .td files (or in the generated calls to getOrLoadDialect()) for the math dialect - maybe this is a bug? Regardless - I will look at this as part of the FastMath changes.


Sorry, have been quite busy with other things and haven’t had adequate time to respond here. My only strong preference is against using the builtin dialect as a landing pad for things used by multiple different dialects. We should really be viewing attributes and types in the same way that we view operations; e.g., if we had a fastmath.foo operation, would we consider landing this in the builtin dialect just because multiple dialects want to use it? IMO we would be discussing which non-builtin dialect makes sense, or otherwise consider spinning up some fastmath dialect. We should get out of the mentality that creating a dialect requires operations; Attributes and Types are equally important abstractions. The builtin dialect, just like any other, has a designated purpose and place in the ecosystem.

– River


Hello,

Now that D126305 has been merged, there are cases where FastMath flags are dropped during conversion of Arith operations into LLVM dialect’s intrinsic operations. Can someone please review D136225 that adds FastMath flags support for LLVM dialect’s intrinsic operations? This will allow proper propagation of FastMath flags in ArithToLLVM converter.

Thank you in advance,
Slava