RFC: Change of strict FP operation representation in IR

spavloff · March 7, 2025, 4:18am

Background

Floating-point operations are unique in that they have different representations depending on the use case. In the default mode, FP operations are represented by instructions such as fadd or intrinsic functions like llvm.trunc, which appear to be pure functions. It is not true in the general case, as on many cores FP operations may depend on the contents of some hardware register that stores options like rounding mode and some other. Additionally, these operations can change bits in some register to report events that occur during the evaluation. In strict FP mode these interactions are not ignored and are represented as a side effects associated with the operation. Consequently, in this mode FP operations must be represented differently to reflect these side effects.

In the current LLVM implementation, FP operations in strict FP mode are represented by constrained intrinsic function calls, such as llvm.experimental.constrained.fadd or llvm.experimental.constrained.trunc. These functions always have side effects. They also have additional arguments, which are compiler hints.

The problem

While the current solution works well for typical cases, it faces challenges in more complex scenarios (Thought on strictfp support). In particular:

It requires two separate functions for each FP operation even though these functions are processed almost identically,
It is difficult to extend this solution to target intrinsics,
It cannot accommodate other FP options, such as the treatment of denormals,
It is hard to support FP control modes specified directly in instructions, such as static rounding (Static rounding mode in IR).

Proposal

The alternative solution is based on these assumptions:

Some intrinsics functions (floating-point operations) may have side effects depending on whether the containing function has the strictfp attribute. In the functions that use the default FP mode, these intrinsics behave as pure functions, if strict FP mode is required, they may have side effects.
Calls to these intrinsics may have special operand bundles, which may be, in particular, compiler hints, or may specify options for the operation, like rounding mode.

The use of operand bundles to pass compiler hints is not a new idea. There was an attempt to replace constrained intrinsics with operand bundles (⚙ D93455 Constrained fp OpBundles), but this effort was not completed.

Two kinds of FP operand bundles are defined, which generally reflect the arguments of existing constrained functions:

Effective control mode set. Now it only includes the rounding mode. It is the rounding mode used for evaluating the called function result. It may be a mode specified in the control register, if compiler can deduce it. It also can be static rounding mode, stored in the corresponding instruction. This kind of bundle is identified by the tag fp.control. In future it may be extended to include other control modes.
Exception handling. This bundle has exactly the same meaning as the corresponding argument in constrained function call. It is identified by the tag fp.except.

The operand bundles are represented as metadata:

call float @llvm.nearbyint.f32(float %x) [ "fp.control"(metadata !"dyn"), "fp.except"(metadata !"strict") ]

Rounding mode is specified by one of the strings: “dyn”, “rte”, “rtz”, “rtp”, “rtn”, “rmm”. Exception handling can take one of the values: “ignore”, “maytrap” and “strict”.

Operand bundles are optional. Unspecified parameters are assumed to have default values, which depends on the attributes of the enclosing function . For example, the following calls are equivalent in strict FP mode:

call float @llvm.nearbyint.f32(float %x) [ "fp.control"(metadata !"dyn"), "fp.except"(metadata !"strict") ]
call float @llvm.nearbyint.f32(float %x) [ "fp.except"(metadata !"strict")  ]
call float @llvm.nearbyint.f32(float %x) [ "fp.control"(metadata !"dyn") ]
call float @llvm.nearbyint.f32(float %x)

And the following are identical in default FP mode:

call float @llvm.nearbyint.f32(float %x) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
call float @llvm.nearbyint.f32(float %x) [ "fp.except"(metadata !"ignore")  ]
call float @llvm.nearbyint.f32(float %x) [ "fp.control"(metadata !"rte") ]
call float @llvm.nearbyint.f32(float %x)

Advantages

The proposed solution has several advantages:

The same function call, such as call float @llvm.trunc.f32(float %x) may be used for both default and strict modes. No more need of separate functions for strict and default modes.
Any intrinsic function can get support in the strict mode with minimal changes.
Hints can be specified in any mode. In particular it allows using static rounding in default mode.
The set of supported hints and options can be easily expanded.

Implementation

There are the MRs, that try to implement this solution:
Implement operand bundles for floating-point operations by spavloff · Pull Request #109798 · llvm/llvm-project · GitHub, in which the operand bundles are introduced and exist together with the constrained functions.
Reimplement constrained 'trunc' using operand bundles by spavloff · Pull Request #118253 · llvm/llvm-project · GitHub, which is an example how an intrinsic can be modified to use the new solution only.

The implementation plan looks as follows:

Implement operand bundles, the !109798 does this,
Update intrinsics to use the operand bundles. It can be made gradually. After this step intrinsic functions use the new mechanism and only constrained counterparts of instructions, like fadd remain.
Introduce new intrinsics like llvm.fadd and use them in strictfp functions. After this step the transition is complete.

Any feedback is appreciated.

nikic · March 21, 2025, 8:24pm

I like the direction of this proposal. The current approach of having separate constrained intrinsics is fundamentally not scalable, esp. when it comes to target intrinsics.

It would be nice to get some input here from some FP experts. cc @arsenm @jcranmer @andykaylor @efriedma-quic

Some notes from my side:

Are these well-known abbreviations? I’d consider spelling these out more fully (round_to_even, etc.)

You can’t do this for the strict FP case. The whole reason why the operand bundle approach works is that while the baseline intrinsics don’t have memory effects, operand bundles are allowed to add them, so having call @llvm.nearbyint() not have effects and have call @llvm.nearbyint() ["fp.control"(metadata !"dyn")] read the FP environment is coherent.

However, you can’t make call @llvm.nearbyint() by itself have additional effects just by dint of being inside a strictfp function.

Apart from the general IR design constraint, making the bundles optional would also make it hard to determine which operations are actually affected by strictfp and which are not.

spavloff:

And the following are identical in default FP mode:

call float @llvm.nearbyint.f32(float %x) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
call float @llvm.nearbyint.f32(float %x) [ "fp.except"(metadata !"ignore")  ]
call float @llvm.nearbyint.f32(float %x) [ "fp.control"(metadata !"rte") ]
call float @llvm.nearbyint.f32(float %x)

Can you please clarify what the semantics of "fp.control"(metadata !"rte") are supposed to be? From earlier in your proposal I thought this is going to mean “static rte rounding mode”, but here (non-strictfp) it seems to mean “dyn rounding mode == rte as a precondition”.

I think static rounding mode and rounding more preconditions are very different things and shouldn’t be conflated in one representation.

Edit: Thinking more on the last point, I guess it’s “static rte rounding mode” in both cases, just that outside strictfp we can lower it to an implementation delying on dyn FP mode, because we know they’re equivalent. This still makes me somewhat uneasy. I think this may add some tricky interaction for inlining between strictfp and non-strictfp functions. Being able to specify a known value for the FP mode also seems generally useful, so I’d like to make sure we don’t close the door on that.

jcranmer · March 21, 2025, 9:09pm

Sorry for the delay in response, I’ve been busy with other things as of
late.

Overall, as I believe I mentioned in my presentation last dev meeting,
I’m broadly in favor of this approach–constrained intrinsics haven’t
exactly proven to be a workable solution, and operand bundles seems a
much more promising solution.

Two kinds of FP operand bundles are defined, which generally reflect
the arguments of existing constrained functions:

Effective control mode set. Now it only includes the rounding
mode. It is the rounding mode used for evaluating the called
function result. It may be a mode specified in the control
register, if compiler can deduce it. It also can be static
rounding mode, stored in the corresponding instruction. This kind
of bundle is identified by the tag fp.control. In future it may be
extended to include other control modes.

Exception handling. This bundle has exactly the same meaning as
the corresponding argument in constrained function call. It is
identified by the tag fp.except.

I’m not going to insist on seeing implementation here, but I’d at least
like to see a sketch of what it would look like to bring some of the
other bits in the FP environment into the operand bundle
model–specifically things like DAZ/FTZ bits or x87’s precision control
bits or some of ARM’s weirder bits. “fp.control” at the moment feels
like it’s a synonym for rounding mode, and given that these bits are
orthogonal to rounding mode, I want to understand how they’d differ.

efriedma-quic · March 21, 2025, 10:16pm

Technically, no, we can’t add effects to calls in strictfp functions. But we could remove effects from calls in non-strictfp functions, which has basically the same practical effect.

andykaylor · March 21, 2025, 10:44pm

I also think that this general approach is good. We’ve had general consensus on this as the correct direction for about five years, and I appreciate that you are stepping up to work towards making it happen.

I have a few general concerns.

We need a plan to ensure that existing functionality (to the extent that strictfp mode currently works) won’t be disrupted as we transition to the new implementation. This probably needs to be implemented as a parallel approach, with the existing implementation left in place until the new solution is able to replace everything that currently works (or mostly works). I know this is very important to @kpneal.

I’d like to see more details about how this will work in practice. Specifically, the current approach was chosen based on the desire to have something that was conservatively correct by default. We used constrained intrinsics precisely because it meant that existing optimizations would ignore them. Do we have a similar guarantee with operand bundles?

The Language Reference says, “Calls and invokes with operand bundles have unknown read / write effect on the heap on entry and exit (even if the call target specifies a memory attribute), unless they’re overridden with callsite specific attributes.” I don’t see any mention of other side effects. What would prevent a pass that optimizes llvm.nearbyint (for example) from ignoring the operand bundle?

Can you clarify what you intend the “effective control mode” to mean? We’ve talked about this before, and I’m not sure how you currently view it. The use of “effective” here concerns me. As I’ve said before, I think this needs to be more like “assumed control mode” – that is, the compiler can assume that this mode is in effect and can select instructions that only conform to the assumed mode (such as those with explicit static rounding), but the compiler is not required to cause this mode to be in effect. Otherwise, we wouldn’t be able to select instructions that use the dynamic rounding mode without proving that the mode described by the operand bundle is in effect (or setting it).

I strongly agree with what @nikic said about not having the default behavior of the intrinsics change based on the strictfp mode. Depending on function attributes or global flags is very problematic.

I also think we need to keep the current rules for inlining non-strictfp functions into strictfp functions. Otherwise, there’s nothing to prevent code motion of non-strict operations relative to strict operations.

How do you plan to handle standard operations like fadd/fsub/fmul/fdiv, etc.? Will there be new intrinsics to represent those so that operand bundles can be optionally attached?

This is less critical, but I’d like to see a discussion of how we can align the new implementation with possible constrained floating point representation in MLIR. @szakharin raised this issue several years ago ( Modelling strict floating point behavior in math-like dialects - MLIR - LLVM Discussion Forums), and I don’t think it has been resolved on the MLIR side, but as long as we’re redesigning things, this seems like a good time to try to move towards a solution there too.

spavloff · March 25, 2025, 6:15pm

Values “rte”, “rtz”, “rtp”, “rtn” are defined by OpenCL specification (The OpenCL™ C Specification). This naming is used for some cores, with variations. The specification does not define a value for the IEEE-754 mode roundTiesToAway, so it was adopted from RISC-V modes (RISC-V Instruction Set Manual, Volume I: RISC-V User-Level ISA | Five EmbedDev), along with dyn.

Other possible encodings could be derived from, for example:

C library names (https://www.iso-9899.info/n3047.html#6.10.7):

FE_DOWNWARD
FE_TOWARDZERO
FE_TONEAREST
FE_UPWARD
FE_TONEARESTFROMZERO
FE_DYNAMIC

Probably without the prefix FE_ and using lowercase,

LLVM values used in constrained intrinsic calls (LLVM Language Reference Manual — LLVM 21.0.0git documentation).

“round.dynamic”
“round.tonearest”
“round.downward”
“round.upward”
“round.towardzero”
“round.tonearestaway”

Descriptive names help new developers understand the code but increase verbosity, while shorter names enhance readability. Choosing between long and short names for rounding modes will likely influence other control modes as well. For example, compare these:

call float @llvm.nearbyint.f32(float %x) [ "fp.control"(metadata !"rte", metadata !"daz", metadata !"ftz") ]

and

call float @llvm.nearbyint.f32(float %x) [ "fp.control"(metadata !"round_to_even", metadata !"denormal_as_zero"), metadata !"flush_to_zero" ]

We just need to choose the more convenient naming.

spavloff · March 26, 2025, 5:17pm

All FP computational operations have side effects, there is no way to turn them off. Thus, the basic form of an FP instruction is the form with the side effects. Only if the function code obeys to certain restrictions, these side effects may be ignored, and the FP operations can behave as pure functions. This means that the side effects of FP operations are determined solely by strictfp function attribute. Operand bundles, as well as metadata arguments of constrained intrinsics, cannot alter it. Even if a constrained function call has an argument fpexcept.ignore, it still has side effects.

Using default values for unspecified bundles has some advantages:

The same IR is valid for both strict and default modes. This simplifies some transformations and analyses. For example, inlining could be made uniformly.
Bitcode size is reduced.
Textual representation becomes more readable, because it does not contain data that are trivially derived from function attributes.

The obvious disadvantage is that operations with and without side effects are visually indistinguishable. However:

It is not clear how useful is this information. Anyway, mixing operations with and without side effect is not possible, and the side effect can be easily deduced from the function attributes.
The side effect can be printed in IR via comments. Such solution has advantages if, in future, the side effects of FP operation are split into control and status bit access.

Nor constrained intrinsics, neither this RFC provide such a possibility. But such feature could be beneficial. For example trunc(x) in general has side effects because it raises an ‘Invalid’ exception if x is signaling NaN. But in trunc(x*y) it never raises exception because x*y cannot produce SNaN. Removing side effects from such calls could facilitate optimizations.

This cannot be achieved with constrained intrinsics, as they always have side effects, but it can be realized with operand bundles.

It requires two simple checks:

Does the containing function have strictfp attribute?
Is the called function an FP operation? This check is trivial if FP operations are marked with a special attribute, like that proposed in [IR] Add FPOperation intrinsic property by spavloff · Pull Request #122313 · llvm/llvm-project · GitHub.

The ability to ignore FP side effects is not a property of a separate instruction. These side effects may only be ignored if no other instructions observe them, so this is a property of entire function. What problems are caused by dependence on function attributes?

spavloff · April 2, 2025, 7:07am

nikic:

spavloff:
And the following are identical in default FP mode:
call float @llvm.nearbyint.f32(float %x) [ "fp.control"(metadata !"rte"), "fp.except"(metadata !"ignore") ]
call float @llvm.nearbyint.f32(float %x) [ "fp.except"(metadata !"ignore")  ]
call float @llvm.nearbyint.f32(float %x) [ "fp.control"(metadata !"rte") ]
call float @llvm.nearbyint.f32(float %x)
Can you please clarify what the semantics of "fp.control"(metadata !"rte") are supposed to be? From earlier in your proposal I thought this is going to mean “static rte rounding mode”, but here (non-strictfp) it seems to mean “dyn rounding mode == rte as a precondition”.

I think static rounding mode and rounding more preconditions are very different things and shouldn’t be conflated in one representation.

Edit: Thinking more on the last point, I guess it’s “static rte rounding mode” in both cases, just that outside strictfp we can lower it to an implementation delying on dyn FP mode, because we know they’re equivalent. This still makes me somewhat uneasy. I think this may add some tricky interaction for inlining between strictfp and non-strictfp functions. Being able to specify a known value for the FP mode also seems generally useful, so I’d like to make sure we don’t close the door on that.

The rounding mode in all cases is the effective rounding mode, - the mode used for instruction evaluation. In most cases, it does not matter if the rounding mode is stored along with the instruction or is taken from a register.

Currently, only dynamic rounding mode is available. In the future, static rounding should also be supported to implement C23 #pragma STDC FENV_ROUND and to take advantage of static rounding available on some platforms.

This future implementation of static rounding could be based on the assumption that the difference between the two ways of specifying rounding mode can be considered as a low-level detail, which does not need to be represented in IR. The compiler has some freedom in how the rounding is specified. Meaning of "fp.control"(metadata !"rte") is preserved, it represents the rounding mode used in the evaluation. If rounding mode is known (not dynamic), it is static rounding. Execution of this instruction does not depend on its position inside a function. Assuming FP exceptions are ignored, such instruction behaves as a pure function and may be used in the default mode. Thus, the call:

call float @llvm.nearbyint.f32(float %x) [ "fp.control"(metadata !"rtp") ]

is allowed in default mode and may be used to implement the C code:

{
  #pragma STDC FENV_ROUND FE_UPWARD
  y = nearbyint(x);
}

Such implementation has advantages:

• As mentioned above it allows using non-default rounding mode in the default mode.
• It enables seamless implementation of #pragma STDC FENV_ROUND, which generally does not require strict FP handling.
• Instructions with static rounding mode behave as pure functions (assuming FP exceptions are ignored), it could facilitate optimizations.

If the target does not support static rounding, or if the instruction is not available with the required static rounding, compiler must use dynamic rounding. There must be a pass that inserts llvm.set_rounding and llvm.get_rounding to ensure instructions evaluate with the specified rounding. The function becomes strictfp, and floating-point instructions acquire side effects to maintain proper ordering with llvm.{get,set}_rounding.

It should be emphasized that such pass is required in any case, no matter how FP instructions are implemented. It is because pragma FENV_ROUND only specifies static rounding mode for some FP operations, while other still use the dynamic rounding. For example, the code:

float func(double x, double y) {
  #pragma STDC FENV_ACCESS ON
  #pragma STDC FENV_ROUND FE_DOWNWARDS
  return (sin)(x * y);
}

should be implemented with the instructions:

%mul = call @llvm.fmul.f64(double %x, double %y) [ "fp.control"(metadata !"rtn") ]
%sin = call double @llvm.sin.f64(double %mul) [ "fp.control"(metadata !"dyn") ]
%conv = call float @llvm.fptrunc.f32.f64(double %conv) [ "fp.control"(metadata !"rtn") ]

The compiler should switch between the static and dynamic rounding modes and the points, where such switches occur, must be determined by the compiler.

If there are cases where difference between static and dynamic rounding must be explicitly represented in IR, an additional specifier like !"dynamic=round" could be introduced. Similar specifier for static rounding does not look useful, because the described implementation already tries to use static rounding if the rounding mode is known, and it is not possible to use static rounding in all cases.

These are thoughts about possible implementation of static rounding support. Transition from constrained intrinsics to operand bundles does not depend on it because static rounding is not available right now.

The effective rounding mode refers to the mode used during the instruction evaluation. In other words, it is either static or dynamic rounding mode. The term “effective” was added following the discussion with @kpneal (Reimplement constrained 'trunc' using operand bundles by spavloff · Pull Request #118253 · llvm/llvm-project · GitHub), in which it was realized that the rounding mode argument cannot be strictly classified as static or dynamic rounding in all scenarios.

An important difference with constrained intrinsics is that the rounding mode bundle is not an “assumed” mode. It is compiler responsibility to ensure that the instruction is evaluated with the specified mode. In particular, the compiler may put llvm.set_rounding before it. It provides the compiler with some freedom and simplifies implementation of pragma FENV_ROUND.

spavloff · April 24, 2025, 5:52am

Bundle operands to specify denormal modes by spavloff · Pull Request #136501 · llvm/llvm-project · GitHub demonstrates how operand bundles can represent denormal behavior.

I hope Minimal support of floating-point operand bundles by spavloff · Pull Request #135658 · llvm/llvm-project · GitHub can serve as a foundation for the parallel approach you mentioned. It introduces FP operand bundles without modifying the constrained intrinsics. The two mechanisms can coexist and evolve independently. An operation may be represented in either of them or both.

Both approaches use the same side-effect representation to describe interaction with FP environment, ensuring they do not break each other functionality. In fact, constrained intrinsics themselves may be declared as FP Operations and thus may have FP bundles, Add constrained fadd to FP operations by spavloff · Pull Request #136499 · llvm/llvm-project · GitHub is an example of such case. It also adds a code that treats metadata argument as if they were bundles, thus allowing realization of algorithms, which would correctly work with any representation.

An operation could be added to FP operations, its support be adapted to operand bundles and then it can be removed from constrained intrinsics, this could be considered as a step in the new mechanism implementation.

Based on the discussion with @kpneal I understand that ensuring expected behavior of strictfp function attribute is a key requirement. However, this solution does not use this attribute at all. The existing mechanism that creates and propagates the attribute is not changed. Anyway, any feedback on this topic is highly appreciated.

The operand bundles implement a different approach. The same intrinsic can now be used in both default and strictfp environment, so optimizations are enabled even in strictfp environment. When an intrinsic is added to FP Operations, we must verify whether optimizations remain valid in strictfp environment. For most operations the optimizations rely on mathematical properties of these intrinsics and typically hold in strictfp case. Instructions, like fadd, are more complex case, but the intrinsics to represent these operations (like llvm.fadd) initially will also be ignored by optimization.

Operand bundles implementation should be fixed accordingly. The implementation in https://github.com/llvm/llvm-project/pull/109798 indeed missed a case, in Minimal support of floating-point operand bundles by spavloff · Pull Request #135658 · llvm/llvm-project · GitHub it is fixed.

This solution as well as the existing mechanism use only “read/write access to a memory that is not accessible by pointers”. In future this access could be refined to distinguish access to control and status bits. This possibility is promising but now is not considered.

Passes must respect the memory effects reported by getMemoryEffects for FP operations. Other information provided by the bundles is floating-point specific, it is needed only by transformations that are aware of floating-point numbers. Usually it may be safely ignored by other transformations. There are exceptions (like dead code elimination), but they are pertinent to the both mechanisms and necessary code should be already present.

Under the proposed default behavior based on strictfp function property it is impossible to mix non-strictfp operation with strictfp code. When an FP call is moved from non-strictfp code to strictfp function, it automatically acquires side effects and behaves according to the strictfp rules.

nikic · May 10, 2025, 4:46pm

spavloff:

Descriptive names help new developers understand the code but increase verbosity, while shorter names enhance readability. Choosing between long and short names for rounding modes will likely influence other control modes as well. For example, compare these:
call float @llvm.nearbyint.f32(float %x) [ "fp.control"(metadata !"rte", metadata !"daz", metadata !"ftz") ]
and
call float @llvm.nearbyint.f32(float %x) [ "fp.control"(metadata !"round_to_even", metadata !"denormal_as_zero"), metadata !"flush_to_zero" ]
We just need to choose the more convenient naming.

For what it’s worth, I prefer the second variant in terms of verbosity (without commenting on the specific strings used here).

Especially as there already is a lot of syntactic overhead anyway, I don’t think saving a handful of characters here at the expense of clarity is worthwhile.

I don’t really agree with this view, at least in terms of how we want to model these operations in LLVM IR. Note that there are multiple effects involved here. For most FP operations, the most general set of effects will be something like memory(fp_rounding_mode: read, fp_denormal_mode: read, ..., fp_status: write).

Let’s just look at fp_rounding_mode: read to start. @fadd() ["fp.control"(metadata !"round_to_nearest")] does not have a fp_rounding_mode: read effect, because the (effective) rounding mode is fixed. Importantly, this is true regardless of whether the operation is inside a strictfp function or not.

If we have ["fp.control"(metadata !"dynamic")] instead, then the situation is a bit more nuanced. Taken by itself, this operation has a fp_rounding_mode: read effect. However, inside a non-strictfp function, we know that the rounding mode is fixed to round-to-even, so the effect is unobservable and can be modeled as fp_rounding_mode: none.

So when it comes to the FP control effects, the operation has an intrinsic effect determined by operand bundles, and then on top of that (the absence of) strictfp may allow us to ignore some effects.

For fp_status: write the situation is more along the lines of what you said, in that the operation always (potentially) writes the FP status, but fpexcept.ignore allows us to ignore this effect. But regardless of what the underlying hardware behavior is, the way we’d be modelling this at the IR level is still fp_status: none.

All this is to say that I don’t think that that the strictfp should be implicitly changing the effects or behavior of FP operations. The meaning of an undecorated operation inside a strictfp function should remain round-to-nearest + fpexcept.ignore, same as in a non-strictfp function. (Though I would accept the argument to make the decorations required inside strictfp functions to make bugs less likely.)

Regarding the discussion on static rounding modes, I generally agree with what you’re saying, but the part I’m still missing is how the current functionality of constrained FP intrinsics can be represented under the new model.

As I understand it, your proposal allows specifying either a dynamic rounding mode or a static rounding mode, but it has no way to encode an assumed rounding mode.

If you specify a static rounding mode and the target does not allow per-instruction rounding modes, then LLVM will be forced to change the dynamic rounding mode around the instruction to lower it, which would be very inefficient – especially if there is prior knowledge that the dynamic rounding mode is already set to the correct value.

Is this just not a use-case anyone cares about? Otherwise we should have a replacement for it. (A possible replacement would be to make the strictfp attribute more fine grained, so we can express things like “the rounding mode inside this functions is known to be xyz”. If function-granularity is sufficient for this.)

A question related to this one: Why do we use a single fp.control operand bundle rather than separate bundles for different FP control options? The PR does something like "fp.control"(metadata !"denorm.in=ieee"), which we could also spell "fp.denorm.in"(metadata !"ieee").

I don’t think this makes much difference either way, just seemed like an odd choice.

spavloff · May 14, 2025, 5:19pm

nikic:

spavloff:
Descriptive names help new developers understand the code but increase verbosity, while shorter names enhance readability. Choosing between long and short names for rounding modes will likely influence other control modes as well. For example, compare these:
call float @llvm.nearbyint.f32(float %x) [ "fp.control"(metadata !"rte", metadata !"daz", metadata !"ftz") ]
and
call float @llvm.nearbyint.f32(float %x) [ "fp.control"(metadata !"round_to_even", metadata !"denormal_as_zero"), metadata !"flush_to_zero" ]
We just need to choose the more convenient naming.
For what it’s worth, I prefer the second variant in terms of verbosity (without commenting on the specific strings used here).

Especially as there already is a lot of syntactic overhead anyway, I don’t think saving a handful of characters here at the expense of clarity is worthwhile.

No problem. We can use the same encoding as we do now for constrained intrinsics, it is familiar to developers. Taking into account the note below about splitting “fp.control” and making similar changes in denorm bundles, the syntax could be:

call float @llvm.nearbyint.f32(float %x) [ "fp.round"(metadata !"tonearest"), "fp.denorm.in"(metadata !"preserve-sign"), "fp.denorm.out"(metadata !"positive_zero"), "fp.except"(metadata !"ignore") ]

nikic:

spavloff:

All FP computational operations have side effects, there is no way to turn them off. Thus, the basic form of an FP instruction is the form with the side effects. Only if the function code obeys to certain restrictions, these side effects may be ignored, and the FP operations can behave as pure functions. This means that the side effects of FP operations are determined solely by strictfp function attribute. Operand bundles, as well as metadata arguments of constrained intrinsics, cannot alter it. Even if a constrained function call has an argument fpexcept.ignore, it still has side effects.

I don’t really agree with this view, at least in terms of how we want to model these operations in LLVM IR. Note that there are multiple effects involved here. For most FP operations, the most general set of effects will be something like memory(fp_rounding_mode: read, fp_denormal_mode: read, ..., fp_status: write).

Let’s just look at fp_rounding_mode: read to start. @fadd() ["fp.control"(metadata !"round_to_nearest")] does not have a fp_rounding_mode: read effect, because the (effective) rounding mode is fixed. Importantly, this is true regardless of whether the operation is inside a strictfp function or not.

It depends on how we want to model these operations in LLVM IR. if @fadd() ["fp.round"(metadata !"round_to_nearest")] does not have a side effect, it means that rounding mode in this case is considered as static. This representation is slightly more abstract than the existing constrained intrinsics. But it has advantages including:

more possibilities for optimizations,
natural implementation of pragma STDC FENV_ROUND,
possibility to use static rounding in default mode.

It however requires a dedicated pass that would insert necessary set_/get_rounding calls to preserve the semantics on targets without static rounding.

Using static rounding for specific (non-dynamic) modes seems like a good approach. It allows us to use static rounding freely in LLVM IR for any target, but eventually we should replace static rounding with dynamic before the instruction selection.

The key point is that exceptions cannot be ignored in strictfp environment, and fpexcept.ignore cannot change that. For example:

inline float divide(float x, float y) {
  // default fp mode.
  return x / y;
}

float calculate() {
  #pragma STDC FENV_ACCESS ON
  …
  y = sin(x);
  exc = fetestexcept(MASK))
  …
  c = divide(a, b);
  …
}

In this case, the divide function produces a division instruction with fpexcept.ignore flag, since it originates from a default-mode function. If this instruction had fp_status: none attribute, nothing would prevent the compiler from placing the instruction between the instruction that use exceptions:

  y = sin(x);
  c = divide(a, b);
  exc = fetestexcept(MASK))

In a strictfp function, an IR instruction fadd with fp_status: none cannot be realized with just a single target fadd instruction - it requires at least a pair of fegetenv and fesetenv. This makes the use of fp_status: none in strictfp functions a source of misunderstanding. Most transformation based on this property in LLVM seem to be incorrect. The only valid use is eliminating instructions with unused results, even if the instruction may raise exceptions. For example, the following call may be removed by compiler, although it raises an exception:

    %unused = call double @llvm.experimental.constrained.fdiv.f64(double 0.0, double 0.0, metadata !"round.upward", metadata !"fpexcept.ignore"

Some users think this behavior is incorrect. If they are right, fp_status: none in strictfp function is useless, instructions with and without this attribute behave identically. Consequently the entire parameter “fpexcept” can be removed, as the exception handling would be determined only by the function attribute ‘strictfp’.

If we choose to maintain per-instruction exception handling flags, we must define its semantics more precisely and establish a set of potential use cases for this flags. I would drop it completely, - exception handling is defined by the entire code, it is not a choice of an individual function. It however could be made later, no need to tie this decision with this proposal.

The current implementation supports only two models: default mode and strictfp mode with dynamic rounding. It is possible to create an IR with instructions that use specific rounding modes, but clang does not produce such code and it is unlikely that some MLIR or other LLVM-based code generator creates strictfp code that uses non-dynamic rounding.

So the current functionality of constrained FP intrinsics consists of only dynamic rounding. All constrained intrinsics have side effect to prevent undesirable reorder and this is vital for strictfp implementation. All these properties are preserved in the proposed implementation.

I would propose to consider difference between static and assumed modes as a low-level implementation detail and leave the choice between them to the compiler. fadd executes identically (produces the same results) no matter if rounding mode is specified in the instruction or is read from a register.

If a case arises where it is really necessary to distinguish between them, rounding mode bundle could be amended with an appropriate flag, for example:

call float @llvm.nearbyint.f32(float %x) [ "fp.round"(metadata !"tonearest,dynamic" ]

Yes, you are right. Support of static rounding mode requires some IR transformation that would insert set_rounding and get_rounding to emulate semantics of static rounding. However it is required anyway if we want to implement pragma FENV_ROUND, this is not a drawback of this proposal. The C standard explicitly points out that static rounding may be implemented in this way: https://www.iso-9899.info/n3047.html#7.6.2p5. And yes, a target with hardware support of static rounding would have benefits.

Most of the information needed for the transformation, that converts static rounding to dynamic, can be obtained from analyzing the function code. If pragma FENV_ROUND is used without FENV_ACCESS, we know that no user function modifies rounding mode, and we can insert functions that change rounding mode optimally. Otherwise the user intends to mix static and dynamic modes and the things are more complicated. Compiler intrinsics and standard C functions have known behavior and the pass can use this information. The main challenge lies with user functions, they can change rounding mode and should be treated as barriers making rounding mode unknown. It this is not sufficient, a special function attribute indicating that the function changes rounding mode, can be introduced.

nikic · May 22, 2025, 12:57pm

Yeah, the general approach here sounds good to me.

I think I generally agree with your notion that the fp_status effects should be determined just by strictfp at the function level, not the individual operations.

For context, here is how LangRef describes the non-strict options:

If this argument is “fpexcept.ignore” optimization passes may assume that the exception status flags will not be read and that floating-point exceptions will be masked. This allows transformations to be performed that may change the exception semantics of the original code. For example, FP operations may be speculatively executed in this case whereas they must not be for either of the other possible values of this argument.

If the exception behavior argument is “fpexcept.maytrap” optimization passes must avoid transformations that may raise exceptions that would not have been raised by the original code (such as speculatively executing FP operations), but passes are not required to preserve all exceptions that are implied by the original code. For example, exceptions may be potentially hidden by constant folding.

The wording is a bit fuzzy, but I think the most coherent interpretation I see is that fpexpect.maytrap allows you to remove exceptions and fpexcept.ignore both add and remove – but they still cannot be moved past operations that either a) write the FP exception mode or b) read the FP status.

So the memory effects always exist in a strictfp context, but they can be ignored in certain cases. E.g. it would be legal to reorder multiple consecutive fpexcept.ignore instructions. It would also be valid to treat the fpexcept.ignore operations as willreturn.

In that sense, I do see some value in the instruction-level annotations, even if they don’t affect the generic memory effect modelling.

One unfortunate thing about this model is that the use of strictfp is required to use a dynamic FP environment, but it would also force stricter FP exception/status handling even if fpexcept.ignore is used. These should really be orthogonal things…

Okay, thanks. If we don’t have active users for the assumed rounding mode right now, then not supporting it is fine.

I don’t really agree that it’s an implementation detail. There is a big difference both in terms of semantics (one is “perform operation in static mode” and the other is “if the dynamic mode is not X, the behavior is undefined”) and lowering (one requires changing dynamic rounding mode, the other doesn’t).

spavloff:

If a case arises where it is really necessary to distinguish between them, rounding mode bundle could be amended with an appropriate flag, for example:
call float @llvm.nearbyint.f32(float %x) [ "fp.round"(metadata !"tonearest,dynamic" ]

Something along those lines makes sense to me, if it becomes necessary.

Topic		Replies	Views
clarification needed for the constrained fp implementation. LLVM Dev List Archives	0	133	November 27, 2017
Floating point operations with specific rounding and exception properties LLVM Dev List Archives	5	158	August 21, 2019
Thought on strictfp support IR & Optimizations	8	580	June 27, 2023
Static rounding mode in IR IR & Optimizations	22	356	January 13, 2025
[RFC] Improving IR fast-math semantics IR & Optimizations core , rfc , llvm , llvm-ir	28	1103	March 26, 2025

RFC: Change of strict FP operation representation in IR

Related topics