Should rint and nearbyint always be constrained?

Hi everyone,

According to the current design, an intrinsic call that depends on the current floating-point environment (for example, on the rounding mode) or changes it (for example, by raising FP exceptions) is represented by a constrained intrinsic. Constrained intrinsics have attached metadata that provides information about the current FP environment and carry the attribute IntrInaccessibleMemOnly, which helps preserve the ordering of operations that access the FP environment. Non-constrained intrinsics are considered to work solely in the default FP environment.

This approach has issues when applied to the intrinsics rint and nearbyint. The value returned by either of these intrinsics depends on the current rounding mode. If they are treated as operating in the default environment, they would only ever round to nearest, which is far from the meaning of the standard C functions that these intrinsics represent.

So the unconstrained intrinsics rint and nearbyint seem to have little use. The corresponding C functions are designed to work in a non-default FP environment. The IEEE 754 counterpart of rint is roundToIntegralExact, which also assumes a dependency on the current rounding mode. If these intrinsics are used, the FP environment is most likely not the default.
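
For illustration, here is a minimal C sketch (assuming a hosted C99 environment where fesetround and FE_UPWARD are available) showing that the very same call produces different results under different dynamic rounding modes, so modelling it as a pure round-to-nearest operation loses the intended semantics:

#include <fenv.h>
#include <math.h>
#include <stdio.h>

#pragma STDC FENV_ACCESS ON

int main(void) {
  /* Default environment: round-to-nearest-even gives 2. */
  printf("%g\n", nearbyint(2.5));
  /* Non-default environment: rounding toward +infinity gives 3. */
  fesetround(FE_UPWARD);
  printf("%g\n", nearbyint(2.5));
  fesetround(FE_TONEAREST);   /* restore the default mode */
  return 0;
}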

We have at least two variants for how to cope with this issue:

  1. Add the attribute IntrInaccessibleMemOnly to the non-constrained rint and nearbyint. That would allow correct ordering of these intrinsics with other operations that access the FP environment. In this case the existing uses of these intrinsics in IR would be preserved, but lowering would have to change anyway, because the corresponding nodes require a chain argument.

  2. Change the declarations of rint and nearbyint so that they are the same as their constrained versions are now. The availability of additional information in the metadata operands facilitates optimization of these intrinsics, and they would not need separate constrained variants in this case. It would change the existing uses of these intrinsics in IR.

Variant 2 seems better, as it does not introduce intrinsics that are almost duplicates of existing ones and it simplifies optimization. The change to existing IR should not be a big deal: since the additional metadata operands are added at the end of the operand list, access to the existing operands should keep working. Lowering needs to change in either case.

Does such a change make sense? Are there any thoughts on the implementation of these two intrinsics?

> This approach has issues when applied to the intrinsics rint and
> nearbyint. The value returned by either of these intrinsics depends
> on the current rounding mode. If they are treated as operating in
> the default environment, they would only ever round to nearest,
> which is far from the meaning of the standard C functions that these
> intrinsics represent.

I'm not sure why this is an issue. Yes, these two intrinsics depend
on the current rounding mode according to the C standard, and yes,
LLVM in default mode assumes that the current rounding mode is the
default rounding mode. But the same holds true for many other
intrinsics and even the arithmetic IR operations like add.

If you want to stop clang from making the default rounding mode
assumption, you need to use the -frounding-math option (or one
of its equivalents), which will cause clang to emit the corresponding
constrained intrinsics instead, for those two as well as all other
affected intrinsics.

I don't see why it would make sense to add another special case
just for those two intrinsics ...

Bye,
Ulrich

I’m not sure why this is an issue. Yes, these two intrinsics depend
on the current rounding mode according to the C standard, and yes,
LLVM in default mode assumes that the current rounding mode is the
default rounding mode. But the same holds true for many other
intrinsics and even the arithmetic IR operations like add.

Any other intrinsic, like floor, round, etc., has a meaning under the default rounding mode. But using rint or nearbyint in the default FP environment is strange; roundeven can be used instead. We could use the more general intrinsics in all cases, as the special case of the default environment is not of practical interest.

There is another reason for special handling. The set of intrinsics includes things like x86_sse_cvtss2si, and it is unlikely that all of them will eventually get constrained counterparts. It looks more natural for such intrinsics to be defined as accessing the FP environment and to be optimized when the environment is known to be the default. These two intrinsics could be a good model for such cases. IIUC, splitting entities into constrained and non-constrained variants is a temporary solution; ideally they will merge into one entity. We could do that for some intrinsics now.

I agree with Ulrich. The default behavior of LLVM IR is to assume that roundToNearest is the current rounding mode everywhere. This corresponds to the C standard, which says that the user may only modify the floating point environment if fenv access is enabled. The latest version of the C standard adds pragmas which can change the rounding mode for a region, and if these are implemented in clang, the constrained versions of all FP operations should be used there. However, outside of regions where fenv access is enabled either by pragma or command line option, we are free to assume that the current rounding mode is the default rounding mode.

So, llvm.rint and llvm.nearbyint (the non-constrained versions) can be specifically documented as performing their operation according to roundToNearest and clang can use them in the default case for the corresponding libm functions, and llvm.experimental.constrained.rint and llvm.experimental.constrained.nearbyint can be documented as using the current rounding mode.

The only issue I see is that since we also assume FP operations have no side effects by default there is no difference between llvm.rint and llvm.nearbyint. I wouldn’t have a problem with dropping llvm.rint completely.

As for the target-specific intrinsics, you are correct that we need a plan for that. I have given it some thought, but nothing is currently implemented. My suggestion would be that we should set the strictfp attribute on these intrinsics and provide the rounding mode and exception behavior arguments using an operand bundle. We do still need some way to handle the side effects. My suggestion here is to add some new attribute that means “no side effects” in the absence of the strictfp attribute and something similar to “inaccessibleMemOnly” in the presence of strictfp.

We could make the new attribute less restrictive than inaccessibleMemOnly in that it only really needs to act as a barrier relative to other things that are accessing the fp environment. I believe Ulrich suggested this to me at the last LLVM Developer Meeting.

-Andy

Some clarification after getting feedback from Craig Topper….

It’s probably best to say in the documentation that the llvm.nearbyint and llvm.rint functions “assume the default rounding mode, roundToNearest”. This will allow the optimizer to transform them as if they were rounding to nearest without requiring backends to use an encoding that enforces roundToNearest as the rounding mode for these operations. On modern x86 targets we can encode it either way, but it seems more consistent to continue using the current encoding which tells the processor to use the current rounding mode. For other targets (including cases where x86 is forced to use x87 instructions), it may be much easier to leave this at the discretion of the backend.

Also, we should take care to document the non-constrained forms of these intrinsics in a way that makes clear that we are “assuming” and not requiring that the operation has no side effects. For the constrained version of nearbyint, we will require that the inexact exception is not raised (to be consistent with IEEE 754-2019’s roundToIntegral operations), and for the constrained version of rint we will require that the inexact exception is raised (to be consistent with IEEE 754-2019’s roundToIntegralExact operation), but for the non-constrained forms it should be clear that the backend is free to implement the operation in the most efficient way possible, without regard to FP exception behavior.

Finally, I see now the problem with documenting these in terms of the IEEE operations, given that IEEE 754-2019 doesn’t describe an operation that uses the current rounding mode without knowing what that is. I see this as a problem of documentation rather than one that presents any difficulty for the implementation.

Here are some suggested wordings for the “Semantics” section of the langref for these functions:

llvm.nearbyint::semantics

This function returns the same value as one of the IEEE 754-2019 roundToIntegral operations using the current rounding mode. The optimizer may assume that the actual rounding mode is roundToNearest (IEEE 754: roundTiesToEven), but backends may encode this operation either with that rounding mode explicitly or using the dynamic rounding mode from the floating point environment. The optimizer may assume that the operation has no side effects and raises no FP exceptions, but backends may encode this operation using either instructions that raise exceptions or instructions that do not. The FP exceptions are assumed to be ignored.

llvm.rint (delete, or identical semantics to llvm.nearbyint)

llvm.experimental.constrained.nearbyint::semantics

This function returns the same value as one of the IEEE 754-2019 roundToIntegral operations. If the roundingMode argument is fpround.dynamic, the behavior corresponds to whichever of the roundToIntegral operations matches the dynamic rounding mode when the operation is executed. The optimizer may not assume any rounding mode in this case, and backends must encode the operation in a way that uses the dynamic rounding mode. Otherwise, the rounding mode may be assumed to be that described by the roundingMode argument and backends may either use instructions that encode that rounding mode explicitly or use the current rounding mode from the FP environment.

The optimizer may assume that this operation does not raise the inexact exception when the return value differs from the input value, and if the exceptionBehavior argument is not fpexcept.ignore, the backend must encode this operation using instructions that guarantee that the inexact exception is not raised. If the exceptionBehavior argument is fpexcept.ignore, backends may encode this operation using either instructions that raise exceptions or instructions that do not.

llvm.experimental.constrained.rint::semantics

This function returns the same value as the IEEE 754-2019 roundToIntegralExact operation. If the roundingMode argument is fpround.dynamic, the behavior corresponds to the dynamic rounding mode when the operation is executed. The optimizer may not assume any rounding mode in this case, and backends must encode the operation in a way that uses the dynamic rounding mode. Otherwise, the rounding mode may be assumed to be that described by the roundingMode argument and backends may either use instructions that encode that rounding mode explicitly or use the current rounding mode from the FP environment.

If the exceptionBehavior argument is not fpexcept.ignore, the optimizer must assume that this operation will raise the inexact exception when the return value differs from the input value and the backend must encode this operation using instructions that guarantee that the inexact exception is raised in that case. If the exceptionBehavior argument is fpexcept.ignore, backends may encode this operation using either instructions that raise exceptions or instructions that do not.

I’d like to also say that these intrinsics can be lowered to the corresponding libm functions, but I’m not sure all libm implementations meet the requirements above.
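
As a rough way to probe whether a particular libm behaves as required, a small hedged test along these lines could be used (C99 fenv; a sanity check only, not a conformance test):

#include <fenv.h>
#include <math.h>
#include <stdio.h>

#pragma STDC FENV_ACCESS ON

/* rint (roundToIntegralExact) must raise "inexact" when the result
   differs from the input; nearbyint must not raise it. */
static void probe(const char *name, double (*fn)(double)) {
  feclearexcept(FE_INEXACT);
  (void)fn(1.5);                    /* result 2.0 differs from the input */
  printf("%s: inexact %s raised\n", name,
         fetestexcept(FE_INEXACT) ? "was" : "was not");
}

int main(void) {
  probe("rint", rint);
  probe("nearbyint", nearbyint);
  return 0;
}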

-Andy

Hi Andy,

Some clarification after getting feedback from Craig Topper….

It’s probably best to say in the documentation that the llvm.nearbyint and llvm.rint functions “assume the default rounding mode, roundToNearest”. This will allow the optimizer to transform them as if they were rounding to nearest without requiring backends to use an encoding that enforces roundToNearest as the rounding mode for these operations. On modern x86 targets we can encode it either way, but it seems more consistent to continue using the current encoding which tells the processor to use the current rounding mode. For other targets (including cases where x86 is forced to use x87 instructions), it may be much easier to leave this at the discretion of the backend.

Also, we should take care to document the non-constrained forms of these intrinsics in a way that makes clear that we are “assuming” and not requiring that the operation has no side effects.

Note that these aspects are shared by most other FP operations and are already discussed in the LangRef section <https://llvm.org/docs/LangRef.html#floating-point-environment>, which currently reads:

The default LLVM floating-point environment assumes that floating-point instructions do not have side effects. Results assume the round-to-nearest rounding mode. No floating-point exception state is maintained in this environment. Therefore, there is no attempt to create or preserve invalid operation (SNaN) or division-by-zero exceptions.

The benefit of this exception-free assumption is that floating-point operations may be speculated freely without any other fast-math relaxations to the floating-point model.

Code that requires different behavior than this should use the Constrained Floating-Point Intrinsics.

Your explanation of the implications for optimizers and backends seems like a useful addition to this section. As many intrinsics (not just nearbyint/rint) and instructions (fadd, fmul, etc.) behave this way, I think it would be more useful to consolidate all the information into this section and reference it from the relevant “Semantics” sections.

While we’re on it, let me point out that the consequences of breaking these assumptions are still fuzzy even with your clarifications. In general, when a compiler “assumes” something that is not actually true, it’s useful to specify what exactly happens when the assumption is actually false, e.g. the result is an undefined value (undef/poison), or a non-deterministic choice is made (e.g. branching on poison, at the moment), or Undefined Behavior happens. In this sense, I wonder what should happen when the assumptions about rounding mode and FP exception state are broken? If it’s going to take broader discussion to agree on an answer, that’s probably out of scope for this thread, but perhaps there’s a clear answer that just wasn’t written down so far?

For the constrained version of nearbyint, we will require that the inexact exception is not raised (to be consistent with IEEE 754-2019’s roundToIntegral operations), and for the constrained version of rint we will require that the inexact exception is raised (to be consistent with IEEE 754-2019’s roundToIntegralExact operation), but for the non-constrained forms it should be clear that the backend is free to implement the operation in the most efficient way possible, without regard to FP exception behavior.

Finally, I see now the problem with documenting these in terms of the IEEE operations, given that IEEE 754-2019 doesn’t describe an operation that uses the current rounding mode without knowing what that is. I see this as a problem of documentation rather than one that presents any difficulty for the implementation.

I’m not quite sure what you mean by “uses the current rounding without knowing what it is” -- are you referring to the wobbly uncertainty caused by optimizations assuming one rounding mode but runtime code possibly using a different dynamic rounding mode? If so, explicitly defining what happens when the dynamic and “assumed” rounding modes don’t match (see above) also addresses this problem. Then the operations can be described like this:

If a rounding mode is assumed [RNE for non-constrained intrinsic or roundingMode argument != fpround.dynamic] and the current dynamic rounding mode differs from the assumed rounding mode, [pick one: behavior is undefined / result is poison / …]. Otherwise, X operation is performed with the current dynamic rounding mode [which equals the statically assumed rounding mode if this clause applies].

Best regards,
Hanna

The only issue I see is that since we also assume FP operations have no side effects by default there is no difference between llvm.rint and llvm.nearbyint. I wouldn’t have a problem with dropping llvm.rint completely.

The forthcoming C standard (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2454.pdf, 7.12.9.8) defines a new function, roundeven, which implements the IEEE 754 operation roundToIntegralTiesToEven. When the corresponding intrinsic is implemented (I am working on such a patch), llvm.rint and llvm.nearbyint will be identical to llvm.roundeven in the default environment and both can be dropped. We’ll end up with a funny situation: there are constrained intrinsics (experimental!) but no corresponding ‘usual’ intrinsics. This demonstrates that splitting an operation into constrained and non-constrained variants does not work in the case of rint and nearbyint.

Note, EVEX embedded static rounding forces suppress-all-exceptions (SAE). You can’t have static rounding with exceptions.

We’re also talking about making the vector predicated floating point intrinsics that Simon Moll is working on support both strict and non-strict using operand bundles. So you’re right we could probably merge constrained and non-constrained versions of the existing intrinsics.

One concern with replacing llvm.rint and llvm.nearbyint with llvm.roundeven is that it makes it difficult to turn the operation back into a libcall if the backend doesn’t have an instruction for it. You can’t just call the roundeven library function, since that doesn’t exist in older libm implementations. So ideally you would know which function was originally used in the user code and call that.

One concern with replacing llvm.rint and llvm.nearbyint with llvm.roundeven is that it makes it difficult to turn the operation back into a libcall if the backend doesn’t have an instruction for it. You can’t just call the roundeven library function, since that doesn’t exist in older libm implementations. So ideally you would know which function was originally used in the user code and call that.

Yes, you are right. Such an optimization at the IR level probably does not make sense.

Actually, it is hard to rely on the default FP environment in many cases. We know that a program starts with the default FP state installed. But in other cases we generally cannot assume this. For example, can we assume the default FP environment in this case?

float qqq(float x) {
  return nearbyint(x);
}

Depending on the answer, the compiler generates either the non-constrained intrinsic or the constrained one. The result of nearbyint depends on the current rounding mode, so this function accesses the FP environment - it implicitly reads the rounding mode. Should the user put #pragma STDC FENV_ACCESS on here? Actually, no.

C standard (n2454):

7.6.1p2
The FENV_ACCESS pragma provides a means to inform the implementation when a program might
access the floating-point environment to test floating-point status flags or run under non-default
floating-point control modes.

7.6p1
A floating-point status flag is a
system variable whose value is set (but never cleared) when a floating-point exception is raised, which
occurs as a side effect of exceptional floating-point arithmetic to provide auxiliary information.

Not every access to the FP environment requires #pragma STDC FENV_ACCESS on, only access that tests FP exception status flags or sets control modes. Neither occurs in the example above.

So, even if #pragma STDC FENV_ACCESS on is absent, we should not assume the default FP environment in the case of functions that read control modes, including nearbyint and rint. They cannot assume the default rounding mode and must be ordered relative to other instructions that may access the FP environment. The scope of the non-constrained intrinsics would then be only initialization code, which seems to be a marginal case.
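
To make the scenario concrete, here is a hedged sketch (illustrative only; in practice the two functions would live in different translation units):

#include <fenv.h>
#include <math.h>
#include <stdio.h>

/* Translated with FENV_ACCESS "off": no pragma is in effect here. */
float qqq(float x) {
  return nearbyintf(x);
}

#pragma STDC FENV_ACCESS ON
int main(void) {
  fesetround(FE_DOWNWARD);
  /* Prints 1 if qqq honors the dynamic rounding mode,
     2 if the compiler assumed round-to-nearest inside qqq. */
  printf("%g\n", qqq(1.5f));
  fesetround(FE_TONEAREST);
  return 0;
}

Whether this program is conforming (and must print 1) or has undefined behavior because qqq runs under a non-default mode is exactly the question.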

Even the basic arithmetic instructions fadd, fsub, fmul, fdiv use the rounding mode. And when we constant fold them we assume the default rounding mode.

Even the basic arithmetic instructions fadd, fsub, fmul, fdiv use the rounding mode. And when we constant fold them we assume the default rounding mode.

Strictly speaking, that is not always conforming from the viewpoint of the C standard:

F.8.4p1

An arithmetic constant expression of floating type, other than one in an initializer for an object that has static or thread storage duration, is evaluated (as if) during execution; thus, it is affected by any operative floating-point control modes and raises floating-point exceptions as required by IEC 60559 (provided the state for the FENV_ACCESS pragma is “on”).

This behavior may be modified using a pragma:

7.6.2p2

The FENV_ROUND pragma provides a means to specify a constant rounding direction for floating point operations for standard floating types within a translation unit or compound statement.

7.6.2p3

If no FENV_ROUND pragma is in effect, or the specified constant rounding mode is FE_DYNAMIC, rounding is according to the mode specified by the dynamic floating-point environment, which is the dynamic rounding mode that was established either at thread creation or by a call to fesetround, fesetmode, fesetenv, or feupdateenv.

So sometimes the compiler should not do constant folding, and sometimes it should fold using a non-default rounding mode. But in most practical cases constant folding indeed assumes the default rounding mode.
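
For illustration, here is a hedged sketch of that FENV_ROUND interaction (it assumes a compiler that actually implements the C2x pragma; support in current compilers is partial):

#include <fenv.h>
#include <stdio.h>

static float default_mode(void) {
  return 1.0f / 3.0f;            /* may be folded with round-to-nearest */
}

static float downward_mode(void) {
  #pragma STDC FENV_ROUND FE_DOWNWARD
  /* Per F.8.4p1 this constant expression is evaluated (as if) at
     execution time under FE_DOWNWARD, so any folding must use that
     mode; the result is one ulp below the value above. */
  return 1.0f / 3.0f;
}

int main(void) {
  printf("%.9g\n%.9g\n", default_mode(), downward_mode());
  return 0;
}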

> Actually, it is hard to rely on the default FP environment in many
> cases. We know that a program starts with the default FP state
> installed. But in other cases we generally cannot assume this. For
> example, can we assume the default FP environment in this case?
>
> float qqq(float x) {
>   return nearbyint(x);
> }

I'm not sure what problem you see here. In default mode, i.e.
when there is no "#pragma STDC FENV_ACCESS on" in effect,
then the compiler can always assume that the default rounding
mode is in effect.

> Depending on the answer, the compiler generates either the
> non-constrained intrinsic or the constrained one. The result of
> nearbyint depends on the current rounding mode, so this function
> accesses the FP environment - it implicitly reads the rounding mode.
> Should the user put #pragma STDC FENV_ACCESS on here? Actually, no.

Well, if #pragma STDC FENV_ACCESS on is not in effect, that means
that the user has promised that at this point during execution,
we will *always* have the default FP environment. As you quote:
> C standard (n2454):

> 7.6.1p2
> The FENV_ACCESS pragma provides a means to inform the implementation
> when a program might access the floating-point environment to test
> floating-point status flags or run under non-default
> floating-point control modes.

Note the last clause "*run under* non-default floating-point
control modes". Every bit of code that can possibly ever run
while non-default FP modes are in effect *must* be compiled
with #pragma STDC FENV_ACCESS in effect, or else the whole
program has undefined behavior.

> 7.6p1
> A floating-point status flag is a
> system variable whose value is set (but never cleared) when a
> floating-point exception is raised, which
> occurs as a side effect of exceptional floating-point arithmetic to
> provide auxiliary information.

True but irrelevant, since 7.6.1p2 also talks about floating-point
*control modes*, which are collectively defined as:

7.6p1
A floating-point control mode is a system variable whose value may
be set by the user to affect the subsequent behavior of
floating-point arithmetic.

And this includes the rounding-mode controls.
> Not every access to the FP environment requires #pragma STDC
> FENV_ACCESS on, only access that tests FP exception status flags or
> sets control modes. Neither occurs in the example above.
>
> So, even if #pragma STDC FENV_ACCESS on is absent, we should not
> assume the default FP environment in the case of functions that read
> control modes, including nearbyint and rint. They cannot assume the
> default rounding mode and must be ordered relative to other
> instructions that may access the FP environment. The scope of the
> non-constrained intrinsics would then be only initialization code,
> which seems to be a marginal case.

This all seems to be based on a misreading of the standard.

Bye,
Ulrich

You can turn it into a libcall to nearbyint (or lower to an instruction that does the operation), since you’re in a context where you’re allowed to assume the rounding mode is default.

– Steve


+cfe-dev, as the discussion now centers on the C standard.

I'm not sure what problem you see here. In default mode, i.e.
when there is no "#pragma STDC FENV_ACCESS on" in effect,
then the compiler can always assume that the default rounding
mode is in effect.

Well, if #pragma STDC FENV_ACCESS on is not in effect, that means
that the user has promised that at this point during execution,
we will *always* have the default FP environment.

This is a strong statement (no pragma == default mode); we need to confirm it with proper references to the standard. If it is true and the code:

float qqq(float x) {
  return nearbyint(x);
}

is really equivalent to:

float qqq(float x) {
  return roundeven(x);
}

(in the absence of #pragma STDC FENV_ACCESS), that is a fact that would surprise many users.

Thanks,
—Serge

C standard: _The FENV_ACCESS pragma_

The FENV_ACCESS pragma provides a means to inform the implementation when a program might access the floating-point environment to test floating-point status flags or run under non-default floating-point control modes.

"When set appropriately, the implementation may assume the default rounding mode is in effect."

… The default state ("on" or "off") for the pragma is implementation-defined.

“`Off` is allowed to be the default mode.”

In the presence of the default rounding mode, if you cannot access flags, nearbyint and roundeven have identical observable behavior.

– Steve

>> I'm not sure what problem you see here. In default mode, i.e.
>> when there is no "#pragma STDC FENV_ACCESS on" in effect,
>> then the compiler can always assume that the default rounding
>> mode is in effect.
>
> Well, if #pragma STDC FENV_ACCESS on is not in effect, that means
> that the user has promised that at this point during execution,
> we will *always* have the default FP environment.
>
> This is a strong statement (no pragma == default mode); we need to
> confirm it with proper references to the standard.

That statement is made explicitly (multiple times) in the standard.

Most specifically, C11 7.6.1.2 says:

"The FENV_ACCESS pragma provides a means to inform the implementation when a
program might access the floating-point environment to test floating-point status flags or
run under non-default floating-point control modes. 213)"

where the footnote clarifies:

"213) The purpose of the FENV_ACCESS pragma is to allow certain optimizations that could subvert flag
tests and mode changes (e.g., global common subexpression elimination, code motion, and constant
folding). In general, if the state of FENV_ACCESS is ‘‘off’’, the translator can assume that default
modes are in effect and the flags are not tested."

This explicitly says that if FENV_ACCESS is off, the compiler can assume
that default modes (including the default rounding mode) are in effect.

Later, C11 7.6.1.2 goes on to say:

"If part of a program tests floating-point status flags, sets floating-point control
modes, or runs under non-default mode settings, but was translated with the state for the
FENV_ACCESS pragma ‘‘off’’, the behavior is undefined."

This reiterates explicitly what I said in my earlier email, that whenever
any code is run that was compiled with FENV_ACCESS off, then at run-time
the default modes (including default rounding mode) must be in effect, or
else the behavior of the whole program is undefined.

The upcoming C2x standard complicates the logic a little bit since it
also introduces a FENV_ROUND pragma which may be used even in the
absence of FENV_ACCESS. Nevertheless, if code is compiled without
either of FENV_ACCESS or FENV_ROUND in effect, the compiler may still
assume default modes.

> If it is true and the code:
>
> float qqq(float x) {
>   return nearbyint(x);
> }
>
> is really equivalent to:
>
> float qqq(float x) {
>   return roundeven(x);
> }
>
> (in the absence of #pragma STDC FENV_ACCESS), that is a fact that
> would surprise many users.

In the absence of FENV_ACCESS, the compiler can assume "default" modes.
But what exactly those default modes are is implementation-defined,
so nearbyint *may* be equivalent to roundeven if that's the default,
but it may also be something else.

Bye,
Ulrich

Let’s summarize.

The viewpoint that the absence of #pragma STDC FENV_ACCESS implies default floating-point modes is based on two statements in the standard:

  1. Description of pragma STDC FENV_ACCESS (n2454, 7.6.1p2):

The FENV_ACCESS pragma provides a means to inform the implementation when a program might access the floating-point environment to test floating-point status flags or run under non-default floating-point control modes.

  2. Footnote in the same paragraph:

The purpose of the FENV_ACCESS pragma is to allow certain optimizations that could subvert flag tests and mode changes (e.g., global common subexpression elimination, code motion, and constant folding). In general, if the state of FENV_ACCESS is “off”, the translator can assume that the flags are not tested, and that default modes are in effect, except where specified otherwise by an FENV_ROUND pragma.

As for statement 1, it can be understood differently: to run under non-default floating-point control modes, one must change those modes, and the pragma informs the compiler about the change, not about the use. This interpretation is supported by the following statement in the same paragraph:

If part of a program tests floating-point status flags or establishes non-default floating-point mode settings using any means other than the FENV_ROUND pragmas, but was translated with the state for the FENV_ACCESS pragma “off”, the behavior is undefined.

Hence if a function only queries the current rounding mode (as in the above example with nearbyint), there is no undefined behavior. It is either conforming code or an error that requires a diagnostic. In the former case the function nearbyint must behave as prescribed by the standard - perform rounding according to the current rounding mode.

As for statement 2, yes, it argues fairly clearly for implicitly assuming default FP modes. "In general" is a bit confusing, though - maybe in "particular" cases the compiler cannot make such assumptions? I think this question should be addressed to language lawyers.

There is a concern that such an interpretation could break compatibility with existing programs. Even without #pragma STDC FENV_ACCESS, users could use non-default rounding modes - library function calls provided barriers that prevented undesirable code motion, and programs worked as expected. Changing the rules may have negative consequences.