This is a micro-RFC concerning strictfp support, for something I’ve been thinking about but am not planning to take up any time soon. I think the current implementation of constrained FP intrinsics is a bit rough, has some broken edges, and could use a design refresh. It cannot handle any deviation from a small set of standardized operations.
In particular, target intrinsics are completely unhandled. Today they pass through as unconstrained operations and can easily be reordered around mode sets. It’s impractical to require a constrained version of every one of them. Additionally, the constrained operations do not encode enough information to fold constrained intrinsics to non-constrained instructions once you consider target FP modes.
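To illustrate the reordering hazard in isolation, here is a small analogy in Python rather than LLVM IR. It assumes the standard `decimal` module's context rounding as a stand-in for a hardware rounding mode, and the helper name `divide_under` is made up for this sketch. The same division gives a different result depending on which side of the mode set it lands on, which is why an operation that passes through as unconstrained cannot be moved freely.

```python
from decimal import Decimal, localcontext, ROUND_CEILING, ROUND_FLOOR

def divide_under(rounding):
    """Compute 1/3 to 4 significant digits under the given rounding mode."""
    with localcontext() as ctx:
        ctx.prec = 4
        ctx.rounding = rounding  # stands in for the "mode set"
        return Decimal(1) / Decimal(3)

# The operation executed under the old mode ...
before_mode_set = divide_under(ROUND_FLOOR)
# ... and the same operation after being moved past the mode set.
after_mode_set = divide_under(ROUND_CEILING)

print(before_mode_set, after_mode_set)
```

The two results differ in the last digit, so a transformation that moves the operation across the mode change alters observable behavior.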
As such, I think the solution is to move the complexity of environment tracking from the intrinsics themselves to the call instruction. I envision this as a flag modifier on the call itself which, if present, introduces new operands to the call instruction (analogous to the memory ordering and syncscopes on atomics). A different bikeshed color could be a new call_strictfp instruction.
We would then drop the metadata arguments from the current set of constrained intrinsics, and could delete all of the ones corresponding to unconstrained intrinsic calls. What we would be left with is some intrinsic aliases for the handful of first-class IR instructions, so that they can be used with call strictfp. My main concern would be making sure the verifier enforces using call strictfp for appropriate intrinsic calls. We’ll need a way to mark an intrinsic as possibly needing strictfp handling in tablegen.
In addition to the current FP exception and rounding mode arguments, we need at least one more metadata-like operand to represent other target FP state (e.g. denormal mode) in order to rewrite strictfp functions to non-strict. AMDGPU also has a few other exotic mode bits, such as changing the overflow behavior of f16 operations. My initial thought is an optional integer operand which represents the known result for llvm.get.fpenv at that point (although that would have some overlap with the rounding and exception modes). We may also want an attribute to indicate a call cannot modify fpenv.
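One way to picture the strictfp-to-non-strict rewrite is as a check on what is known about the environment at a call site. The sketch below is a Python model, not LLVM code: the `StrictCall` class, its field names, and the `can_relax` helper are hypothetical. `round.tonearest` and `fpexcept.ignore` are the existing constrained-intrinsic metadata strings, and `"ieee"` is borrowed from the values of the `denormal-fp-math` function attribute, but carrying it as a call operand is an assumption. The sketch assumes a call can become an ordinary operation only when every tracked piece of state is known to be the default.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class StrictCall:
    callee: str
    rounding: str                   # e.g. "round.tonearest", "round.dynamic"
    exceptions: str                 # e.g. "fpexcept.ignore", "fpexcept.strict"
    denormal: Optional[str] = None  # hypothetical extra operand; None = unknown

def can_relax(call: StrictCall) -> bool:
    """True if every tracked piece of FP state is known to be the default."""
    return (call.rounding == "round.tonearest"
            and call.exceptions == "fpexcept.ignore"
            and call.denormal == "ieee")

known = StrictCall("llvm.sqrt.f32", "round.tonearest", "fpexcept.ignore", "ieee")
unknown_denormal = StrictCall("llvm.sqrt.f32", "round.tonearest", "fpexcept.ignore")
dynamic = StrictCall("llvm.sqrt.f32", "round.dynamic", "fpexcept.strict", "ieee")

print(can_relax(known), can_relax(unknown_denormal), can_relax(dynamic))
```

The `unknown_denormal` case is the one the current representation cannot distinguish: rounding and exception behavior are at their defaults, but nothing records the denormal mode, so the fold is not justified.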
Two points here:
a) most intrinsic libraries (sin, cos, tan, atan, ln, exp, pow) ignore the rounding modes, and those that claim to deliver correctly rounded results only do so in RM = RNE.
b) expect near-future ISAs to have some of the elementary transcendental “intrinsic” functions as instructions that actually obey the RM and use it to round the result as the application intends.
And a question:
c) why are you moving FP instructions around an FP mode set? That is like moving a LD/ST across a change in the memory mapping tables…
Because the representation is broken, it just happens. This is the problem that needs to be solved; that’s the point of the post. Plus there are cases where you can reorder around a mode set (e.g. AMDGPU has two separate rounding modes, one for f32 and one for f64/f16, so you can reorder a set of one around operations of the other type).
This seems like something that would benefit from more general/granular side-effect tracking (similar to intrinsics touching specific registers). It can be modelled by an extra “token”/chain param, but that only works as long as you stay within intrinsics. I don’t know whether there is an ergonomic way to add this to basic instructions.
The point is the set of operations that needs to be handled is larger than a handful of generic operations, and adding a constrained version of every possible floating point operation is unsustainable.
We need that for any kind of strictfp->standard optimization. I think the more important part is knowing which intrinsic calls the verifier must require to be strictfp calls in a strictfp function, which we could accomplish with some kind of fpenv access attribute (which, as an added bonus, would help us rewrite strictfp functions).
From the viewpoint of code generation, it doesn’t matter whether FP attributes are taken from operands of the function call or kept separately in the call instruction, but the latter has advantages:
No need to define duplicates for most FP intrinsics,
It is easier to add new FP properties (today that requires adding a new metadata argument to each constrained intrinsic),
User functions may use the same mechanism, which is useful for some optimizations.
A redesign of FP attributes could make the description of FP behavior more detailed and flexible. For example, FP control modes currently consist only of the rounding mode. Other properties, like denormal handling, cannot be expressed in IR. The new FP attribute representation could express these and other attributes, including target-specific ones.
The FP attributes should represent read and write operations on some portion of the FP environment, like:
reads exception bits,
writes exception bits,
reads rounding mode,
writes rounding mode,
and so on.
Currently, any access to the FP environment prevents reordering. Some functions, for example fesetround and fetestexcept, access different properties, so their order may be changed, which may help optimization.
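The read/write model above can be sketched as a small conflict check. This is a Python model under stated assumptions, not an LLVM API: the `EFFECTS` table is a hand-written, simplified summary, the state names `"rounding"` and `"exceptions"` are invented labels, and `"fadd"` stands for a generic strict FP operation that reads the rounding mode and may set exception bits.

```python
# callee -> (state it reads, state it writes); simplified and hypothetical
EFFECTS = {
    "fesetround":    (frozenset(),               frozenset({"rounding"})),
    "fegetround":    (frozenset({"rounding"}),   frozenset()),
    "fetestexcept":  (frozenset({"exceptions"}), frozenset()),
    "feclearexcept": (frozenset(),               frozenset({"exceptions"})),
    "fadd":          (frozenset({"rounding"}),   frozenset({"exceptions"})),
}

def may_reorder(a: str, b: str) -> bool:
    """Two operations commute if neither writes state the other reads or writes."""
    a_reads, a_writes = EFFECTS[a]
    b_reads, b_writes = EFFECTS[b]
    conflict = (a_writes & (b_reads | b_writes)) | (b_writes & a_reads)
    return not conflict

print(may_reorder("fesetround", "fetestexcept"))  # disjoint state
print(may_reorder("fesetround", "fegetround"))    # both touch the rounding mode
print(may_reorder("fadd", "fetestexcept"))        # fadd may set exception bits
```

Only the first pair is reported as reorderable. Treating the whole environment as one opaque resource would reject all three.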
With such a model of the FP environment we could get rid of “fpexcept.*”. This attribute is not a property of the operation, and it is not a part of the FP environment either. It is more like an intention to treat exception bits in some way. If instructions with different “fpexcept” values are mixed (which happens during inlining), its meaning becomes especially vague. Describing the exception bits as a register that can be read and written is clearer and could enable some optimizations.