[RFC] Support of non-default floating point environment on RISC-V

Hi all,

I am interested in supporting non-default FP environments on RISC-V. This requires significant changes to the way FP instructions are currently described, so it is important to collect opinions and concerns on the topic. Although the discussion is about RISC-V, much of the material here is relevant to any target that needs to support a non-default FP environment.

What is wrong with FP support now?

Most floating-point instructions can set accrued exception bits in the fflags register to signal exceptional events such as overflow or invalid operation. Instructions with dynamic rounding mode also depend on the content of the frm register. Currently, RISC-V FP instructions are described in a way that completely ignores these dependencies.

Such an implementation is suitable only for the default FP environment (https://llvm.org/docs/LangRef.html#floating-point-environment). In a non-default FP environment, incorrect code may be produced. For example, in the following code:

csrw frm, a1

fadd.d ft2, ft2, ft3

the compiler may change the order of the instructions, which results in incorrect behavior. Although fadd.d depends on the value of frm, this fact is not represented in the properties of the FP instructions. Similarly, the code:

fadd.d ft2, ft2, ft3

csrrs t0, fcsr, zero

does not allow changing the order of the instructions, because csrrs reads fcsr, which includes the fflags bits set by the first instruction. The compiler, however, does not know about this dependency.

How to solve this problem

The descriptions of the FP instructions should be modified so that the dependencies on fflags and frm are present in them. Neither register appears as an explicit operand of the instructions, so these are implicit dependencies. Usually such dependencies are added to the Uses and Defs properties of an Instruction.
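To make the role of Uses and Defs concrete, here is a minimal sketch of a toy hazard check. It is not LLVM code: the InstrDesc struct, the may_swap function and the register bit names are all hypothetical, invented for illustration. It only shows that a reordering check based on read/write sets sees nothing wrong with moving fadd.d across an frm write or an fcsr read unless frm and fflags appear in those sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy register set: one bit per register of interest (hypothetical names). */
enum {
    REG_A1     = 1u << 0,
    REG_T0     = 1u << 1,
    REG_FT2    = 1u << 2,
    REG_FT3    = 1u << 3,
    REG_FRM    = 1u << 4,
    REG_FFLAGS = 1u << 5
};

/* Hypothetical instruction description: registers read and written. */
typedef struct {
    const char *name;
    unsigned uses;
    unsigned defs;
} InstrDesc;

/* Two adjacent instructions may be swapped only if there is no
   read-after-write, write-after-read or write-after-write hazard. */
bool may_swap(const InstrDesc *first, const InstrDesc *second) {
    assert(first != NULL && second != NULL);
    bool raw = (first->defs & second->uses) != 0;
    bool war = (first->uses & second->defs) != 0;
    bool waw = (first->defs & second->defs) != 0;
    return !(raw || war || waw);
}

/* Write of frm from a1. */
const InstrDesc WRITE_FRM = {"frm write", REG_A1, REG_FRM};

/* Read of fcsr into t0; fcsr contains both frm and fflags. */
const InstrDesc READ_FCSR = {"fcsr read", REG_FRM | REG_FFLAGS, REG_T0};

/* fadd.d ft2, ft2, ft3 with explicit operands only. */
const InstrDesc FADD_EXPLICIT_ONLY = {
    "fadd.d", REG_FT2 | REG_FT3, REG_FT2};

/* The same fadd.d with frm added to the uses and fflags to the defs. */
const InstrDesc FADD_WITH_IMPLICIT = {
    "fadd.d", REG_FT2 | REG_FT3 | REG_FRM, REG_FT2 | REG_FFLAGS};
```

With FADD_EXPLICIT_ONLY, may_swap reports that both problematic pairs from the examples above can be reordered; with FADD_WITH_IMPLICIT it rejects both.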

RISC-V also allows a static rounding mode, which is taken from the instruction bits rather than from frm. This means that any instruction that can depend on the rounding mode exists in two variants:

  1. sets fflags, depends on frm (dynamic rounding mode),
  2. sets fflags, does not depend on frm (static rounding mode).

Such a set of instructions precisely represents the hardware, but is not suitable for the default FP environment. Changes to fflags are ignored in that environment, so the dependencies on fflags create useless output dependencies that prevent optimal scheduling. As the default FP environment is the most important use case, these variants should also be considered:

  3. changes to fflags are ignored, does not depend on frm (default FP environment),
  4. changes to fflags are ignored, depends on frm.

So there can be 4 variants of each FP instruction, which is probably too many. Variant 1 must be supported, as it is the most general case in terms of restrictions. Variant 3 is also mandatory, as it represents the default FP environment. Variants 2 and 4 may be omitted, but some optimization opportunities would then be lost.
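The trade-off can be sketched as a small selection function. This is only an illustration under my own assumptions: the names FpVariant, exact_variant and two_variant_scheme are hypothetical, and the fallback rule (anything that is not variant 3 is described as variant 1) is one reading of "variants 2 and 4 may be omitted", namely that the omitted cases are over-approximated by the most restrictive description.

```c
#include <assert.h>
#include <stdbool.h>

/* Variant numbers follow the numbering used in the text. */
typedef enum {
    VARIANT_1 = 1, /* fflags update modeled, frm dependency modeled      */
    VARIANT_2 = 2, /* fflags update modeled, no frm dependency           */
    VARIANT_3 = 3, /* fflags ignored, no frm dependency (default FP env) */
    VARIANT_4 = 4  /* fflags ignored, frm dependency modeled             */
} FpVariant;

/* Picks the description that models exactly the dependencies needed. */
FpVariant exact_variant(bool flags_observed, bool reads_frm) {
    if (flags_observed)
        return reads_frm ? VARIANT_1 : VARIANT_2;
    return reads_frm ? VARIANT_4 : VARIANT_3;
}

/* Assumed fallback when only variants 1 and 3 exist: every case other
   than variant 3 is described as variant 1, whose dependencies are a
   superset of those of variants 2 and 4. */
FpVariant two_variant_scheme(bool flags_observed, bool reads_frm) {
    FpVariant v = exact_variant(flags_observed, reads_frm);
    assert(v >= VARIANT_1 && v <= VARIANT_4);
    return (v == VARIANT_3) ? VARIANT_3 : VARIANT_1;
}

/* True when the two-variant scheme adds a dependency that is not needed;
   these are the lost optimization opportunities. */
bool over_constrained(bool flags_observed, bool reads_frm) {
    return two_variant_scheme(flags_observed, reads_frm) !=
           exact_variant(flags_observed, reads_frm);
}
```

Under this reading, the two-variant scheme stays correct but over-constrains exactly two cases: static rounding with observable flags, and dynamic rounding with ignored flags.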

Lowering of instruction in default FP environment

Instructions like fadd, which are used in default FP environment, may be lowered in a couple of ways:

  • to the instruction that uses static rounding mode RNE, or
  • to the instruction that uses dynamic rounding mode. In this case frm must contain RNE.

The case of static rounding mode has some advantages:

  • It does not require synchronization of frm when FP environment is changed to default,
  • Code that uses only static rounding mode may be safely called from any code that uses a different rounding mode,
  • Instructions with static rounding may be moved freely just as any other instructions,
  • It simplifies implementation of things like #pragma STDC FENV_ROUND.

There is one possible issue with this approach, however. Code can set a non-default rounding mode by calling fesetround and expect the subsequent instructions to be executed with the new rounding mode. As fesetround is usually an external function, the call instruction serves as a barrier that prevents undesired code motion. Where #pragma STDC FENV_ACCESS is unsupported, this is an acceptable solution. If such code were ported to RISC-V, it would fail if the FP instructions used static rounding, because they would not pick up the new rounding mode.

As a temporary solution, the compiler should lower instructions in the default FP environment to the variants with dynamic rounding mode, which should decrease the risk of such failures. Once constrained intrinsics are implemented for RISC-V, the lowering can be changed to use static rounding.
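The difference between the two lowerings can be shown with a toy model of the rm field. The enumerator values are the rounding-mode encodings from the RISC-V F extension; everything else (FpState, set_dynamic_rounding as a stand-in for what fesetround does to frm, effective_rounding) is a hypothetical sketch, not a real API.

```c
#include <assert.h>

/* Rounding-mode encodings used in frm and in the rm instruction field. */
typedef enum {
    RM_RNE = 0, /* round to nearest, ties to even          */
    RM_RTZ = 1, /* round towards zero                      */
    RM_RDN = 2, /* round down                              */
    RM_RUP = 3, /* round up                                */
    RM_RMM = 4, /* round to nearest, ties to max magnitude */
    RM_DYN = 7  /* rm field only: take the mode from frm   */
} RoundingMode;

/* Toy model of the part of the FP state that matters here. */
typedef struct {
    RoundingMode frm;
} FpState;

/* Hypothetical stand-in for the effect of fesetround on frm. */
void set_dynamic_rounding(FpState *state, RoundingMode mode) {
    assert(mode != RM_DYN); /* DYN is only meaningful in an instruction */
    state->frm = mode;
}

/* Rounding mode an instruction actually uses, given its rm field. */
RoundingMode effective_rounding(const FpState *state, RoundingMode insn_rm) {
    return (insn_rm == RM_DYN) ? state->frm : insn_rm;
}
```

After set_dynamic_rounding(&state, RM_RUP), an instruction encoded with RM_DYN rounds up, while one encoded with static RM_RNE still rounds to nearest, which is exactly the behavior that would surprise code relying on fesetround.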

Is there anything else that should be considered? How many instruction variants should be supported (2, 3 or 4)?

Any feedback is appreciated.