RISC-V specification:
(11.6 Single-Precision Floating-Point Computational Instructions)
FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2. FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1.
…
FMADD.S multiplies the values in rs1 and rs2, adds the value in rs3, and writes the final result to rd. FMADD.S computes (rs1×rs2)+rs3.
FMSUB.S multiplies the values in rs1 and rs2, subtracts the value in rs3, and writes the final result to rd. FMSUB.S computes (rs1×rs2)-rs3.
FNMSUB.S multiplies the values in rs1 and rs2, negates the product, adds the value in rs3, and writes the final result to rd. FNMSUB.S computes -(rs1×rs2)+rs3.
FNMADD.S multiplies the values in rs1 and rs2, negates the product, subtracts the value in rs3, and writes the final result to rd. FNMADD.S computes -(rs1×rs2)-rs3.
(12.4 Double-Precision Floating-Point Computational Instructions)
The double-precision floating-point computational instructions are defined analogously to their single-precision counterparts, but operate on double-precision operands and produce double-precision results.
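To make the mapping concrete, here is a sketch in plain C (the helper names are mine, not from the spec). fma() from <math.h> is specified to round only once, and negation is exact in IEEE 754, so each call matches the corresponding spec formula; whether a compiler actually selects the fused instruction still depends on the target and flags.

#include <math.h>

/* Hypothetical helpers mirroring the four fused forms. */
double madd (double a, double b, double c) { return fma( a, b,  c); } /*  (a*b)+c -> fmadd.d  */
double msub (double a, double b, double c) { return fma( a, b, -c); } /*  (a*b)-c -> fmsub.d  */
double nmsub(double a, double b, double c) { return fma(-a, b,  c); } /* -(a*b)+c -> fnmsub.d */
double nmadd(double a, double b, double c) { return fma(-a, b, -c); } /* -(a*b)-c -> fnmadd.d */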
Fused multiply-add instructions do not round the intermediate product: the whole multiply-add incurs a single rounding at the end (IEEE 754 fusedMultiplyAdd). So
fmul.d M, A, B
fadd.d X, M, C
is not bit-for-bit equivalent to
fmadd.d X, A, B, C
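The divergence is easy to reproduce in plain C, since fma() has the same single-rounding semantics as fmadd.d. A minimal sketch (build with -ffp-contract=off so the compiler does not itself fuse the "separate" expression):

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    /* a*b is exactly 1 - DBL_EPSILON^2, which rounds to 1.0 in double,
     * so the separate multiply loses the -DBL_EPSILON^2 term. */
    double a = 1.0 + DBL_EPSILON;
    double b = 1.0 - DBL_EPSILON;
    double c = -1.0;

    double separate = a * b + c;    /* product rounded, then sum rounded: 0.0     */
    double fused    = fma(a, b, c); /* single rounding: -DBL_EPSILON*DBL_EPSILON  */

    printf("separate = %g\nfused    = %g\n", separate, fused);
    return 0;
}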
Currently Clang generates multiply-add instructions under -O1, and does so under slightly different conditions than GCC (which generates multiply-add instructions under -O2). We are considering an optimization that would generate more multiply-add instructions.
Is there any consensus on the conditions (such as compiler flags) under which such optimizations are “correct”?
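For reference, my understanding of the existing knobs, which is worth checking against the current compiler documentation: both Clang and GCC accept -ffp-contract={off,on,fast}, where "on" permits fusing only within a single source expression (the ISO C FP_CONTRACT semantics) and "fast" also permits fusing across statements; Clang currently defaults to "on", while GCC defaults to "fast" in its default GNU modes, though these defaults have shifted between releases. The standard pragma form looks like this:

#include <math.h>

/* Contraction can be controlled per block with the ISO C pragma.
 * Clang honors it; GCC's support has historically been limited,
 * so treat that part as an assumption to verify. */
double mac_contractible(double a, double b, double c) {
    #pragma STDC FP_CONTRACT ON
    return a * b + c;              /* compiler may emit fmadd.d          */
}

double mac_strict(double a, double b, double c) {
    #pragma STDC FP_CONTRACT OFF
    return a * b + c;              /* must round the product separately  */
}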