[RFC] Making `ffast-math` option unrelated to `ffp-contract` option

Summary

Currently when the option -ffast-math is set, it alters the option -ffp-contract in this way:

-ffast-math, Ofast => -ffp-contract=fast
-fno-fast-math => if -ffp-contract= fast then -ffp-contract=on else -ffp-contract is unchanged

Connecting these two options have triggered some spurious warnings; see issue clang warning passing "-ffast-math -fno-fast-math" · Issue #54625 · llvm/llvm-project · GitHub.

The source of the problem is that the -ffast-math option is an on/off flag, but the ffp-contract is an on/off/fast flag. So when fno-fast-math is used there is no obvious value for ffp-contract.

Note that gcc doesn’t connect the two options.

Implementation status

There is a WIP patch open here: :gear: D123630 [WIP] Remove connection between ‘ffast-math’ and ‘ffp-contract’. (llvm.org)

Impact

The changes we are proposing will impact a few CodeGen and Driver LIT tests, listed in the patch above.

There will be performance regression for users who are expecting ffast-math to enable ffp-contract but because ffp-contract is on by default, the impact would be minimal.

The clang changes we are proposing were exercised on the llvm-test-suite on the SingleSource benchmark. Originally this benchmark runs with the option fpp-contract=off and FMA disabled. I removed these flags and replace them with the ffast-math flag. Because of the nature of the ffast-math option, I have disabled the FP checks in the suite, to avoid the expected accuracy fails.
See the changes made for SingleSource in :gear: D124095 [WIP] Experimental changes (llvm.org) . With these changes, the SingleSource is passing using the flag ffast-math.

Were you able to get performance comparisons with the SingleSource benchmarks after your changes to the tests?

I would expect the difference between fp-contract=on and fp-contract=fast to be small and may not show up at all if the benchmark is specifically trying to trigger FMA. The most significant difference is that with fp-contract=on, the llvm.fmuladd() intrinsic will be generated if the mul and add appear in the same source expression, but not if they appear in different expressions. With fp-contract=fast, the intrinsic is never used, so there might be some cases where additional optimizations are performed across multiple operations, but in theory that can interfere with FMA formation.

In general, GCC has limited handling of fp-contract. I don’t believe it supports the STDC FP_CONTRACT pragma, and it doesn’t distinguish between fast and on. I think it only does fast, and that is effectively contract by the -std setting. So following GCC behavior is problematic, but since -f[no-]fast-math comes from GCC it kind of makes sense to have the same behavior.

To me, the strongest argument for the change you are proposing is that fp-contract is a three-state setting, while fast-math is only on or off, so there’s no clear answer as to what -fno-fast-math should do if we leave fp-contract connected to -f[no-]fast-math.

I’m not sure regressing a well-established invariant is a reasonable solution to resolve such a diagnostic.
Presumably that diagnostic should not be issued at all.

I compared the SingleSource benchmark with and without the fast-math patch. The original numbers are the exec time with clang (no changes) and the new numbers are with clang where the ffast-math option has no effect on ffp-contract (current RFC change).
There are about 1820 test cases. Among these 139 test cases have seen increase in their exec-time. 101 of them are gcc-torture tests and the 38 others are in various suites.
The gcc-torture suite has an increase varying from 20 to 175%. The rest varies between 1 and 200%.
It looks like there is no way of attaching a sheet or a word document. So attaching a png of a portion of the results.

I think most of the SingleSource binaries are mostly for testing and not really for benchmarking, so the insight from measuring them is likely limited. For benchmarking, SPEC and other benchmark suites likely will provide more reliable performance data.

I agree that changing -ffast-math to behave differently seems sub-optimal. I think it makes sense for -ffast-math to set -ffp-contract=fast from the user’s perspective and it has been doing so for a long time.

IMO if the user provides -ffast-math -fno-fast-math it would make sense to leave the original value of -ffp-contract` unchanged.

The problem isn’t so much with -ffast-math -fno-fast-math as it is with things like -ffp-contract=fast -fno-fast-math and -ffast-math -ffp-contract=fast -fno-fast-math. The current logic sets fp-contract back to ‘on’ in these cases.

I suppose we could add more complicated tracking that would allow us to determine whether fp-contract was changed by an explicit -ffp-contract option. There’s still some ambiguity, but I guess we could define a rule that would be consistent and defensible. Suppose we see -ffp-contract=fast -ffast-math -fno-fast-math. In this case, the user explicitly set fp-contract, but fast-math also set it. In that case, we should leave it at fast, right? And if the we see -ffp-contract=off -ffast-math -fno-fast-math we’d set it to off. Where it becomes less clear is a case like -ffp-contract=off -ffast-math with fast-math left on. Does the explicit fp-contract take precedence in this case? I think probably not.

That gives me a set of rules like this:

  • If -ffast-math and -ffp-contract are both seen but -fno-fast-math has not been seen after the last instance of -ffast-math, the fp-contract setting is determined by whichever was seen last
  • If -fno-fast-math is seen and -ffp-contract has been seen at least once, the last seen value of -ffp-contract is used
  • If -fno-fast-math is seen and -ffp-contract has not been seen, the default setting of fp-contract is used

I think that would give users the expected behavior and we wouldn’t need a diagnostic.

1 Like

Would everyone agree with this?

ffast-math → fp-contract=fast
ffast-math + fp-contract = val → fp-contract=val
ffast-math + fp-contract =val + fno-fast-math → fp-contract=val
ffast-math + fno-fast-math → fp-contract=on

fno-fast-math ->fp-contract=on
fno-fast-math + ffp-contract=fast->fp-contract=fast

Minor correction here – the result should be fast for Cuda and HIP.

And a couple of cases, I’d add to make sure everyone is in agreement:

fp-contract =val + ffast-math + fno-fast-math → fp-contract=val
fp-contract =val + ffast-math → fp-contract=fast

In the first case you have that -fno-fast-math does not affect the fp-contraction, while in the second case -fno-fast-math does affect the fp-contraction.
I don’t understand why the behavior of -fno-fast-math should depend on which options were listed before .
-fno-fast-math should behave like the other high level floating point options – i.e. it is just an alias for a sequence of low level options.

IMO the set low level properties that -ffast-math and -fno-fast-math affect should be the same.
Since ffast-math affect the fp-contraction property then also -fno-fast-math should affect the fp-contraction property. Given that -fno-fast-math should disable unsafe floating point optimizations then it makes sense to me that -fno-fast-math implies -ffp-contract=on – i.e. fp-contraction is set to the default value.

The issue is that disabling unsafe optimizations ca be done using fp-contract=on or fp-contract=off, which really means taking the unchanged value of fp-contract in this rule: ffast-math + fp-contract =val + fno-fast-math → fp-contract=val.
If we want to follow what you are suggesting then we would have to live with these kinds of warnings which was the main reason for this RFC.
A combination of these 2 flags -ffast-math -fno-fast-math would produce this nonsense warning for the user:
warning: overriding ‘-ffp-contract=fast’ option with ‘-ffp-contract=on’

This makes sense to me! It would be good if this would also be clearly spelled out in the docs.

It is not clear to me what was the original reason to introduce such warnings, so I don’t know if the warning being emitted when the user-provided command line has -ffast-math -fno-fast-math is intentional or not.

Maybe the wording of the warning should be improved to reference the user-provided options that implies a given low level option for clarity. E.g.:

warning: overriding '-ffp-contract=fast' option (implied by '-ffast-math') with '-ffp-contract=on' (implied by '-fno-fast-math')

Moreover I wonder if such warning should be emitted by default (perhaps it should be emitted only in pedantic or other opt-in mode) given that we should be documenting precisely which options are implied by the high level floating point options (-ffast-math/-fno-fast-math/-ffp-model= and similar).

For example in https://clang.llvm.org/docs/UsersManual.html#cmdoption-ffast-math it is clearly stated that -ffast-math implies -ffp-contract=fast, but there is no documentation of -fno-fast-math.

This has changed “recently” with f04e387055e495e3e14570087d68e93593cf2918. The behavior prior to that commit was such that -fno-fast-math was resetting FP contraction to the “default” value unconditionally. What was the rationale behind that change? The commit is about fixing the “default” value, so I don’t understand why that changed the way -fno-fast-math used to behave.

Can you provide more context and information about the overall goals here?

@michele.scandale Zahira is on sabbatical until the first week in August, but I have been working with her on this and can try to answer your question about the goals on these changes.

Here is the review for f04e387055e495e3e14570087d68e93593cf2918: https://reviews.llvm.org/D107994

Generally, we were trying to align things so that the fp-model option would serve as a reasonable “umbrella” option to set all floating-point related semantic modes to reasonable values for most users with a small number of options, while making -ffp-model=precise the default behavior.

There’s some ambiguity as to how fp-contract should fit in with this scheme, but given that C language standards allow FMA within a single expression (equivalent to fp-contract=on) but not across expressions (which would be enabled by fp-contract=fast), we think fp-model=precise should set fp-contract=on.

This “vision” was somewhat complicated by the fact that HIP and Cuda wanted a default behavior of fp-contract=fast. So for HIP and Cuda, fp-model=precise sets fp-contract=fast (which is also the default behavior).

Regarding your specific question about the old behavior of -fno-fast-math resetting fp-contract to the “default” value unconditionally, I think the problem is that if the user has specifically disabled contraction (-ffp-contract=off) then we don’t want -fno-fast-math to turn it on. There were many years when despite the clang documentation saying that fp-contract=on was the default, the actual implementation defaulted to fp-contract=off (at least for X86-targets), so having -fno-fast-math restore that setting wasn’t a problem. Once we made fp-contract=on (or fast for HIP and Cuda) the default behavior, the -fno-fast-math behavior needed to be more complicated.