Making ffp-model=fast more user friendly

I don’t know how many people use the -ffp-model=fast command line option, as opposed to -ffast-math, but I’d like to propose a change in the behavior of this option. Specifically, I’d like to make it a little more user friendly.

Currently, the -ffp-model=fast option follows the behavior of -ffast-math, including the fact that it implies -ffinite-math-only. Some people have been very vocal that -ffast-math is too dangerous to ever use, and I think -ffinite-math-only is a big part of that.

I think most users would be better served using -funsafe-math-optimizations, but the name is a bit off-putting. Do you want options that are “fast” or options that are “unsafe”? Without any more information, you’d probably prefer “fast”, right? But the fact is that -ffast-math is much more unsafe than -funsafe-math-optimizations.

I’d like to propose that we make -ffp-model=fast a bit more user friendly, and maybe add something new (-ffp-model=aggressive) for people who want to keep the current behavior.

To provide a bit of backstory: my experience here comes from working with customers of Intel’s C and C++ compilers, both the one we are now calling the “Classic Compiler” (icc) and the new “oneAPI Compiler” (icx). These compilers have long supported the -fp-model option but distinguish between -fp-model fast=1 and -fp-model fast=2, which more or less follows the proposal I’m making here. The oneAPI compiler initially followed the current clang implementation of -ffp-model=fast, but we got a lot of feedback from our customers that the NaN handling was just too aggressive.

Initially, I’d like to make the following changes:

|                    | fast               | aggressive         |
|--------------------|--------------------|--------------------|
| Honor NaNs         | Yes                | No                 |
| Honor infinities   | Yes                | No                 |
| Complex arithmetic | promoted           | basic              |
| contract           | fast-honor-pragmas | fast-honor-pragmas |

With the exception of the contract behavior, the “aggressive” column matches what we do today with -ffp-model=fast. I’m proposing changing the contract behavior to honor pragmas in both cases because I don’t understand why anyone wants to ignore pragmas.
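For anyone who wants to approximate the proposed “fast” behavior today, a rough sketch with individual flags might look like the following. This is my own approximation, not part of the proposal; the exact flag spellings and availability (particularly -fcomplex-arithmetic=, which is relatively new) should be checked against your clang version’s documentation, and mycode.c is a placeholder.

```shell
# Rough, hedged approximation of the proposed -ffp-model=fast semantics:
# start from -ffast-math, then re-enable NaN and infinity handling and
# select the proposed complex-arithmetic and contraction behavior.
clang -O2 -ffast-math -fhonor-nans -fhonor-infinities \
      -fcomplex-arithmetic=promoted -ffp-contract=fast-honor-pragmas \
      mycode.c
```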

One other change I’d like to include in the less aggressive version of this is not canonicalizing based on fast-math flags, but that would require changes to the optimizer. Right now the optimizer does things like (X / Y) * Z ==> (X * Z) / Y when fast-math is enabled, just because it might enable other optimizations later. This seems too aggressive to me, but there’s nothing we can do about it at the moment. I just wanted to note it here as another thing I’d like to change with the -ffp-model=fast option when I can.

My question here is, is anyone using the -ffp-model option who would object to the changes I’m proposing above?

It’s an interesting observation that reassociation is actually pretty safe in practice, despite the fact that the definition gives us free rein to do transforms that completely destroy precision. Do you have any insight into what makes it safe? Is it just an implicit contract between compiler developers and scientific computing users that the transforms are limited in certain ways?

If we want to encourage people to use this mode, it’s probably worth spending some time revisiting the actual transforms we perform. We currently check isFast() and UnsafeFPMath in a bunch of places, and some of them probably should be checking “reassoc” instead.

I don’t have any specific opinion on the command line flags, beyond agreeing they’re messy.

I think what I’d say about reassociation is just that it’s often safe, it’s generally well understood, and there is something you can do about it when it bites. The most common case where reassociation causes trouble is when you have code like r = a - b + x where a and b are of much greater magnitude than x. If the compiler reinterprets that as r = a - (b - x), the x term might disappear completely, but I think people who do a lot of numeric work understand this and wouldn’t be surprised by it. Once you find the problem, you can work around it with the __arithmetic_fence built-in or the -fprotect-parens option.

The nnan and ninf settings are a bit more hazardous, because you can write code like this:

#include <math.h>

float foo(float x, float y) {
  if (isnan(x) || isnan(y)) {
    // Do something about NaN
  }
  // Do something with x and y
  return x + y;  // placeholder so the example compiles
}

And if you compile it with finite-math-only, the optimizer will simply eliminate the NaN checks. More generally, the optimizer will treat any code that it can prove produces a NaN as undefined behavior and potentially eliminate it. That’s allowed by the option, but I don’t think it’s what most people want. You can’t even always fix this with a pragma like float_control(precise, on) to protect the NaN checks, because the value tracker can still deduce that a value isn’t NaN in an instruction without the nnan flag set if it sees that the flag is set in the instruction that produced the value. I’m sure you remember my complaints about that from this thread.

On X86 the no-NaN problem is even more insidious. The ucomiss instruction, which is often used for equality comparisons, sets the ZF, PF, and CF flags if one of the operands is NaN, but when the nnan flag is set in the IR, the X86 backend doesn’t bother checking PF, so it reports that NaN is equal to everything. That makes for faster code if you really never have NaN values, but if one shows up, it can lead to a nasty bug.

All of that is what you signed up for if you use -ffinite-math-only, but I suspect many people don’t realize they are signing up for it when they use -ffp-model=fast, and I’d guess that some don’t even know -ffinite-math-only will be interpreted that broadly.

BTW, you make a good point about checking reassoc when that’s all we need, rather than isFast(), because the change I’m proposing will cause isFast() to return false.

I’ve (finally) put up a PR to implement the proposed changes.

[Driver] Introduce ffp-model=aggressive by andykaylor · Pull Request #100453 · llvm/llvm-project (github.com)

@pinskia FYI this is a new proposed value for the existing Clang-specific option -ffp-model=.

There is a mention of -fcx-limited-range/-fcx-fortran-rules (basic complex arithmetic) but no mention of what will be done with -fexcess-precision=*, which makes a difference for _Float16.

(As an aside, clang’s documentation for -fexcess-precision doesn’t mention _Float16 on AArch64 without the fp16 feature enabled, which I think uses the same rules as x86.)

@pinskia The -ffp-model option doesn’t currently modify the excess precision setting. The default setting (“standard”) seems appropriate for all fp-models except possibly strict.

I’m ambivalent about how it should be set with fp-model=strict. In general, this model is intended to allow access to the floating-point environment, and as such it enables strict exception semantics and does not assume the default rounding mode. Because the C and C++ standards allow excess precision, there’s nothing to say that the strict floating-point model should disallow it. On the other hand, I can see why a reasonable user might think that the strict model would disallow excess precision.

What is your opinion?

I haven’t gotten a lot of feedback on this. I suspect that’s because there aren’t many people using the -ffp-model option apart from -ffp-model=strict.

If anyone has an opinion about it, please let me know here or in the PR. Otherwise, I’d like to proceed with the change.

context:global -ffp-mode… - Sourcegraph shows that the option is used in the wild, but not incredibly often. Out of the uses in those search results, do you see any situations where you think the proposed changes would be detrimental?

I’m not seeing opposition or support, mostly just questions. Based on how infrequently the option seems to be used in the wild, I think it’s reasonable to go forward with the changes, but I’d still like to hear what you think the fallout would be for the folks we know are using it.

I generally don’t think the change is detrimental anywhere. It may cause some performance regressions in some cases, but it’s difficult to predict where that will happen and in most cases the regression will be minor.

There is a use case that is important to @arsenm where a library is compiled both with and without “finite-math-only” enabled to get two versions of the library from the same source, one of which is optimized to avoid expensive checks for NaN and infinity. I don’t know if they are using -ffp-model=fast there. I believe that’s the case that is driving a lot of our very aggressive NaN and infinity elimination.

I would note that a few of the cases in your link are specifically aligned with my proposed change. For example, “ggerganov/llama.cpp arm64-windows-llvm.cmake” immediately follows -ffp-model=fast with -fno-finite-math-only, and “LLVM/Umpire tests/CMakeLists.txt” has a note explaining that -ffp-model=fast has a problem with NaN compares.

The use at “easybuilders/easybuild-framework easybuild/toolchains/compiler/intel_compilers.py” seems to be directed at the Intel compiler, not clang, as it has a comment about -ffp-model=fast=2 producing a warning, and the file seems to be trying to establish two levels of fast-math: loose and veryloose.

Thank you for the assessment! I think it’s safe to move forward with these changes; if anyone has concerns, they can be addressed on the patch discussion or post-commit.