My proposal is to follow how Clang treats -Ofast, leaving out features we don’t have yet.
-Ofast will decompose into -O3 -ffast-math.
-ffast-math will mean the following:
-fno-honor-infinities
-fno-honor-nans
-fassociative-math
-freciprocal-math
-fapprox-func
-fno-signed-zeros
-ffp-contract=fast
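To make concrete why some of these flags can change results, here is a small sketch. Python is used purely for illustration (it follows IEEE 754 double arithmetic on typical platforms) and none of this is Flang or Clang code. It shows cases where the rewrites permitted by -fassociative-math, -freciprocal-math and -fno-signed-zeros are not value-preserving.

```python
import math

# -fassociative-math lets the compiler regroup additions, but
# floating-point addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
reassociation_changes_result = left != right

# -freciprocal-math lets x / y become x * (1 / y), which rounds twice
# instead of once.
x, y = 3.0, 10.0
reciprocal_changes_result = (x / y) != (x * (1.0 / y))

# -fno-signed-zeros lets the compiler ignore the sign of zero. -0.0
# compares equal to +0.0, yet its sign is still observable.
neg_zero = -0.0
zeros_compare_equal = neg_zero == 0.0
sign_is_observable = math.copysign(1.0, neg_zero) == -1.0

print(reassociation_changes_result, reciprocal_changes_result,
      zeros_compare_equal, sign_is_observable)
```

The differences are in the last bit here, but in longer computations (sums, reductions) they can accumulate, which is why these rewrites are opt-in.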
-ftrapping-math and -frounding-math are left out because there is not yet any support for lowering these. For these flags, Clang replaces built-in operations (e.g. division) with llvm.experimental.constrained.* intrinsics and adds the strictfp function attribute to the LLVM IR. These flags are not supported by Clang for AArch64.
-fno-math-errno is also left out because it seems not to be relevant to Fortran. Clang implements this flag by calling LLVM intrinsics instead of libm functions. Flang already uses the LLVM intrinsics by default.
GCC translates -Ofast to -O3 -ffast-math -fstack-arrays -fno-semantic-interposition. Flang already allocates arrays on the stack by default. We could support -fno-semantic-interposition to match gfortran/gcc, but Clang does not include this in -Ofast.
For -ffast-math, GCC sets the options -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range, -fexcess-precision=fast. -funsafe-math-optimizations expands to -fno-signed-zeros, -fno-trapping-math, -fassociative-math, -freciprocal-math. -ffinite-math-only expands to -fno-honor-infinities and -fno-honor-nans. So all in all that is:
-fno-math-errno (see comment above)
-fno-signed-zeros (included above)
-fno-trapping-math (see comment above)
-fassociative-math (included above)
-freciprocal-math (included above)
-fno-honor-infinities (included above)
-fno-honor-nans (included above)
-fno-rounding-math (see comment above)
-fno-signaling-nans (TODO: I will look into this)
-fcx-limited-range (TODO: I will look into this)
-fexcess-precision=fast (TODO: I will look into this)
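The comparison above can be checked mechanically. The following is only a sketch: the flag sets are copied from this post, and the names (PROPOSED_FLANG, expand_gcc_fast_math and so on) are made up for illustration, not taken from any driver.

```python
# Flags this proposal assigns to -ffast-math in Flang.
PROPOSED_FLANG = {
    "-fno-honor-infinities", "-fno-honor-nans", "-fassociative-math",
    "-freciprocal-math", "-fapprox-func", "-fno-signed-zeros",
    "-ffp-contract=fast",
}

# GCC's -ffast-math as described in this post, with its two group flags.
GCC_FAST_MATH = [
    "-fno-math-errno", "-funsafe-math-optimizations", "-ffinite-math-only",
    "-fno-rounding-math", "-fno-signaling-nans", "-fcx-limited-range",
    "-fexcess-precision=fast",
]
GCC_GROUPS = {
    "-funsafe-math-optimizations": [
        "-fno-signed-zeros", "-fno-trapping-math",
        "-fassociative-math", "-freciprocal-math",
    ],
    "-ffinite-math-only": ["-fno-honor-infinities", "-fno-honor-nans"],
}


def expand_gcc_fast_math():
    """Replace each group flag by its members and return the flat set."""
    expanded = set()
    for flag in GCC_FAST_MATH:
        expanded.update(GCC_GROUPS.get(flag, [flag]))
    return expanded


gcc_flags = expand_gcc_fast_math()
only_in_gcc = gcc_flags - PROPOSED_FLANG
only_in_proposal = PROPOSED_FLANG - gcc_flags

print("GCC only:", sorted(only_in_gcc))
print("Proposal only:", sorted(only_in_proposal))
```

The "GCC only" set is exactly the flags marked "see comment above" or "TODO" in the list, and the "proposal only" set is the two Clang flags (-fapprox-func, -ffp-contract=fast) that have no counterpart in the GCC list as quoted here.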
As someone involved with shipping dynamic libraries, -Ofast is really problematic because it causes an infectious change in all other libraries in that environment.
This blog post goes into the problem at length, the prime one being:
when -ffast-math is enabled, the compiler will link in a constructor that sets the FTZ/DAZ flags whenever the library is loaded — even on shared libraries, which means that any application that loads that library will have its floating point behavior changed
What makes the situation way, way worse is that -Ofast (it’s gonna make my code faster, right?) turns on -ffast-math unconditionally (as in: -fno-fast-math has no effect).
I can sort-of understand that if -Ofast’s decomposition is as simple as -O3 -ffast-math (because, once you understand that, why not then just use -O3?), but given the virality of the FTZ/DAZ flags, I’d really like to be able to just add -fno-fast-math to the flags for any library I’m building for distribution, and have that be honoured even for libraries which happen to set -Ofast. This would make the resolution of this issue much more feasible for us.
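To spell out the behaviour I am asking for, here is a minimal sketch of "last flag wins" resolution. It is a hypothetical model (the function name fast_math_enabled is made up, and it only looks at three flags), not how any existing driver is implemented.

```python
def fast_math_enabled(args):
    """Read flags left to right; the last fast-math-related flag decides."""
    enabled = False
    for arg in args:
        if arg in ("-Ofast", "-ffast-math"):
            enabled = True       # both turn fast-math on
        elif arg == "-fno-fast-math":
            enabled = False      # an explicit opt-out is honoured
    return enabled


# A packager appends -fno-fast-math to whatever the project already sets.
print(fast_math_enabled(["-Ofast", "-fno-fast-math"]))
print(fast_math_enabled(["-fno-fast-math", "-Ofast"]))
```

With this rule, appending -fno-fast-math at the end of the command line is enough to opt out, whatever the build system put before it.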
Since the post is titled “The meaning of -Ofast” I just wanted to add the concern that it should not mean viral-unsafe-math-in-shared-libs-even-when-you-try-to-turn-it-off-explicitly.
Thanks! I just wanted to check because I didn’t even realise the transitive problems with -ffast-math until that original blog post, so wanted to know if I was missing another hole too.
We discussed this during the Flang technical call today. There were different opinions on how -Ofast should be modelled in Flang.
@szakharin said that the proposed approach of matching Clang with -O3 + -ffast-math is a reasonable starting point. And since we are using the LLVM infrastructure it will be good if the representation at the LLVM IR level is the same for all frontends.
@AlexisPerry said that a Fortran developer will expect the behaviour to match their expectation based on usage of other Fortran compilers. @sscalpone agreed.
@kiranchandramohan asked what the difference in expectation would be. Is it additional passes that are run? Or is it something related to the handling of fast-math?
There was some discussion on which other transformations other compilers enable at -Ofast.
→ The Nvidia and gfortran compilers enable stack-arrays. @jeanPerier said that in some instances, such as array assignment, Flang does not use the stack.
→ The Intel compiler enables target-specific code generation at -Ofast.
→ Some compilers might enable IPO at -Ofast.
Clang’s default state is similar to -fno-semantic-interposition (inter-procedural optimizations apply) but not exactly equivalent (guaranteed suppressed GOT/PLT for some references). See -fno-semantic-interposition | MaskRay
Minor correction about the Intel compiler: I mixed up -fast and -Ofast. The former enables -xHOST and the latter does not.
FWIW, gcc-12.2.0 also enables -fallow-store-data-races under -Ofast:
-Ofast
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens. It turns off -fsemantic-interposition.
It seems that clang does not support anything like -fallow-store-data-races now.
-fsignaling-nans: this requires -ftrapping-math, which will not be supported for now, as noted above
-fcx-limited-range: loosens a C standard library requirement and so is not relevant here
-fexcess-precision=fast: is also only relevant to C
-fallow-store-data-races: as mentioned by @szakharin, clang doesn’t support this and on a quick search I couldn’t find any relevant LLVM attributes so presumably these optimizations are not implemented.
Thanks @jeanPerier for the clarification about stack-arrays. This will need to be fixed. Would it be acceptable for this to be a TODO to land after -Ofast and -ffast-math?
Thanks @tblah for comparing the options with gfortran. I think the one option that you probably missed is regarding the handling of subnormal numbers (See Optimizations enabled by -ffast-math – Krister Walfridsson's blog – Compilers, programming languages, etc.). These are very small numbers which are expensive to compute. I believe gfortran and clang handle these by linking in a global static constructor that runs before main and disables subnormals (flushes them to zero). See Clang Compiler User’s Manual — Clang 18.0.0git documentation. Classic-Flang based compilers and nvfortran have the options -Mflushz and -Mdaz to control this, and these flags are added as part of -fast. I think we should clarify what the situation is regarding this in Flang.
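For anyone unfamiliar with subnormals, a small illustration of what flushing them to zero changes. This is a sketch in Python that only models the effect in software: flush_to_zero is a made-up helper, not a real API, and the real FTZ/DAZ behaviour is a CPU mode set by the startup code. It assumes the interpreter itself runs with subnormals enabled, which is the usual default.

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min  # smallest positive normal double


def flush_to_zero(value):
    """Model FTZ: replace a subnormal by a zero of the same sign."""
    if value != 0.0 and abs(value) < SMALLEST_NORMAL:
        return math.copysign(0.0, value)
    return value


a = 1.5 * SMALLEST_NORMAL
b = SMALLEST_NORMAL

# With gradual underflow (the IEEE default), a != b implies a - b != 0.
difference = a - b  # half of SMALLEST_NORMAL, which is a subnormal

# With flush-to-zero the same subtraction yields zero although a != b.
flushed = flush_to_zero(difference)

print(a != b, difference > 0.0, flushed == 0.0)
```

Code that relies on "a != b, therefore a - b is non-zero" (for example before dividing by a - b) can break when some other library in the process has switched the mode on, which is the concern with the global constructor.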
BTW, I just wrote a proposal for Clang to treat -Ofast as -O3, and not imply -ffast-math. If that is implemented, we may want to have Flang do the same.