RFC: The meaning of -Ofast

My proposal is to follow how Clang treats -Ofast, leaving out features we don’t have yet.

-Ofast will decompose into -O3 -ffast-math.

-ffast-math will mean the following:

  • -fno-honor-infinities
  • -fno-honor-nans
  • -fassociative-math
  • -freciprocal-math
  • -fapprox-func
  • -fno-signed-zeros
  • -ffp-contract=fast

-ftrapping-math and -frounding-math are left out because there is not yet any support for lowering them. Clang implements them by replacing built-in operations (e.g. division) with llvm.experimental.constrained.* intrinsics and adding the strictfp function attribute in LLVM IR. Clang does not support these flags on aarch64.

-fno-math-errno is also left out because it seems not to be relevant to Fortran. Clang implements this flag by calling LLVM intrinsics instead of libm functions. Flang already uses the LLVM intrinsics by default.

For optimization flags, compatibility with other Fortran compilers’ interpretations is perhaps more important than compatibility with clang’s.

GCC translates -Ofast to -O3 -ffast-math -fstack-arrays -fno-semantic-interposition. Flang already allocates arrays on the stack by default. We could support -fno-semantic-interposition to match gfortran/gcc, but Clang does not include this in -Ofast.

For -ffast-math, GCC sets the options -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range, -fexcess-precision=fast. -funsafe-math-optimizations expands to -fno-signed-zeros, -fno-trapping-math, -fassociative-math, -freciprocal-math. -ffinite-math-only expands to -fno-honor-infinities and -fno-honor-nans. So all in all that is:

  • -fno-math-errno (see comment above)
  • -fno-signed-zeros (included above)
  • -fno-trapping-math (see comment above)
  • -fassociative-math (included above)
  • -freciprocal-math (included above)
  • -fno-honor-infinities (included above)
  • -fno-honor-nans (included above)
  • -fno-rounding-math (see comment above)
  • -fno-signaling-nans (TODO: I will look into this)
  • -fcx-limited-range (TODO: I will look into this)
  • -fexcess-precision=fast (TODO: I will look into this)
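As a rough cross-check of the two lists, here is a small Python sketch that expands GCC's umbrella flags as described in this post and diffs the result against the flags proposed for Flang. The `GCC_GROUPS` table and the `expand` helper are illustrative names made up for this example (they are not part of any compiler driver), and the expansions are copied from the text above rather than from GCC's sources.

```python
# Umbrella flags and their expansions, copied from the description above.
# This table is an illustration, not a dump of GCC's real option handling.
GCC_GROUPS = {
    "-ffast-math": [
        "-fno-math-errno", "-funsafe-math-optimizations", "-ffinite-math-only",
        "-fno-rounding-math", "-fno-signaling-nans", "-fcx-limited-range",
        "-fexcess-precision=fast",
    ],
    "-funsafe-math-optimizations": [
        "-fno-signed-zeros", "-fno-trapping-math",
        "-fassociative-math", "-freciprocal-math",
    ],
    "-ffinite-math-only": ["-fno-honor-infinities", "-fno-honor-nans"],
}

# The flags proposed for Flang's -ffast-math at the top of this RFC.
PROPOSED_FLANG = {
    "-fno-honor-infinities", "-fno-honor-nans", "-fassociative-math",
    "-freciprocal-math", "-fapprox-func", "-fno-signed-zeros",
    "-ffp-contract=fast",
}


def expand(flag, groups):
    """Recursively expand an umbrella flag into its set of leaf flags."""
    if flag not in groups:
        return {flag}
    leaves = set()
    for sub in groups[flag]:
        leaves |= expand(sub, groups)
    return leaves


gcc_leaves = expand("-ffast-math", GCC_GROUPS)
gcc_only = sorted(gcc_leaves - PROPOSED_FLANG)
flang_only = sorted(PROPOSED_FLANG - gcc_leaves)

print("only in GCC's expansion:", gcc_only)
print("only in the Flang proposal:", flang_only)
```

The flags that show up only on the GCC side are the ones marked "see comment above" or "TODO" in the list.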

The options I chose also correspond to D126305, "[mlir][arith] Initial support for fastmath flag attributes in the Arithmetic dialect (v2)":

def FASTMATH_FAST            : I32BitEnumAttrCaseGroup<
    "fast",
    [
      FASTMATH_REASSOC,         FASTMATH_NO_NANS,     FASTMATH_NO_INFS,
      FASTMATH_NO_SIGNED_ZEROS, FASTMATH_ALLOW_RECIP, FASTMATH_ALLOW_CONTRACT,
      FASTMATH_APPROX_FUNC]>;
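To make two members of this group concrete, the sketch below uses plain Python floats (assumed to be IEEE-754 doubles, as on the usual platforms) to show why FASTMATH_REASSOC and FASTMATH_ALLOW_RECIP can change results. It does not invoke any compiler; it only evaluates the alternative forms an optimizer would be free to choose between once those flags are set.

```python
# Reassociation: (a + b) + c versus a + (b + c).
a, b, c = 1e16, -1e16, 1.0
left_to_right = (a + b) + c   # the large terms cancel first, then 1.0 is added
reassociated = a + (b + c)    # 1.0 is absorbed into -1e16 before the cancellation

print("left to right:", left_to_right)
print("reassociated: ", reassociated)

# Reciprocal: n / n versus n * (1 / n). Collect small integers where the
# two forms round differently in double precision.
mismatches = [
    n for n in range(1, 1000)
    if float(n) / float(n) != float(n) * (1.0 / float(n))
]

print("number of n below 1000 where n * (1/n) != n / n:", len(mismatches))
print("first few:", mismatches[:5])
```

Neither rewrite is wrong algebraically; the difference comes purely from where rounding happens.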

I believe it is the default mode in clang, which is why it does not need to be specified in Ofast

As someone involved with shipping dynamic libraries, -Ofast is really problematic because it causes an infectious change in all other libraries in that environment.

This blog post goes into the problem at length, the prime one being:

when -ffast-math is enabled, the compiler will link in a constructor that sets the FTZ/DAZ flags whenever the library is loaded — even on shared libraries, which means that any application that loads that library will have its floating point behavior changed

What makes the situation way, way worse is that -Ofast (it’s gonna make my code faster, right?) turns on -ffast-math unconditionally (as in: -fno-fast-math has no effect).

I can sort-of understand that if -Ofast’s decomposition is as simple as -O3 -ffast-math (because, once you understand that, why not then just use -O3?), but given the virality of the FTZ/DAZ flags, I’d really like to be able to just add -fno-fast-math to the flags for any library I’m building for distribution, and have that be honoured even for libraries which happen to set -Ofast. This would make the resolution of this issue much more feasible for us.
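The behaviour being asked for can be sketched as a last-flag-wins resolution, where -Ofast is treated as shorthand for -O3 -ffast-math and a later -fno-fast-math switches fast math back off. The `resolve` function below is a hypothetical model written for this thread, not the logic of any real driver, and it assumes the decomposition proposed at the top of the RFC.

```python
def resolve(argv):
    """Return (opt_level, fast_math) after processing flags left to right."""
    opt_level, fast_math = "0", False
    for arg in argv:
        if arg == "-Ofast":
            # Modelled as -O3 -ffast-math, per the proposal in this thread.
            opt_level, fast_math = "3", True
        elif arg == "-ffast-math":
            fast_math = True
        elif arg == "-fno-fast-math":
            fast_math = False
        elif arg.startswith("-O"):
            opt_level = arg[2:]
    return opt_level, fast_math


print(resolve(["-Ofast"]))                    # ('3', True)
print(resolve(["-Ofast", "-fno-fast-math"]))  # ('3', False)
print(resolve(["-fno-fast-math", "-Ofast"]))  # ('3', True)
```

Under this model a distributor has to make sure -fno-fast-math lands after any -Ofast that the build system injects; an order-independent override would need a different rule.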

It’s just -ffast-math with this problematic property, right? Or is there something else that -Ofast normally turns on which is problematic?

FWIW, GCC is going to make a change as a result of the attention from that post which should reduce the harm wrt shared libraries: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522.

Yes (at least, to my knowledge). :slight_smile:

Since the post is titled “The meaning of -Ofast” I just wanted to add the concern that it should not mean viral-unsafe-math-in-shared-libs-even-when-you-try-to-turn-it-off-explicitly.

I hadn’t seen "Shared library compiled with -ffast-math modifies FPU state" (Issue #57589, llvm/llvm-project on GitHub) before, thanks also for the update w.r.t. GCC.

Thanks! I just wanted to check because I didn’t even realise the transitive problems with -ffast-math until that original blog post, so wanted to know if I was missing another hole too.

No problem :slight_smile:

We discussed this during the Flang technical call today. There were different opinions on how Ofast should be modelled in Flang.

@szakharin said that the proposed approach of matching Clang with O3 + -ffast-math is a reasonable starting point. And since we are using the LLVM infrastructure it will be good if the representation at the LLVM IR level is the same for all frontends.

@AlexisPerry said that a Fortran developer will expect the behaviour to match their expectation based on usage of other Fortran Compilers. @sscalpone agreed.

@kiranchandramohan asked what the difference in expectation would be. Is it additional passes that are run? Or is it something related to the handling of fast-math?

There was some discussion on which other transformations other compilers enable at -Ofast.
→ The Nvidia and gfortran compilers enable stack-arrays. @jeanPerier said that in some instances Flang does not use the stack, for example for array assignment.
→ The Intel compiler enables target-specific code generation at Ofast.
→ Some compilers might enable IPO at Ofast.

Clang’s default state is similar to -fno-semantic-interposition (inter-procedural optimizations apply) but not exactly equivalent (guaranteed suppressed GOT/PLT for some references). See -fno-semantic-interposition | MaskRay

Minor correction about the Intel compiler: I mixed up -fast with -Ofast. The former enables -xHOST and the latter does not.

FWIW, gcc-12.2.0 also enables -fallow-store-data-races under -Ofast:

-Ofast

Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens. It turns off -fsemantic-interposition.

It seems that clang does not support anything like -fallow-store-data-races now.

I have posted in fortran-lang discourse (What Ofast means for Fortran developers - Fortran Discourse) to check what Ofast means for Fortran developers.

Thanks for the feedback @h-vetinari, I will add a test to ensure flang supports -Ofast -fno-fast-math.

As for the other flags gfortran enables:

  • -fsignaling-nans: this requires -ftrapping-math, which will not be supported for now, as noted above
  • -fcx-limited-range loosens a C standard library requirement and so is not relevant here
  • -fexcess-precision=fast is also only relevant to C
  • -fallow-store-data-races: as mentioned by @szakharin, clang doesn’t support this and on a quick search I couldn’t find any relevant LLVM attributes so presumably these optimizations are not implemented.

Thanks @jeanPerier for the clarification about stack-arrays. This will need to be fixed. Would it be acceptable for this to be a TODO to land after -Ofast and -ffast-math?

I put a patch implementing this up for review: D138675, "[flang] Add -ffast-math and -Ofast".

Thanks @tblah for comparing the options with gfortran. I think the one option that you probably missed concerns the handling of subnormal numbers (see "Optimizations enabled by -ffast-math" on Krister Walfridsson's blog). These are very small numbers which are expensive to compute with. I believe gfortran and clang handle these by linking in a global static constructor that runs before main and disables subnormals (flushes them to zero). See Clang Compiler User’s Manual — Clang 18.0.0git documentation. Classic-Flang based compilers and nvfortran have the options Mflushz and Mdaz to control this, and these flags are added as part of -fast. I think we should clarify what the situation is regarding this in Flang.
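For readers unfamiliar with the terminology, the effect of flush-to-zero on a result can be imitated in Python. This is only an emulation: `flush_to_zero` is a made-up helper that mimics what the hardware does to a subnormal result once the FTZ bit is set, and it assumes Python floats are IEEE-754 doubles. It does not touch the real FPU control register.

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min  # smallest positive normal double


def flush_to_zero(x):
    """Emulate FTZ: replace a subnormal value with a zero of the same sign."""
    if x != 0.0 and abs(x) < SMALLEST_NORMAL:
        return math.copysign(0.0, x)
    return x


# Dividing the smallest normal number gives a subnormal, which is still
# non-zero under the default IEEE-754 behaviour.
tiny = SMALLEST_NORMAL / 4

print("default:  tiny > 0 is", tiny > 0.0)
print("with FTZ: tiny == 0 is", flush_to_zero(tiny) == 0.0)
print("normal values are untouched:", flush_to_zero(1.0))
```

Code that relies on `tiny` being non-zero, for example to guard a division, behaves differently once the flush is in effect, which is why the process-wide setting matters for shared libraries.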

I think this should be OK in the spirit of making progress. Please update the status in this ticket [Flang] Add support for `-fstack-arrays` option · Issue #59231 · llvm/llvm-project · GitHub. Open a separate RFC for this topic if required.

Since this was requested, please summarise a comparison with gfortran Ofast and nvfortran -fast in a page in the flang/docs directory.

nvfortran describes -fast in the following page: HPC Compilers User's Guide Version 23.11 for ARM, OpenPower, x86.

Thanks @kiranchandramohan. It turns out we do link to crtfastmath.o when -ffast-math is specified, the same as clang. I have added a test for this.

I also added a comparison with gfortran and nvfortran to the documentation in the patch.

BTW, I just wrote a proposal for Clang to treat -Ofast as -O3, and not imply -ffast-math. If that is implemented, we may want to have Flang do the same.