-fdenormal-fp-math

Can you please clarify if -fdenormal-fp-math is implied by -ffast-math? If not, shouldn't it be? Does the start-up code, if any, perhaps in compiler-rt, enable the appropriate bits in the proper control register? How can I help?

Thank you,

Can you please clarify if -fdenormal-fp-math is implied by -ffast-math? If not, shouldn't it be?

Why should it be?

  Does the start-up code, if any, perhaps in compiler-rt, enable the appropriate bits in the proper control register? How can I help?

No. Right now, I believe that flag is taken as a statement about what the compiler can assume about the FP environment. It is not something we attempt to ensure at startup. I imagine we could on some systems (as you say, by using some startup code).

  -Hal

Can you please clarify if -fdenormal-fp-math is implied by -ffast-math? If not, shouldn't it be?

Why should it be?

It'd make sense, since most targets hiccup when tripping at a denormal, unless they're commanded to flush them to zero by flipping a bit in a control register.

  Does the start-up code, if any, perhaps in compiler-rt, enable the appropriate bits in the proper control register? How can I help?

No. Right now, I believe that flag is taken as a statement about what the compiler can assume about the FP environment. It is not something we attempt to ensure at startup. I imagine we could on some systems (as you say, by using some startup code).

In GCC, though there's no equivalent option to -fdenormal-fp-math, IIRC, on some targets using -ffast-math causes the CRT startup code to enable flushing to zero. Effectively, on these targets, -ffast-math implies flushing denormals to zero.

Would this GCC behavior, -ffast-math implying -fdenormal-fp-math, be interesting to carry over to Clang, if not universally, on specific targets?

Thank you,

Can you please clarify if -fdenormal-fp-math is implied by -ffast-math? If not, shouldn't it be?

Why should it be?

It'd make sense, since most targets hiccup when tripping at a denormal, unless they're commanded to flush them to zero by flipping a bit in a control register.

This is a good point. There is certainly a non-trivial amount of hardware that will dump into microcode if it needs to properly process denormals.

  Does the start-up code, if any, perhaps in compiler-rt, enable the appropriate bits in the proper control register? How can I help?

No. Right now, I believe that flag is taken as a statement about what the compiler can assume about the FP environment. It is not something we attempt to ensure at startup. I imagine we could on some systems (as you say, by using some startup code).

In GCC, though there's no equivalent option to -fdenormal-fp-math, IIRC, on some targets using -ffast-math causes the CRT startup code to enable flushing to zero. Effectively, on these targets, -ffast-math implies flushing denormals to zero.

Okay, interesting.

Would this GCC behavior, -ffast-math implying -fdenormal-fp-math, be interesting to carry over to Clang, if not universally, on specific targets?

I think it would need to be target specific, but we could definitely let -ffast-math affect how linking is done so that you pick up some different startup files, or similar, on systems that provide such things. The startup files are generally associated with the libc implementation, however, so this might be a dual-pronged task.

If you could survey the GCC implementation and see what it does for this on various targets, that would certainly be a useful input to this discussion.

  -Hal

Hello,

Does the start-up code, if any, perhaps in compiler-rt, enable the appropriate bits in the proper control register? How can I help?

No. Right now, I believe that flag is taken as a statement about what the compiler can assume about the FP environment. It is not something we attempt to ensure at startup. I imagine we could on some systems (as you say, by using some startup code).

There is a PR related to this: https://bugs.llvm.org//show_bug.cgi?id=14024

clang does not set DAZ flag in -ffast-math mode

As can be seen, the gcc-produced binary is a lot faster than clang's in -ffast-math mode (besides being a little bit faster in general). This is not caused by better optimizations, but because gcc links a small function into the resulting binary, which sets the DAZ flag.

DAZ tells the CPU to treat all denormal operands as zero. A denormal is a number so small that the FPU can't normalize it because of the limited exponent range. Denormals behave just like normal numbers, but they take considerably longer to process. Note that not all processors support DAZ.
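To make the definition concrete, here is a minimal C sketch, not taken from the bug report, that produces a denormal by halving the smallest normal float and classifies it with the standard fpclassify macro. The helper names is_denormal and half_of_smallest_normal are made up for illustration, and the sketch assumes the default IEEE floating-point environment (no flushing enabled).

```c
#include <float.h>
#include <math.h>

/* Returns 1 when x is a denormal (subnormal) float, 0 otherwise. */
static int is_denormal(float x) {
    return fpclassify(x) == FP_SUBNORMAL;
}

/* Halves the smallest normal float at run time. The volatile read keeps
   the compiler from folding the multiplication at compile time.
   Hypothetical helper, assuming the default IEEE environment. */
static float half_of_smallest_normal(void) {
    volatile float smallest_normal = FLT_MIN;
    return smallest_normal * 0.5f;
}
```

In the default environment the halved value is nonzero and classified as subnormal; with flushing enabled the same multiplication would be expected to return zero instead.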

CU,
Jonathan

Hello,

  Does the start-up code, if any, perhaps in compiler-rt, enable the appropriate bits in the proper control register? How can I help?

No. Right now, I believe that flag is taken as a statement about what the compiler can assume about the FP environment. It is not something we attempt to ensure at startup. I imagine we could on some systems (as you say, by using some startup code).

There is a PR related to this: https://bugs.llvm.org//show_bug.cgi?id=14024

That's a good point. On some systems at least, we link in crtfastmath.o (which might enable flushing denormals) if -ffast-math (or one of its friends) is provided when we're linking.

  -Hal

Hi, Hal.

For the benefit of all, foil 14 of this Intel presentation (Intel Developer Zone) explains the issues of underflow and denormals in IEEE 754.

Here's what I found out in GCC:

On {aarch64,alpha,arm,i386,ia64,mips,sparc}, when {-Ofast,-ffast-math,-funsafe-math-optimizations} is specified, the file crtfastmath.o is added to the link line.

crtfastmath.c contains a static function with the attribute "constructor" that sets the appropriate flag in a control register to round denormals to zero. Some targets set other bits too (like Alpha and i386, also to round underflows to zero, and MIPS, also to disable exceptions).
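For readers who have not seen this kind of file, the following is a rough sketch of the mechanism described above, not GCC's actual source. It assumes an x86-64 target, where the FTZ (bit 15) and DAZ (bit 6) flags live in the MXCSR register and DAZ is assumed to be available; on any other target the sketch does nothing. The names set_flush_to_zero, flush_to_zero_state and halve_at_run_time are made up for illustration.

```c
#include <float.h>

#if defined(__x86_64__)
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr */
#define HAVE_MXCSR 1
#else
#define HAVE_MXCSR 0     /* assumption: no flushing on other targets here */
#endif

/* MXCSR flag bits on x86: DAZ is bit 6, FTZ is bit 15. */
#define MXCSR_DAZ 0x0040u
#define MXCSR_FTZ 0x8000u

/* Runs before main() because of the constructor attribute. */
static void __attribute__((constructor)) set_flush_to_zero(void) {
#if HAVE_MXCSR
    _mm_setcsr(_mm_getcsr() | MXCSR_FTZ | MXCSR_DAZ);
#endif
}

/* Returns 1 if both flags are set, 0 if not, -1 if there is no MXCSR. */
static int flush_to_zero_state(void) {
#if HAVE_MXCSR
    unsigned int both = MXCSR_FTZ | MXCSR_DAZ;
    return (_mm_getcsr() & both) == both;
#else
    return -1;
#endif
}

/* Multiplies at run time; volatile prevents compile-time folding. */
static float halve_at_run_time(float x) {
    volatile float v = x;
    return v * 0.5f;
}
```

With the constructor linked in, halve_at_run_time(FLT_MIN) is expected to return zero on x86-64 rather than a denormal. A real implementation for 32-bit x86 would first have to check that the processor supports SSE and DAZ, which this sketch skips.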

Sun's SPARC compiler also flushes underflows and denormals to zero when the option {-fast,-fnonstd,-fns} is specified. The reason given (v. SPARC Behavior and Implementation) is also performance degradation.

Since clang has a specific flag to guide the behavior of denormals, -fdenormal-fp-math, it could be used to make sure that the FPU is set to the same specified behavior: {ieee,preserve-sign,positive-zero}. Again, specific targets might decide to imply a value other than "ieee" for -fdenormal-fp-math when -funsafe-math-optimizations is specified, either explicitly or implicitly.

Thank you,

Hi, Hal.

For the benefit of all, foil 14 of this Intel presentation (Intel Developer Zone) explains the issues of underflow and denormals in IEEE 754.

Here's what I found out in GCC:

On {aarch64,alpha,arm,i386,ia64,mips,sparc}, when {-Ofast,-ffast-math,-funsafe-math-optimizations} is specified, the file crtfastmath.o is added to the link line.

crtfastmath.c contains a static function with the attribute "constructor" that sets the appropriate flag in a control register to round denormals to zero. Some targets set other bits too (like Alpha and i386, also to round underflows to zero, and MIPS, also to disable exceptions).

Sun's SPARC compiler also flushes underflows and denormals to zero when the option {-fast,-fnonstd,-fns} is specified. The reason given (v. SPARC Behavior and Implementation) is also performance degradation.

Since clang has a specific flag to guide the behavior of denormals, -fdenormal-fp-math, it could be used to make sure that the FPU is set to the same specified behavior: {ieee,preserve-sign,positive-zero}.

I don't know if we can use crtfastmath.o for this, independent of -ffast-math, essentially because we have no control over what else it might do, but I can certainly see shipping (e.g. as part of compiler-rt) some similar files that specifically deal with denormals.

  -Hal

Hal,

Would it be feasible for clang to generate crtfastmath.o on the fly at link time, by using LLVM to emit the object file?

Thank you,

This is a good start. However, I wonder how the external dependency may be eliminated.

Thank you,

We could. We could also just inject the necessary IR into each module as a linkonce_odr initialization function. I’m not sure it is worthwhile; we depend on compiler-rt for a lot of optional features. -Hal

Hal,

Would it be feasible for clang to generate crtfastmath.o on the fly at link time, by using LLVM to emit the object file?

We could. We could also just inject the necessary IR into each module as a linkonce_odr initialization function. I'm not sure it is worthwhile; we depend on compiler-rt for a lot of optional features.

OTOH, by making sure that a module contains this code when any function is relaxing the FP semantics, even if the link command line doesn't include -ffast-math, the expected behavior can be guaranteed.

Hal,

Would it be feasible for clang to generate crtfastmath.o on the fly at link time, by using LLVM to emit the object file?

We could. We could also just inject the necessary IR into each module as a linkonce_odr initialization function. I'm not sure it is worthwhile; we depend on compiler-rt for a lot of optional features.

OTOH, by making sure that a module contains this code when any function is relaxing the FP semantics, even if the link command line doesn't include -ffast-math, the expected behavior can be guaranteed.

I'm not sure that's desirable. The problem is that these are global settings, and so we need to decide whether the flag on any translation unit will affect all others, or if we need the flag at link time for something that affects the entire application. I lean toward the latter.
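A small sketch of why the setting is global, assuming an x86-64 target and the MXCSR register: the function strict_halve below stands in for code from a translation unit built without -ffast-math, yet its result changes once some other code flips the flush bits. All names are hypothetical, and on other targets the sketch simply reports that it cannot run.

```c
#include <float.h>

#if defined(__x86_64__)
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr */
#endif

#define FLUSH_BITS 0x8040u   /* FTZ (bit 15) and DAZ (bit 6) in MXCSR */

/* Stands in for code from a translation unit built without -ffast-math.
   noinline and volatile keep the multiplication at run time, in place. */
static float __attribute__((noinline)) strict_halve(float x) {
    volatile float v = x;
    return v * 0.5f;
}

/* Returns 1 if strict_halve() changes its result after the control
   register is modified elsewhere, 0 if not, -1 if there is no MXCSR.
   The original register value is restored before returning. */
static int other_code_is_affected(void) {
#if defined(__x86_64__)
    unsigned int saved = _mm_getcsr();
    _mm_setcsr(saved & ~FLUSH_BITS);          /* IEEE denormal handling */
    float ieee_result = strict_halve(FLT_MIN);
    _mm_setcsr(saved | FLUSH_BITS);           /* flushing enabled */
    float flushed_result = strict_halve(FLT_MIN);
    _mm_setcsr(saved);
    return ieee_result != flushed_result;
#else
    return -1;
#endif
}
```

Nothing in strict_halve asked for relaxed semantics, which is the argument for treating this as a link-time, whole-application decision.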

  -Hal

Right; setting fast-math on one translation unit should not affect the behavior of another translation unit.

Got it.

So, right now this only works if libgcc is available. Would it be a good idea to add it to compiler-rt when it is not?

Thank you,

Hal,

Would it be feasible for clang to generate crtfastmath.o on the fly at link time, by using LLVM to emit the object file?

We could. We could also just inject the necessary IR into each module as a linkonce_odr initialization function. I'm not sure it is worthwhile; we depend on compiler-rt for a lot of optional features.

OTOH, by making sure that a module contains this code when any function is relaxing the FP semantics, even if the link command line doesn't include -ffast-math, the expected behavior can be guaranteed.

I'm not sure that's desirable. The problem is that these are global settings, and so we need to decide whether the flag on any translation unit will affect all others, or if we need the flag at link time for something that affects the entire application. I lean toward the latter.

Right; setting fast-math on one translation unit should not affect the behavior of another translation unit.

Got it.

So, right now this only works if libgcc is available. Would it be a good idea to add it to compiler-rt when it is not?

Sounds reasonable to me.

  -Hal