Presenting Unsafe Math Flag to Optimizer

Hi all,

A quick question:

The current implementation of the "allow unsafe math" option is to specify it via the TargetOptions object. However, this prevents the target-independent optimizer from using it. Are there any opinions (ha!) on how this could be achieved in a nice clean manner which doesn't require using the TargetOptions object in the optimizer?

-bw

A function attribute?

Not a bad idea. However, how should it behave during inlining for LTO? (I really don't know the answer to this.)

There are three options that you mentioned off-line:

A) Caller wins
   The programmer could get something they didn't expect, possibly an incorrect answer.

B) Don't inline
   We potentially miss important optimizations.

C) Safety first
   The programmer could get code they didn't expect, but at least it won't result in an "incorrect" answer. I.e., it will be correct modulo LLVM bugs, but lacking any unsafe transforms they were expecting.
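For concreteness, here is a hypothetical sketch of the dilemma in IR terms, assuming unsafe math were carried as a per-function string attribute (the "unsafe-fp-math" spelling below is illustrative):

```llvm
; Hypothetical: unsafe-math permission as a per-function attribute.
define float @callee(float %x) #0 {   ; built with unsafe math ON
  %mul = fmul float %x, %x
  ret float %mul
}

define float @caller(float %y) {      ; built with unsafe math OFF
  ; after inlining, does %mul follow @caller's setting (A), stay
  ; un-inlined (B), or lose its unsafe-math permission (C)?
  %r = call float @callee(float %y)
  ret float %r
}

attributes #0 = { "unsafe-fp-math"="true" }
```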

-bw

> C) Safety first

This one sounds like the sensible option to me.

-Owen

Bill Wendling wrote:

>>> Hi all,
>>>
>>> A quick question:
>>>
>>> The current implementation of the "allow unsafe math" option is to specify it via the TargetOptions object. However, this prevents the target-independent optimizer from using it. Are there any opinions (ha!) on how this could be achieved in a nice clean manner which doesn't require using the TargetOptions object in the optimizer?
>>
>> A function attribute?
>
> Not a bad idea. However, how should it behave during inlining for LTO? (I really don't know the answer to this.)

A bit on the instruction, not unlike nsw/nuw/exact/inbounds. We could mark whether the fadd is reassociable or not:

   http://nondot.org/sabre/LLVMNotes/FloatingPointChanges.txt

This handles inlining properly.
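For illustration, a hypothetical spelling of such a bit, by analogy with the existing integer wrap flags (the 'reassoc' keyword is a placeholder, not settled syntax):

```llvm
; 'add nsw' is existing IR syntax; the 'reassoc' keyword on fadd is a
; sketch of the analogous per-instruction bit marking this particular
; operation as reassociable
%i = add nsw i32 %a, %b
%f = fadd reassoc float %x, %y
```

Since the bit lives on the instruction itself, the inliner copies it along with the instruction, and there is no caller/callee policy to reconcile.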

Nick

Hi Bill,

> The current implementation of the "allow unsafe math" option is to specify it via the TargetOptions object. However, this prevents the target-independent optimizer from using it. Are there any opinions (ha!) on how this could be achieved in a nice clean manner which doesn't require using the TargetOptions object in the optimizer?

How about a flag on each floating point operation, saying whether it does "exact" math or not?

Ciao,

Duncan.

Hi-

Would this not be a good place to use the new metadata feature? If the metadata indicating that it's OK to perform unsafe optimisations on a value is dropped, everything will still work correctly.
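As a sketch, assuming a made-up '!unsafe.math' metadata kind (the name is illustrative, not an existing one):

```llvm
; hypothetical '!unsafe.math' attachment: a pass that drops it only
; loses the permission for unsafe transforms, never correctness
%sum = fadd double %a, %b, !unsafe.math !0

!0 = !{}
```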

Alastair

> There are three options that you mentioned off-line:
>
> A) Caller wins
>    The programmer could get something they didn't expect, possibly an incorrect answer.
>
> B) Don't inline
>    We potentially miss important optimizations.
>
> C) Safety first
>    The programmer could get code they didn't expect, but at least it won't result in an "incorrect" answer. I.e., it will be correct modulo LLVM bugs, but lacking any unsafe transforms they were expecting.

From having worked extensively on FP code, I would prefer option B -- often the most important property is that two calls to the same function with the same parameters produce the same result. You don't want the function to produce different results depending on whether or not it has been inlined, even if the inlined result is more 'correct' (which would be the case with C).

One example of the kind of problem you can get into: a test to see which side of a plane a point is on produces different results from two different calls with the same point and the same plane.

- Morten

Yes, the right approach for this is to add flags to each fp operation, just like the NUW/NSW bits on integer ops. We want the ability to represent the C99 pragmas, which are scoped more tightly than a function body.

This is actually really easy to do; the big issue is defining the 'bits' that we want to carry on each operation. For example, I think it would be reasonable to have an "assume finite" bit (saying no NaNs/Infs); it would also be useful to know that you can do reassociation, that you don't care about signed zero, etc.
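For concreteness, one possible spelling of such bits, reusing the nsw/nuw keyword style -- every flag name below is a placeholder rather than a settled proposal:

```llvm
; placeholder flag names:
;   nnan/ninf - assume finite (no NaNs, no infinities)
;   nsz       - the sign of zero doesn't matter
;   reassoc   - reassociation is allowed
%t = fadd nnan ninf nsz reassoc double %a, %b
```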

I don't have enough expertise to propose exactly how this should work.

-Chris

Good point. :-) Though it looks like we would bypass this and go with flags on the individual instructions. Do you have any insight into Chris's response?

-bw

> This is actually really easy to do; the big issue is defining the 'bits' that we want to carry on each operation. For example, I think it would be reasonable to have an "assume finite" bit (saying no NaNs/Infs); it would also be useful to know that you can do reassociation, that you don't care about signed zero, etc.

I think the main issues are:

1) special values (+0, -0, NaN, +Inf, -Inf) to be taken into account
- this can be represented with an 'assume_finite' bit and an 'assume_no_signed_zero' bit

2) rounding: the x86 FPU has 80 bits of internal precision, so you get inconsistent results depending on intermediate results being spilled or being kept in registers. One usual way of handling this is that any assignment in the source code will truncate to the memory representation, while intermediate results in an expression are allowed to be kept at 80 bits of precision (i.e., the frontend decides which operations must be rounded; see the sketch after this list)
- this can be represented with an 'exact_precision' bit

3) exceptions: you might need the right number of exceptions triggered in the right order, so basically no optimizations are allowed
- this can be represented with a 'trapping_math' and/or 'signaling_NaN' bit, or maybe it can be encoded as 'no_reorder' / 'no_duplicate'
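As a sketch of issue 2, assuming the frontend makes the rounding points explicit via the x86_fp80 type (one possible encoding, not the only one):

```llvm
; intermediates stay at the x87's 80-bit precision; each source-level
; assignment becomes an explicit fptrunc back to double's 64 bits
define double @muladd(double %a, double %b, double %c) {
  %a80 = fpext double %a to x86_fp80
  %b80 = fpext double %b to x86_fp80
  %c80 = fpext double %c to x86_fp80
  %t   = fmul x86_fp80 %a80, %b80        ; intermediate: 80-bit
  %u   = fadd x86_fp80 %t, %c80          ; intermediate: 80-bit
  %r   = fptrunc x86_fp80 %u to double   ; assignment: round to 64-bit
  ret double %r
}
```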

see:
- "Optimize Options" in the GCC manual (look for -ffloat-store)
- "/fp (Specify floating-point behavior)" on Microsoft Docs

- Morten

> I think the main issues are:
>
> 1) special values (+0, -0, NaN, +Inf, -Inf) to be taken into account
> - this can be represented with an 'assume_finite' bit and an 'assume_no_signed_zero' bit

Sounds right to me.

> 2) rounding: the x86 FPU has 80 bits of internal precision, so you get inconsistent results depending on intermediate results being spilled or being kept in registers. One usual way of handling this is that any assignment in the source code will truncate to the memory representation, while intermediate results in an expression are allowed to be kept at 80 bits of precision (i.e., the frontend decides which operations must be rounded)
> - this can be represented with an 'exact_precision' bit

Does LLVM even support generating float and double arithmetic on x87? Certainly the default should be to use SSE/SSE2 and avoid this problem entirely. If legacy x87 codegen is supported, it would be nice to have float-store be the default behavior, and require a flag "-fnon-portable-extra-precision" or something similarly menacing to enable the other behavior.

> 3) exceptions: you might need the right number of exceptions triggered in the right order, so basically no optimizations are allowed
> - this can be represented with a 'trapping_math' and/or 'signaling_NaN' bit, or maybe it can be encoded as 'no_reorder' / 'no_duplicate'

Some reordering should be inhibited not only by trapping math, but also by the default IEEE-754 exception handling (nonstop execution with status flags), at least when #pragma STDC FENV_ACCESS ON is active. If the reordering affects only the order in which flags could be raised, and not which flags could be raised, then it could be allowed with the default exception handling.
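A minimal illustration of that distinction, using two independent divides:

```llvm
; the two fdivs are independent, so swapping them changes only the ORDER
; in which flags (e.g. FE_DIVBYZERO, FE_INEXACT) could be raised, not
; WHICH flags end up set -- allowable under default exception handling.
; Hoisting %q above a branch that guards it is different: that could set
; a flag the original program never would, so it must stay inhibited.
%p = fdiv double %a, %b
%q = fdiv double %c, %d
```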

- Steve