[RFC] Extend LLVM IR to express "fast-math" at a per-instruction level

Introduction

> N: no NaNs - ignore the existence of NaNs when convenient

Maybe distinguish between quiet and signaling NaNs?

> NI - no infs AND no NaNs
> x - x ==> 0
> Inf > x ==> true

Inf * x ==> 0?

I think that if an infinity appears when NI (or I) is given, the result should be left as "undefined". Similarly with NaNs. In such cases, it's impossible to predict the accuracy of the result, so trying to define what happens is pretty much moot. In this case Inf > x may as well be simplified to "false" without any loss of (already absent) meaning.

-Krzysztof

We already ignore the existence of signaling NaNs by default. The
proposal could make that more clear, though.

-Eli

  N: no NaNs - ignore the existence of NaNs when convenient

Maybe distinguish between quiet and signaling NaNs?

We already ignore the existence of signaling NaNs by default. The
proposal could make that more clear, though.

Yes, the default LLVM behavior is:
* No signaling NaNs
* Default rounding mode
* FENV_ACCESS is off
I'll be more explicit from now on.

>
> N: no NaNs - ignore the existence of NaNs when convenient

Maybe distinguish between quiet and signaling NaNs?

> NI - no infs AND no NaNs
> x - x ==> 0
> Inf > x ==> true

Inf * x ==> 0?

I think that if an infinity appears when NI (or I) is given, the result should be left as "undefined". Similarly with NaNs. In such cases, it's impossible to predict the accuracy of the result, so trying to define what happens is pretty much moot. In this case Inf > x may as well be simplified to "false" without any loss of (already absent) meaning.

The goal is not necessarily to un-define Inf/NaN, but to opt-in to unsafe optimizations that would otherwise not be allowed to be applied, e.g. x*0==>0. There may be examples where these optimizations produce arbitrary results as though those constructs were absent in meaning, but that doesn't make Inf/NaN constants completely undefined in general. The "when convenient" wording is already a little vague/permissive, and could be re-worded to state that Values are assumed to not be Inf/NaN when convenient, but Constants may be honored.

Hi Michael,

Flags
---
no NaNs (N)
   - ignore the existence of NaNs when convenient
no Infs (I)
   - ignore the existence of Infs when convenient
no signed zeros (S)
   - ignore the existence of negative zero when convenient

while the above flags make perfect sense to me, the other two seem more
dubious:

allow fusion (F)
   - fuse FP operations when convenient, despite possible differences in rounding
     (e.g. form FMAs)
unsafe algebra (A)
   - allow for algebraically equivalent transformations that may dramatically
     change results in floating point. (e.g. reassociation)

They don't seem to be capturing a clear concept; they seem more like a grab-bag
of "everything else" (A) or "here's a random thing that is important today so
let's have a flag for it" (F).

...

Why not use metadata rather than flags?

There is existing metadata to denote precisions, and this proposal is orthogonal
to those efforts. These flags are analogous to nsw/nuw, and are inherent
properties of the IR instructions themselves that all transformations should
respect.

If you drop any of these flags then things are still conservatively correct,
just like with metadata. In my opinion this could be implemented as metadata.
(I'm not saying it should be represented as metadata, I'm saying it could be).

Disadvantages of metadata:

- Bloats the IR (however my measurements suggest this is by < 2% for math-heavy
code)
- More painful to work with (though helper classes can mitigate this)
- Less efficient to modify (but will flags be cleared that often?)

Disadvantages of using subclass data bits:

- Can only represent flags. Thus you might end up with a mix of flags and
metadata for floating point math, with the metadata holding the non-flag
info, and subclass data holding the flags. In which case it might be better
to just have it all be metadata in the first place
- Only a limited number of bits (but hey)

Hopefully Chris will weigh in with his opinion.

Ciao, Duncan.

The problem may be that, in general, it may not be clear whether a given constant appears in the simplifiable computation or not. For example, if we have "x > y", and we manage to constant-propagate "inf" in place of "x", we end up with "inf > y", which you suggested be folded to "true". However, as our constant propagation algorithm becomes more aggressive, it may be capable of propagating a constant into "y", which may also turn out to be "inf". This way we end up with "inf > inf". In such a case, again we follow the rule of respecting constants, but now we generate "false".

Once we assume that there are no infinities, and an infinity is actually present, the results are unpredictable.

-Krzysztof

Hi Michael,

Flags

no NaNs (N)

  • ignore the existence of NaNs when convenient

no Infs (I)

  • ignore the existence of Infs when convenient

no signed zeros (S)

  • ignore the existence of negative zero when convenient

Does this mean ignore the possibility of NaNs as operands, as results, or both? Ditto for infinity and negative zero.

Also, what does “ignore” mean? As worded, it seems to imply Undefined Behavior if the value is encountered. Is that intended?

allow fusion (F)

  • fuse FP operations when convenient, despite possible differences in rounding
    (e.g. form FMAs)

What do you intend to be the relationship between this and @llvm.fmuladd? It’s not clear whether you’re trying to replace it or trying to set up an alternative for different use cases.

Is your wording of “fusing” intended to imply fusing with infinite intermediate precision only, or is mere increased precision also valid?

unsafe algebra (A)

  • allow for algebraically equivalent transformations that may dramatically
    change results in floating point. (e.g. reassociation)

[…]

Not all combinations make sense (e.g. ‘A’ pretty much implies all other flags).

Basically, I have the below semilattice of sensible relations:
A > S > I > N
A > F
Meaning that ‘A’ implies all the others, ‘S’ implies ‘I’ and ‘N’, etc.

Why does it make sense for S to imply I and N? GCC’s -fno-signed-zeros flag doesn’t seem to imply -ffinite-math-only, among other things. The concept of negative zero isn’t inherently linked with the concepts of infinity or NaN.

It might make sense to change the S, I, and N options to be some kind of finite
option with levels 3, 2, and 1 respectively. F and A could be kept distinct. It
is still the case that A would imply pretty much everything else.

N - no NaNs
x == x ==> true

This is not true if x is infinity.

S - no signed zeros
x - 0 ==> x
0 - (x - y) ==> y - x

NS - no signed zeros AND no NaNs
x * 0 ==> 0

NI - no infs AND no NaNs
x - x ==> 0
Inf > x ==> true

With the I flag, would the infinity as an operand make this undefined?

A - unsafe-algebra
Reassociation
(x + C1) + C2 ==> x + (C1 + C2)

Redistribution
(x * C) + x ==> x * (C+1)
(x * C) + (x + x) ==> x * (C + 2)
Reciprocal
x / C ==> x * (1/C)

These examples apply when the new constants are permitted, e.g. not denormal,
and all the instructions involved have the needed flags.

I’m confused. In other places, you seem to imply that reassociation would be valid even on non-constant values. It’s not clear whether you meant to contradict that here.

[…]

-fp-contract=
I’m not too familiar with this option, but I recommend that ‘all’ turn on the
‘F’ bit for all FP instructions, ‘default’ do so when following the pragma, and
‘off’ never do so. This option should still be passed to the backend.

Please coordinate with Lang and others who have already done a fair amount of work on FP_CONTRACT.

(Optional)
I propose adding the below flags:

-ffinite-math-only
Allow optimizations to assume that floating point arguments and results are
not NaNs or +/-Inf. This may produce incorrect results, and so should be used
with care.

This would set the ‘I’ and ‘N’ bits on all generated floating point instructions.

-fno-signed-zeros
Allow optimizations to ignore the signedness of zero. This may produce
incorrect results, and so should be used with care.

This would set the ‘S’ bit on all FP instructions.

These are established flags in GCC. Do you know if there are any semantic differences between your proposed semantics and the semantics of these flags in GCC? If so, it would be good to either change to match them, or document the differences.

Dan

Oops, I was wrong here. Infinity is defined to be equal to infinity.

Dan
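
For reference, both of these facts are easy to confirm (a minimal C sketch, assuming C99 and IEEE-754 doubles):

#include <math.h>
#include <stdio.h>

int main(void) {
  double qnan = NAN, pinf = INFINITY;
  /* NaN compares unequal to everything, itself included, so the
     fold x == x ==> true really does need the 'N' relaxation. */
  printf("nan == nan: %d\n", qnan == qnan);  /* prints 0 */
  /* Infinity is defined to compare equal to itself, so infinite
     (non-NaN) operands do not invalidate the fold. */
  printf("inf == inf: %d\n", pinf == pinf);  /* prints 1 */
  return 0;
}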

Hi Michael,

Flags
---
no NaNs (N)
  - ignore the existence of NaNs when convenient
no Infs (I)
  - ignore the existence of Infs when convenient
no signed zeros (S)
  - ignore the existence of negative zero when convenient

while the above flags make perfect sense to me, the other two seem more
dubious:

allow fusion (F)
  - fuse FP operations when convenient, despite possible differences in rounding
    (e.g. form FMAs)
unsafe algebra (A)
  - allow for algebraically equivalent transformations that may dramatically
    change results in floating point. (e.g. reassociation)

They don't seem to be capturing a clear concept; they seem more like a grab-bag
of "everything else" (A) or "here's a random thing that is important today so
let's have a flag for it" (F).

'A' is certainly a bit of a grab-bag, but I had difficulty breaking it apart into finer-grained pieces that a user would want to pick and choose between. I'd be interested in any suggestions you might have along these lines.

Why is 'F' such a random flag to have? 'F' implies ignoring intermediate rounding when a more efficient version exists, and it seems fair for it to be its own category.

...

Why not use metadata rather than flags?

There is existing metadata to denote precisions, and this proposal is orthogonal
to those efforts. These flags are analogous to nsw/nuw, and are inherent
properties of the IR instructions themselves that all transformations should
respect.

If you drop any of these flags then things are still conservatively correct,
just like with metadata. In my opinion this could be implemented as metadata.
(I'm not saying it should be represented as metadata, I'm saying it could be).

Disadvantages of metadata:

- Bloats the IR (however my measurements suggest this is by < 2% for math-heavy
code)
- More painful to work with (though helper classes can mitigate this)
- Less efficient to modify (but will flags be cleared that often?)

Disadvantages of using subclass data bits:

- Can only represent flags. Thus you might end up with a mix of flags and
metadata for floating point math, with the metadata holding the non-flag
info, and subclass data holding the flags. In which case it might be better
to just have it all be metadata in the first place
- Only a limited number of bits (but hey)

Hopefully Chris will weigh in with his opinion.

Ciao, Duncan.

Thanks for the feedback!

Hi Michael,

Flags

no NaNs (N)

  • ignore the existence of NaNs when convenient

no Infs (I)

  • ignore the existence of Infs when convenient

no signed zeros (S)

  • ignore the existence of negative zero when convenient

Does this mean ignore the possibility of NaNs as operands, as results, or both? Ditto for infinity and negative zero.

I wrote this thinking both, though I could certainly imagine it being clearer if defined in terms of operands. The example optimizations section is written along the lines of ignoring both.

Also, what does “ignore” mean? As worded, it seems to imply Undefined Behavior if the value is encountered. Is that intended?

What I’m intending is for optimizations to be allowed to ignore the possibility of those values. Thinking about it more, this is pretty vague. With your and Krzysztof’s feedback in mind, I think something along the lines of:

no NaNs (N)

  • The operands’ values can be assumed to be non-NaN by the optimizer. The result of this operator is Undef if passed a NaN.

Might be more clear. I’ll think about that more and revise the examples section too.

allow fusion (F)

  • fuse FP operations when convenient, despite possible differences in rounding
    (e.g. form FMAs)

What do you intend to be the relationship between this and @llvm.fmuladd? It’s not clear whether you’re trying to replace it or trying to set up an alternative for different use cases.

Interesting, I had not seen llvm.fmuladd. I’ll have to think about this more; perhaps fmuladd can already provide what I was intending here.

Is your wording of “fusing” intended to imply fusing with infinite intermediate precision only, or is mere increased precision also valid?

My intention is that increased precision is also valid, though I haven’t thought too deeply about the difference.
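
One way to see the rounding difference concretely (a small C sketch, assuming C99's fma() is correctly rounded and IEEE-754 doubles):

#include <math.h>
#include <stdio.h>

int main(void) {
  double a = 1.0 + ldexp(1.0, -52);  /* one ulp above 1.0 */
  double p = a * a;                  /* product, rounded once */
  /* Unfused: the multiply and subtract each round separately, so
     the low bits of a*a discarded by the multiply are gone. */
  printf("unfused: %g\n", a * a - p);       /* prints 0 */
  /* Fused: fma keeps the full-precision product internally and
     recovers the multiply's rounding error (2^-104 here). */
  printf("fused:   %g\n", fma(a, a, -p));   /* prints ~4.93e-32 */
  return 0;
}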

unsafe algebra (A)

  • allow for algebraically equivalent transformations that may dramatically
    change results in floating point. (e.g. reassociation)

[…]

Not all combinations make sense (e.g. ‘A’ pretty much implies all other flags).

Basically, I have the below semilattice of sensible relations:
A > S > I > N
A > F
Meaning that ‘A’ implies all the others, ‘S’ implies ‘I’ and ‘N’, etc.

Why does it make sense for S to imply I and N? GCC’s -fno-signed-zeros flag doesn’t seem to imply -ffinite-math-only, among other things. The concept of negative zero isn’t inherently linked with the concepts of infinity or NaN.

What I mean here is that I’m finding it hard to think of a case where a user would desire to specify ‘I’ and not specify ‘N’. This is more so a question I had as to whether we could/should express this as a fast-math level rather than allow each flag to be individually toggle-able. Any thoughts on this?

It might make sense to change the S, I, and N options to be some kind of finite
option with levels 3, 2, and 1 respectively. F and A could be kept distinct. It
is still the case that A would imply pretty much everything else.

N - no NaNs
x == x ==> true

This is not true if x is infinity.

S - no signed zeros
x - 0 ==> x
0 - (x - y) ==> y - x

NS - no signed zeros AND no NaNs
x * 0 ==> 0

NI - no infs AND no NaNs
x - x ==> 0
Inf > x ==> true

With the I flag, would the infinity as an operand make this undefined?

I’ll think about this more with regard to the prior changes.
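
For context, the IEEE-754 behavior that makes this fold require both flags (a couple of lines of C, assuming C99):

#include <math.h>
#include <stdio.h>

int main(void) {
  double inf = INFINITY;
  /* Inf - Inf is NaN, and NaN - NaN is NaN, so folding
     x - x ==> 0 needs both the no-NaNs and no-Infs relaxations. */
  printf("inf - inf = %g\n", inf - inf);  /* prints nan */
  printf("nan - nan = %g\n", NAN - NAN);  /* prints nan */
  return 0;
}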

A - unsafe-algebra
Reassociation
(x + C1) + C2 ==> x + (C1 + C2)

Redistribution
(x * C) + x ==> x * (C+1)
(x * C) + (x + x) ==> x * (C + 2)
Reciprocal
x / C ==> x * (1/C)

These examples apply when the new constants are permitted, e.g. not denormal,
and all the instructions involved have the needed flags.

I’m confused. In other places, you seem to imply that reassociation would be valid even on non-constant values. It’s not clear whether you meant to contradict that here.

Reassociation is still valid. These examples are just cases where there would be a clear optimization benefit to be had. I’ll probably add in a general expression to clarify.
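
As a concrete illustration that reassociation changes results even with no constants involved (a small C sketch, assuming IEEE-754 doubles):

#include <stdio.h>

int main(void) {
  /* At magnitude 1e16 a double's ulp is 2.0, so adding 1.0 twice
     is lost entirely, while adding 2.0 once is exact. */
  double x = 1e16, y = 1.0, z = 1.0;
  printf("(x + y) + z = %.1f\n", (x + y) + z);  /* 10000000000000000.0 */
  printf("x + (y + z) = %.1f\n", x + (y + z));  /* 10000000000000002.0 */
  return 0;
}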

[…]

-fp-contract=
I’m not too familiar with this option, but I recommend that ‘all’ turn on the
‘F’ bit for all FP instructions, ‘default’ do so when following the pragma, and
‘off’ never do so. This option should still be passed to the backend.

Please coordinate with Lang and others who have already done a fair amount of work on FP_CONTRACT.

I will, thanks.

(Optional)
I propose adding the below flags:

-ffinite-math-only
Allow optimizations to assume that floating point arguments and results are
not NaNs or +/-Inf. This may produce incorrect results, and so should be used
with care.

This would set the ‘I’ and ‘N’ bits on all generated floating point instructions.

-fno-signed-zeros
Allow optimizations to ignore the signedness of zero. This may produce
incorrect results, and so should be used with care.

This would set the ‘S’ bit on all FP instructions.

These are established flags in GCC. Do you know if there are any semantic differences between your proposed semantics and the semantics of these flags in GCC? If so, it would be good to either change to match them, or document the differences.

I don’t know of any differences, but I’ll have to look into GCC’s behavior more.

Dan

Thanks for the feedback!

Here's a new version of the RFC, incorporating and addressing the feedback from Krzysztof, Eli, Duncan, and Dan.

Revision 1 changes:
  * Removed Fusion flag from all sections
  * Clarified and changed descriptions of remaining flags:
    * Make 'N' and 'I' flags be explicitly concerning values of operands, and
      producing undef values if a NaN/Inf is provided.
    * 'S' is now only about distinguishing between +/-0.
    * LangRef changes updated to reflect flags changes
    * Updated Question section given the now simpler set of flags
    * Optimizations changed to reflect 'N' and 'I' describing operands and not
      results
  * Be explicit on what LLVM's default behavior is (no signaling NaNs, etc)
  * Mention that this could be solved with metadata, and open the debate

Introduction

Hi Michael,

Flags
---
no NaNs (N)
  - ignore the existence of NaNs when convenient
no Infs (I)
  - ignore the existence of Infs when convenient
no signed zeros (S)
  - ignore the existence of negative zero when convenient

while the above flags make perfect sense to me, the other two seem more
dubious:

allow fusion (F)
  - fuse FP operations when convenient, despite possible differences in rounding
    (e.g. form FMAs)
unsafe algebra (A)
  - allow for algebraically equivalent transformations that may dramatically
    change results in floating point. (e.g. reassociation)

They don't seem to be capturing a clear concept; they seem more like a grab-bag
of "everything else" (A) or "here's a random thing that is important today so
let's have a flag for it" (F).

...

Why not use metadata rather than flags?

There is existing metadata to denote precisions, and this proposal is orthogonal
to those efforts. These flags are analogous to nsw/nuw, and are inherent
properties of the IR instructions themselves that all transformations should
respect.

If you drop any of these flags then things are still conservatively correct,
just like with metadata. In my opinion this could be implemented as metadata.
(I'm not saying it should be represented as metadata, I'm saying it could be).

Disadvantages of metadata:

- Bloats the IR (however my measurements suggest this is by < 2% for math-heavy
code)
- More painful to work with (though helper classes can mitigate this)
- Less efficient to modify (but will flags be cleared that often?)

Disadvantages of using subclass data bits:

- Can only represent flags. Thus you might end up with a mix of flags and
metadata for floating point math, with the metadata holding the non-flag
info, and subclass data holding the flags. In which case it might be better
to just have it all be metadata in the first place
- Only a limited number of bits (but hey)

Hopefully Chris will weigh in with his opinion.

FYI. We've already had extensive discussion with Chris on this. He has made it clear this *must* be implemented with subclass data bits, not with metadata.

Evan

Hi Michael,

Flags
---
no NaNs (N)
- ignore the existence of NaNs when convenient
no Infs (I)
- ignore the existence of Infs when convenient
no signed zeros (S)
- ignore the existence of negative zero when convenient

while the above flags make perfect sense to me, the other two seem more
dubious:

allow fusion (F)
- fuse FP operations when convenient, despite possible differences in rounding
   (e.g. form FMAs)
unsafe algebra (A)
- allow for algebraically equivalent transformations that may dramatically
   change results in floating point. (e.g. reassociation)

They don't seem to be capturing a clear concept; they seem more like a grab-bag
of "everything else" (A) or "here's a random thing that is important today so
let's have a flag for it" (F).

'A' is certainly a bit of a grab-bag, but I had difficulty breaking it apart into finer-grained pieces that a user would want to pick and choose between. I'd be interested in any suggestions you might have along these lines.

There is a cost to modeling each property. Unless there are uses for the individual fine-grained properties, we shouldn't go overboard.

Evan

Here’s a new version of the RFC, incorporating and addressing the feedback from Krzysztof, Eli, Duncan, and Dan.

Revision 1 changes:

  • Removed Fusion flag from all sections
  • Clarified and changed descriptions of remaining flags:
  • Make ‘N’ and ‘I’ flags be explicitly concerning values of operands, and
    producing undef values if a NaN/Inf is provided.
  • ‘S’ is now only about distinguishing between +/-0.
  • LangRef changes updated to reflect flags changes
  • Updated Question section given the now simpler set of flags
  • Optimizations changed to reflect ‘N’ and ‘I’ describing operands and not
    results
  • Be explicit on what LLVM’s default behavior is (no signaling NaNs, etc)
  • Mention that this could be solved with metadata, and open the debate

Introduction

LLVM IR currently does not have any support for specifying fine-grained control
over relaxing floating point requirements for the optimizer. The below is a
proposal to extend floating point IR instructions to support a number of flags
that a creator of IR can use to allow for greater optimizations when
desired. Such changes are sometimes referred to as fast-math, but this proposal
is about finer-grained specifications at a per-instruction level.

What this doesn’t address

Default behavior is retained, and this proposal is only addressing relaxing
restrictions. LLVM currently by default:

  • ignores signaling NaNs
  • assumes default rounding mode
  • assumes FENV_ACCESS is off

Discussion on changing the default behavior of LLVM or allowing for more
restrictive behavior is outside the scope of this proposal. This proposal does
not address behavior of denormals, which is more of a backend concern.

Specifying exact precision control or requirements is outside the scope of this
proposal, and can probably be handled with the existing metadata implementation.

This proposal covers changes to and optimizations over LLVM IR, and changes to
codegen are outside the scope of this proposal. The flags described in the next
section exist only at the IR level, and will not be propagated into codegen or
the SelectionDAG.

Flags

no NaNs (N)

  • The optimizer is allowed to optimize under the assumption that the operands’
    values are not NaN. If one of the operands is NaN, the value of the result
    is undefined.

no Infs (I)

  • The optimizer is allowed to optimize under the assumption that the operands’
    values are not +/-Inf. If one of the operands is +/-Inf, the value of the
    result is undefined.

no signed zeros (S)

  • The optimizer is allowed to not distinguish between -0 and +0 for the
    purposes of optimizations.

Ok, I checked LLVM CodeGen’s existing -enable-no-infs-fp-math and -enable-no-nans-fp-math flags, and GCC’s -ffinite-math-only flag, and they all say they apply to results as well as arguments. Do you have a good reason for varying from existing practice here?

Phrasing these from the perspective of the optimizer is a little confusing here. Also, “The optimizer is allowed to [not care about X]” read literally means that the semantics for X are unconstrained, which would be Undefined Behavior. For I and N here you have a second sentence which says only the result is undefined, but for S you don’t. Also, even when you do have the second sentence, it seems to contradict the first sentence.

unsafe algebra (A)

  • The optimizer is allowed to perform algebraically equivalent transformations
    that may dramatically change results in floating point. (e.g. reassociation)

Throughout I’ll refer to these options in their short-hand, e.g. ‘A’.
Internally, these flags are to reside in SubclassData.

======
Question:

Not all combinations make sense (e.g. ‘A’ pretty much implies all other flags).

Basically, I have the below lattice of sensible relations:
A > S > N
A > I > N
Meaning that ‘A’ implies all the others, ‘S’ implies ‘N’, etc.

Why does S still imply N?

Also, I’m curious if there’s a specific motivation to have I imply N. LLVM CodeGen’s existing options for these are independent.

It might be desirable to simplify this into just being a fast-math level.

What would make this desirable?

Changes to optimizations

Optimizations should be allowed to perform unsafe optimizations provided the
instructions involved have the corresponding restrictions relaxed. When
combining instructions, optimizations should do what makes sense to not remove
restrictions that previously existed (commonly, a bitwise-AND of the flags).
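
A minimal sketch of that combining rule in C (the flag names and bit assignments below are hypothetical, purely for illustration; the proposal itself only fixes the letters N/I/S/A):

#include <stdio.h>

/* Hypothetical bit assignments for the proposed per-instruction flags. */
enum {
  FMF_N = 1 << 0,  /* no NaNs */
  FMF_I = 1 << 1,  /* no Infs */
  FMF_S = 1 << 2,  /* no signed zeros */
  FMF_A = 1 << 3,  /* unsafe algebra */
};

/* When folding two instructions into one, keep only the relaxations
   both originals carried: a bitwise AND never adds a relaxation, so
   the combined instruction remains conservatively correct. */
static unsigned combine(unsigned lhs, unsigned rhs) { return lhs & rhs; }

int main(void) {
  unsigned add = FMF_N | FMF_I;  /* an fadd marked N and I */
  unsigned mul = FMF_N | FMF_S;  /* an fmul marked N and S */
  printf("combined: %#x\n", combine(add, mul));  /* 0x1: only N survives */
  return 0;
}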

Below are some example optimizations that could be allowed with the given
relaxations.

N - no NaNs
x == x ==> true

S - no signed zeros
x - 0 ==> x
0 - (x - y) ==> y - x

NIS - no signed zeros AND no NaNs AND no Infs

x * 0 ==> 0

NI - no infs AND no NaNs
x - x ==> 0

A - unsafe-algebra
Reassociation
(x + y) + z ==> x + (y + z)

(x + C1) + C2 ==> x + (C1 + C2)
Redistribution
(x * C) + x ==> x * (C+1)
(x * C) + (x + x) ==> x * (C + 2)
Reciprocal
x / C ==> x * (1/C)

These examples apply when the new constants are permitted, e.g. not denormal,
and all the instructions involved have the needed flags.

I’m still confused by what you mean in this sentence. Why are you talking about constants, if you intend these optimizations to be valid for non-constants? And, it’s not clear what you’re trying to say about denormal values here.

Dan

Here’s a new version of the RFC, incorporating and addressing the feedback from Krzysztof, Eli, Duncan, and Dan.

Revision 1 changes:

  • Removed Fusion flag from all sections
  • Clarified and changed descriptions of remaining flags:
  • Make ‘N’ and ‘I’ flags be explicitly concerning values of operands, and
    producing undef values if a NaN/Inf is provided.
  • ‘S’ is now only about distinguishing between +/-0.
  • LangRef changes updated to reflect flags changes
  • Updated Question section given the now simpler set of flags
  • Optimizations changed to reflect ‘N’ and ‘I’ describing operands and not
    results
  • Be explicit on what LLVM’s default behavior is (no signaling NaNs, etc)
  • Mention that this could be solved with metadata, and open the debate

Introduction

LLVM IR currently does not have any support for specifying fine-grained control
over relaxing floating point requirements for the optimizer. The below is a
proposal to extend floating point IR instructions to support a number of flags
that a creator of IR can use to allow for greater optimizations when
desired. Such changes are sometimes referred to as fast-math, but this proposal
is about finer-grained specifications at a per-instruction level.

What this doesn’t address

Default behavior is retained, and this proposal is only addressing relaxing
restrictions. LLVM currently by default:

  • ignores signaling NaNs
  • assumes default rounding mode
  • assumes FENV_ACCESS is off

Discussion on changing the default behavior of LLVM or allowing for more
restrictive behavior is outside the scope of this proposal. This proposal does
not address behavior of denormals, which is more of a backend concern.

Specifying exact precision control or requirements is outside the scope of this
proposal, and can probably be handled with the existing metadata implementation.

This proposal covers changes to and optimizations over LLVM IR, and changes to
codegen are outside the scope of this proposal. The flags described in the next
section exist only at the IR level, and will not be propagated into codegen or
the SelectionDAG.

Flags

no NaNs (N)

  • The optimizer is allowed to optimize under the assumption that the operands’
    values are not NaN. If one of the operands is NaN, the value of the result
    is undefined.

no Infs (I)

  • The optimizer is allowed to optimize under the assumption that the operands’
    values are not +/-Inf. If one of the operands is +/-Inf, the value of the
    result is undefined.

no signed zeros (S)

  • The optimizer is allowed to not distinguish between -0 and +0 for the
    purposes of optimizations.

Ok, I checked LLVM CodeGen’s existing -enable-no-infs-fp-math and -enable-no-nans-fp-math flags, and GCC’s -ffinite-math-only flag, and they all say they apply to results as well as arguments. Do you have a good reason for varying from existing practice here?

The primary example I was trying to simplify with that change was x * 0 ==> 0. It can be performed if you assume NIS inputs, or NS inputs and N outputs. This is because Inf * 0 is NaN. In hindsight, this is all making things more confusing, so I think I’ll go back to “arguments and results” and allow this optimization for NS. GCC gets around this by lumping Inf and NaN under the same command line option.
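
The underlying IEEE-754 facts, checked in a few lines of C (assuming C99):

#include <math.h>
#include <stdio.h>

int main(void) {
  double inf = INFINITY;
  /* Inf * 0 is NaN, so x * 0 ==> 0 is wrong if x may be infinite... */
  printf("inf * 0.0 = %g\n", inf * 0.0);  /* prints nan */
  /* ...and NaN * 0 is NaN, so it is also wrong if x may be NaN. */
  printf("nan * 0.0 = %g\n", NAN * 0.0);  /* prints nan */
  return 0;
}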

Phrasing these from the perspective of the optimizer is a little confusing here.

I think it might be clearer to change “The optimizer is allowed to …” to “Allow optimizations to …” and clean up the wording a bit.

Also, “The optimizer is allowed to [not care about X]” read literally means that the semantics for X are unconstrained, which would be Undefined Behavior. For I and N here you have a second sentence which says only the result is undefined, but for S you don’t.

‘S’ shouldn’t have any undefined behavior, it just allows optimizations to not distinguish between +/-0. It’s perfectly legal for the operation to receive a negative zero, the operation just might treat it exactly the same as a positive zero. I would rather have that than undefined behavior.

This is similar to how gcc defines -fno-signed-zeros:
“Allow optimizations for floating point arithmetic that ignore the signedness of zero. IEEE arithmetic specifies the behavior of distinct +0.0 and -0.0 values, which then prohibits simplification of expressions such as x+0.0 or 0.0*x (even with -ffinite-math-only). This option implies that the sign of a zero result isn’t significant.”

I’ll revise my description to also mention that the sign of a zero result isn’t significant.
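
Both simplifications GCC mentions really are blocked by signed zeros alone, as a quick C check shows (assuming IEEE-754 doubles and the default rounding mode):

#include <stdio.h>

int main(void) {
  double x = -0.0;
  /* x + 0.0 ==> x is invalid for x = -0.0: the sum is +0.0. */
  printf("-0.0 + 0.0 = %g\n", x + 0.0);     /* prints 0, not -0 */
  /* 0.0 * x ==> 0.0 is invalid for negative x: the product is -0.0. */
  printf("0.0 * -3.0 = %g\n", 0.0 * -3.0);  /* prints -0 */
  return 0;
}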

Also, even when you do have the second sentence, it seems to contradict the first sentence.

Why does it contradict the first sentence? I meant it as a clarification or reinforcement of the first, not a contradiction.

unsafe algebra (A)

  • The optimizer is allowed to perform algebraically equivalent transformations
    that may dramatically change results in floating point. (e.g. reassociation)

Throughout I’ll refer to these options in their short-hand, e.g. ‘A’.
Internally, these flags are to reside in SubclassData.

======
Question:

Not all combinations make sense (e.g. ‘A’ pretty much implies all other flags).

Basically, I have the below lattice of sensible relations:
A > S > N
A > I > N
Meaning that ‘A’ implies all the others, ‘S’ implies ‘N’, etc.

Why does S still imply N?

Also, I’m curious if there’s a specific motivation to have I imply N. LLVM CodeGen’s existing options for these are independent.

It might be desirable to simplify this into just being a fast-math level.

What would make this desirable?

I think this “Question” I had no longer makes too much sense, so I’m going to delete this section.

Changes to optimizations

Optimizations should be allowed to perform unsafe optimizations provided the
instructions involved have the corresponding restrictions relaxed. When
combining instructions, optimizations should do what makes sense to not remove
restrictions that previously existed (commonly, a bitwise-AND of the flags).

Below are some example optimizations that could be allowed with the given
relaxations.

N - no NaNs
x == x ==> true

S - no signed zeros
x - 0 ==> x
0 - (x - y) ==> y - x

NIS - no signed zeros AND no NaNs AND no Infs

x * 0 ==> 0

NI - no infs AND no NaNs
x - x ==> 0

A - unsafe-algebra
Reassociation
(x + y) + z ==> x + (y + z)

(x + C1) + C2 ==> x + (C1 + C2)
Redistribution
(x * C) + x ==> x * (C+1)
(x * C) + (x + x) ==> x * (C + 2)
Reciprocal
x / C ==> x * (1/C)

These examples apply when the new constants are permitted, e.g. not denormal,
and all the instructions involved have the needed flags.

I’m still confused by what you mean in this sentence. Why are you talking about constants, if you intend these optimizations to be valid for non-constants? And, it’s not clear what you’re trying to say about denormal values here.

I was mentioning denormals for one of the optimizations. I think it would be more clear to say something like:

Reciprocal
x / C ==> x * (1/C) when (1/C) is not denormal

I was mostly trying to say that the optimizations are not blindly applied, but are applied when they are still legal. I think the sentence is more confusing than helpful, though.
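
Even with a benign constant, the reciprocal transform can change the result by an ulp, which is why it sits under 'A' (a small C check; the printed values assume IEEE-754 doubles):

#include <stdio.h>

int main(void) {
  double x = 5.0, c = 3.0;
  /* 1/3 must itself be rounded, and multiplying by the rounded
     reciprocal can round differently than dividing directly. */
  printf("x / c       = %.17g\n", x / c);          /* 1.6666666666666667 */
  printf("x * (1 / c) = %.17g\n", x * (1.0 / c));  /* 1.6666666666666665 */
  return 0;
}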

Dan

Thanks!

I’m not an expert in writing specifications, but I think defining the S flag in this manner would be preferable:

no signed zeros (S)
  - If present, then the result of a floating point operation with -0.0 or +0.0 as an operand is either the result of the operation with the original specified values, or the result of the operation with the +0.0 or -0.0 replaced with its opposite sign.

As a side note, it’s never explicitly stated in the language reference how much of IEEE 754 semantics floating point operations must follow.

More specifically, I reviewed the proposal and I agree with its general design: I think it makes sense to use subclass data for these bits even though fpprecision doesn't. It follows the analogy of the NSW/NUW bits, which have worked well. I also think it makes a lot of sense to separate out the "relaxing FP math" part of the FP problem from orthogonal issues like modeling rounding modes, trapping operations (SNANs), etc.

That said, I agree that the individual proposed bits (e.g. "A") could use some refinement. I think it is really important to accurately model the concepts that GCC exposes, but it may make sense to decompose them into finer-grained concepts than what GCC exposes. Also, infer-ability is an important aspect of this: we already have stuff in LLVM that tries to figure out things like "this can never be negative zero". I'd like it if we can separate the inference of this property from the clients of it.

At a (ridiculous) limit, we could take everything in "A" and see what optimizations we want to permit, and add a separate bit for every suboptimization that it would enable. Hopefully from that list we can find natural clusters that would make sense to group together.

-Chris

For reference (or ideas), here's how the IBM XL compiler breaks down the floating point options (look for -qstrict if the link doesn't take you there):

http://www.spec.org/cpu2006/flags/IBM-XL.20110613.html#user_F-qstrict

-Krzysztof

This is similar to how gcc defines -fno-signed-zeros:
“Allow optimizations for floating point arithmetic that ignore the signedness of zero. IEEE arithmetic specifies the behavior of distinct +0.0 and -0.0 values, which then prohibits simplification of expressions such as x+0.0 or 0.0*x (even with -ffinite-math-only). This option implies that the sign of a zero result isn’t significant.”

I’ll revise my description to also mention that the sign of a zero result isn’t significant.

Ok, I see what you’re saying here now.

Also, even when you do have the second sentence, it seems to contradict the first sentence.

Why does it contradict the first sentence? I meant it as a clarification or reinforcement of the first, not a contradiction.

Suppose I’m writing a backend for a target which has an instruction that traps on any kind of NaN. Assuming I care about NaNs, I can’t use such an instruction for regular floating-point operations. However, would it be ok to use it when the N flag is set?

If the “optimizer” may truly ignore the possibility of NaNs under the N flag, this would seem to be ok. However, a trap is outside the boundaries of “undefined result”. So, which half is right?

Dan