RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

Hi all,

This is about https://reviews.llvm.org/D26708

Currently when the command-line switch ‘-ffast-math’ is specified, the

IR-level fast-math-flag ‘fast’ gets attached to appropriate FP math

instructions. That flag acts as an “umbrella” to implicitly turn on all the

other fast-math-flags (‘nnan’, ‘ninf’, ‘nsz’ and ‘arcp’):

http://llvm.org/docs/LangRef.html#fast-math-flags

This approach has the shortcoming that when there is a desire to disable one

of these fast-math-flags, if the ‘fast’ flag remains, it implicitly

re-enables the one being disabled. For example, compiling this test-case:

extern void use(float x, float y);

void test(float a, float b, float c) {

float q1 = a / c;

float q2 = b / c;

use(q1, q2);

}

at ‘-O2 -ffast-math’ does a reciprocal-transformation, so only one division

is done (as desired with fast-math). Compiling it with:

-O2 -ffast-math -fno-reciprocal-math

should disable the reciprocal transformations (the flag ‘arcp’), but leave

all the other fast-math transformations enabled. The current implementation

doesn’t do that, since the ‘fast’ IR-level flag still gets set.

Motivation of this discussion: https://llvm.org/bugs/show_bug.cgi?id=27372#c2

As an aside, when ‘-ffast-math’ is specified on the command-line, the

following six switches are all passed to cc1:

-menable-no-infs

-menable-no-nans

-fno-signed-zeros

-freciprocal-math

-fno-trapping-math

-ffp-contract=fast

and ‘-ffast-math’ itself is also passed cc1 (the act of passing ‘-ffast-math’

to cc1 results in the macro ‘FAST_MATH’ being defined). When (for

example) ‘-fno-reciprocal-math’ is passed in addition to ‘-ffast-math’, then

‘-freciprocal-math’ is no longer passed to cc1 (and the other five listed

above still are passed, along with ‘-ffast-math’ still being passed). It

seems like the intention was that these individual switches were to enable

the individual floating-point transformations (and so the lack of any of

those switches would suppress the relevant transformations), but the

‘-ffast-math’ “umbrella” is over-riding the attempted suppression.

The change proposed at https://reviews.llvm.org/D26708 deals with this issue

just for the reciprocal-transformation case, but it changes the semantics of

the ‘fast’ IR-level flag so that it no longer implies all the others. With

that proposed approach, rather than an “umbrella” flag such as ‘fast’ being

checked in the back-end (along with an individual flag like ‘arcp’), just

checking the individual flag (‘arcp’) would be done. Any fast-math-related

transformation that doesn’t have an individual flag (e.g., re-association

currently doesn’t), should eventually have an individual flag defined for

it, and then that individual flag should be checked.

What do people think?

Thanks,

-Warren

Hi,

Hi all,

This is about https://reviews.llvm.org/D26708

Currently when the command-line switch '-ffast-math' is specified, the
IR-level fast-math-flag 'fast' gets attached to appropriate FP math
instructions. That flag acts as an "umbrella" to implicitly turn on all the
other fast-math-flags ('nnan', 'ninf', 'nsz' and 'arcp'):

http://llvm.org/docs/LangRef.html#fast-math-flags

This approach has the shortcoming that when there is a desire to disable one
of these fast-math-flags, if the 'fast' flag remains, it implicitly
re-enables the one being disabled. For example, compiling this test-case:

    extern void use(float x, float y);
    void test(float a, float b, float c) {
      float q1 = a / c;
      float q2 = b / c;
      use(q1, q2);
    }

at '-O2 -ffast-math' does a reciprocal-transformation, so only one division
is done (as desired with fast-math). Compiling it with:

  -O2 -ffast-math -fno-reciprocal-math

should disable the reciprocal transformations (the flag 'arcp'), but leave
all the other fast-math transformations enabled. The current implementation
doesn't do that, since the 'fast' IR-level flag still gets set.

Motivation of this discussion: https://llvm.org/bugs/show_bug.cgi?id=27372#c2

As an aside, when '-ffast-math' is specified on the command-line, the
following six switches are all passed to cc1:

    -menable-no-infs
    -menable-no-nans
    -fno-signed-zeros
    -freciprocal-math
    -fno-trapping-math
    -ffp-contract=fast

and '-ffast-math' itself is also passed cc1 (the act of passing '-ffast-math'
to cc1 results in the macro '__FAST_MATH__' being defined). When (for
example) '-fno-reciprocal-math' is passed in addition to '-ffast-math', then
'-freciprocal-math' is no longer passed to cc1 (and the other five listed
above still are passed, along with '-ffast-math' still being passed). It
seems like the intention was that these individual switches were to enable
the individual floating-point transformations (and so the lack of any of
those switches would suppress the relevant transformations), but the
'-ffast-math' "umbrella" is over-riding the attempted suppression.

Sure, this looks like a bug, disable an individual fast-math flag on the command line should be possible and override a prior -ffast-math (usually the last one on the command line “wins”/override).

The Cfe-dev mailing list would be more appropriate to discuss the behavior of clang command line flags though.

The change proposed at https://reviews.llvm.org/D26708 deals with this issue

This patch seems to modify on LLVM, it does not deal at all with the issue you describe above.
I don’t see why the issue with the clang command line flags need to be dealt with at the LLVM level.

just for the reciprocal-transformation case, but it changes the semantics of
the 'fast' IR-level flag so that it no longer implies all the others.

The starting point for any change is: http://llvm.org/docs/LangRef.html#fast-math-flags
You would need to write a new definition for what “fast” would mean.

However I don’t need anything need to be changed here to address the use-case you want to fix.

With
that proposed approach, rather than an "umbrella" flag such as 'fast' being
checked in the back-end (along with an individual flag like 'arcp'), just
checking the individual flag ('arcp') would be done.

There is already no need to check the “fast” *and* arcp flag, if a transformation is about reciprocal, then you only need to check arcp (fast implies arcp, checking for fast would be redundant).

Be careful also that the fast-math flags are mainly an IR level definition, the backend only inherited these per instruction flag very recently. It has been entirely converted to use these, and it still uses a global flag in some places.
The line you’re touching in your patch for instance is about this legacy:

  if (!UnsafeMath && !Flags->hasAllowReciprocal())

The first flag is the global “fast-math” mode on the backend, which is not as fine grain as the per-instruction model.
The second flag is the “per instruction” flag, which is the model we aim at.

We should get rid of the “global” UnsafeMath in the backend, but that does not require any change to the IR or the individual fast-math flags.

Any fast-math-related
transformation that doesn't have an individual flag (e.g., re-association
currently doesn't), should eventually have an individual flag defined for
it, and then that individual flag should be checked.

What do people think?

I think these are valuable problems to solve, but you should tackle them piece by piece:

1) the clang part of overriding the individual FMF and emitting the right IR is the first thing to fix.
2) the backend is still using the global UnsafeFPMath and it should be killed.

Hope this makes sense.

Hi,

Thanks for the quick feedback. I see your points, but I have a few questions/comments. I’ll start at the end of the previous post:

I think these are valuable problems to solve, but you should tackle them piece by piece:

  1. the clang part of overriding the individual FMF and emitting the right IR is the first thing to fix.
  1. the backend is still using the global UnsafeFPMath and it should be killed.

I addressed this point (2) for the reciprocal aspect in the patch, but of course that wasn’t useful without doing something about (1).

Regarding (1), over at https://reviews.llvm.org/D26708#596610, David made the same point that it should be done in Clang. I can understand that, but I wonder whether having the concept of the ‘fast’ flag in the IR that implies all the other FMF makes sense? I’m not seeing a good reason for it, but since this is very new to me, I can easily imagine I’m missing the big picture.

For example, in the LLVM IR (http://llvm.org/docs/LangRef.html#fast-math-flags) the fast-math flags ‘nnan’, ‘ninf’, ‘nsz’, ‘arcp’ and 'fast’ are defined. Except for ‘fast’, each of these has a fairly specific definition of what they mean. For example, for ‘arcp’:

arcp => "Allow optimizations to use the reciprocal of an argument rather

than perform division."

‘fast’ is unusual, in that it describes a fairly generic set of aggressive floating-point optimizations:

fast => "Allow algebraically equivalent transformations that may dramatically

change results in floating point (e.g. reassociate). This flag implies

all the others."

Very loosely, ‘fast’ means “all the aggressive FP-transformations that are not controlled by one of the other 4, plus it implies all the other 4”. If for terminology, we call those additional aggressive optimizations ‘aggr’, then we have:

‘fast’ == ‘aggr’ + ‘nnan’ + ‘ninf’ + ‘nsz’ + ‘arcp’

So as I see it, if we want to disable only one of the other ones (like ‘arcp’, in my case), there isn’t any way to express that with these IR flags defined this way. In short, we cannot turn on all the flags besides ‘arcp’. To do that, what we want is that somehow for the Clang switches:

‘-ffast-math -fno-reciprocal-math’

to ultimately result in LLVM IR that has the following flags on in appropriate FP ops:

‘aggr’ + ‘nnan’ + ‘ninf’ + ‘nsz’

But I don’t see a way to express ‘aggr’ in the IR. We could do this, if we change the definition of the IR ‘fast’ flag to remove that sentence about implying all the others:

fast => "Allow algebraically equivalent transformations that may dramatically

change results in floating point (e.g. reassociate).

(If we do something like that, we may want to change the name from ‘fast’ to something else (like ‘aggr’), to avoid tying it too closely to the concept of the ‘-ffast-math’ switch.)

As an aside, I don’t know if the “reassociate” example is the only other transformation that’s allowed by ‘fast’ (I presume it isn’t), but I think reassociation would be better expressed by a separate flag, which could then be controlled independently via ‘-f[no]-associative-math’ switch. Not having that flag exist separately in the FMF is the origin of PR27372. But creating that flag and using it in the appropriate places would still run into these problems of ‘fast’ implying all the others, which would make it impossible to disable reassociation while leaving all the other FMF transformations enabled.

To ask a concrete question using the current definition of ‘fast’ (which includes enabling reassociation, as the LLVM IR documentation of FMF says), how can we express in the IR that reciprocal-transformations are not allowed, but reassociation is allowed?

So the bottom line is that I do see there are issues in Clang that are relevant. But as long as ‘fast’ means “‘aggr’ plus all the other FMF transformations”, I don’t see how we can effectively disable a subset of those other FMF transformations (while leaving ‘aggr’ transformations, such as reassociation, enabled). With that in mind, my patch took one step in having ‘fast’ no longer imply all the others.

Thanks,

-Warren

From: "Mehdi Amini via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Warren Ristow" <warren.ristow@sony.com>
Cc: llvm-dev@lists.llvm.org
Sent: Tuesday, November 15, 2016 11:10:48 PM
Subject: Re: [llvm-dev] RFC: Consider changing the semantics of
'fast' flag implying all fast-math-flags

Hi,

> Hi all,

> This is about https://reviews.llvm.org/D26708

> Currently when the command-line switch '-ffast-math' is specified,
> the

> IR-level fast-math-flag 'fast' gets attached to appropriate FP math

> instructions. That flag acts as an "umbrella" to implicitly turn on
> all the

> other fast-math-flags ('nnan', 'ninf', 'nsz' and 'arcp'):

> http://llvm.org/docs/LangRef.html#fast-math-flags

> This approach has the shortcoming that when there is a desire to
> disable one

> of these fast-math-flags, if the 'fast' flag remains, it implicitly

> re-enables the one being disabled. For example, compiling this
> test-case:

> extern void use(float x, float y);

> void test(float a, float b, float c) {

> float q1 = a / c;

> float q2 = b / c;

> use(q1, q2);

> }

> at '-O2 -ffast-math' does a reciprocal-transformation, so only one
> division

> is done (as desired with fast-math). Compiling it with:

> -O2 -ffast-math -fno-reciprocal-math

> should disable the reciprocal transformations (the flag 'arcp'),
> but
> leave

> all the other fast-math transformations enabled. The current
> implementation

> doesn't do that, since the 'fast' IR-level flag still gets set.

> Motivation of this discussion:
> https://llvm.org/bugs/show_bug.cgi?id=27372#c2

> As an aside, when '-ffast-math' is specified on the command-line,
> the

> following six switches are all passed to cc1:

> -menable-no-infs

> -menable-no-nans

> -fno-signed-zeros

> -freciprocal-math

> -fno-trapping-math

> -ffp-contract=fast

> and '-ffast-math' itself is also passed cc1 (the act of passing
> '-ffast-math'

> to cc1 results in the macro '__FAST_MATH__' being defined). When
> (for

> example) '-fno-reciprocal-math' is passed in addition to
> '-ffast-math', then

> '-freciprocal-math' is no longer passed to cc1 (and the other five
> listed

> above still are passed, along with '-ffast-math' still being
> passed).
> It

> seems like the intention was that these individual switches were to
> enable

> the individual floating-point transformations (and so the lack of
> any
> of

> those switches would suppress the relevant transformations), but
> the

> '-ffast-math' "umbrella" is over-riding the attempted suppression.

Sure, this looks like a bug, disable an individual fast-math flag on
the command line should be possible and override a prior -ffast-math
(usually the last one on the command line “wins”/override).

The Cfe-dev mailing list would be more appropriate to discuss the
behavior of clang command line flags though.

> The change proposed at https://reviews.llvm.org/D26708 deals with
> this issue

This patch seems to modify on LLVM, it does not deal at all with the
issue you describe above.
I don’t see why the issue with the clang command line flags need to
be dealt with at the LLVM level.

> just for the reciprocal-transformation case, but it changes the
> semantics of

> the 'fast' IR-level flag so that it no longer implies all the
> others.

The starting point for any change is:
http://llvm.org/docs/LangRef.html#fast-math-flags
You would need to write a new definition for what “fast” would mean.

However I don’t need anything need to be changed here to address the
use-case you want to fix.

I suspect that we want to start by getting rid of 'fast' on the IR level and replacing it with individual flags for the various optimization classes - Do we have only allowing reassociation and libm optimizations? Then we can readjust the Clang flags in a straightforward way.

-Hal

I know Hal and Mehdi are already aware, but for the wider audience and back reference, we recently had some discussions about the broken state of fast-math in:
https://reviews.llvm.org/D24816 ([Target] move reciprocal estimate settings from TargetOptions to TargetLowering)
https://reviews.llvm.org/D24815 ([clang] make reciprocal estimate codegen a function attribute)

Starting by fixing the IR flags is likely the best first step because if we fix the backend first, we may expose more bugs-on-top-of-bugs as described here:
https://llvm.org/bugs/show_bug.cgi?id=27372#c5

But we may need a bigger change than just removing the umbrella effect of ‘fast’:

When I started looking at how we could move the reciprocal-estimate setting (this is different than ‘arcp’) to instructions, I noted that the existing FMF uses Value’s SubclassOptionalData.

I’m guessing this is for maximum performance, but SubclassOptionalData is exactly 7 bits, and we’ve already burned 5 of them. So if we need more refinement than ‘fast’ / ‘aggr’ to express what’s possible via the front-end flags, we either need to expand SubclassOptionalData, add a custom field for FMF, or move FMF to metadata or attributes. I’m assuming the first 2 options are much larger undertakings than the third.

For reciprocal-estimates, using the existing FMF/SubclassOptionalData isn’t a viable solution because there’s no way to express all of the front-end info (number of refinement steps, etc) in 2 bits. So what about metadata? We already have an instruction-level metadata type called “MD_fpmath”. AFAICT, it’s currently only used to indicate a reduced accuracy operation for AMDGPU. So I was hoping to just add an option to that metadata type for reciprocal-estimates.

I realize that metadata is optional (ie, it can be dropped without affecting correctness). I think that’s fine for reciprocal-estimates, but I’m not sure. Are -ffast-math and friends optional optimization flags? Or do we consider those non-optional modes of execution? If they are not optional, then should we have instruction-level attributes for FMF? This possibility came up when we enabled FMF on calls ( https://reviews.llvm.org/D14707 ).

Apologies for raising too many issues at once, but consider that another potential subclass of relaxed-FP-math was recently discussed here:
https://reviews.llvm.org/D26602

Is a “distributive” transform different than an “associative” transform?

I still haven't sorted through all this, and I have to run, but as a general rule, can we define flags in a positive sense? E.g. NoInfsFPMath is currently defined in the comment as

     /// NoInfsFPMath - This flag is enabled when the
     /// -enable-no-infs-fp-math flag is specified on the command line. When
     /// this flag is off (the default), the code generator is not allowed to
     /// assume the FP arithmetic arguments and results are never +-Infs.

I find it easier to parse

     /// NoInfsFPMath - This flag is enabled when the
     /// -enable-no-infs-fp-math flag is specified on the command line. When
     /// this flag is on, the code generator may assume that FP
     /// arithmetic arguments and results are never +-Infs.

I realize that metadata is optional (ie, it can be dropped without
affecting correctness). I think that's fine for reciprocal-estimates,
but I'm not sure. Are -ffast-math and friends optional optimization
flags? Or do we consider those non-optional modes of execution? If they
are not optional, then should we have instruction-level attributes for
FMF? This possibility came up when we enabled FMF on calls (
https://reviews.llvm.org/D14707 ).

Well, for AMDGPU we'd prefer the metadata not be dropped, of course. But despite that I'd say they should be considered optional hints that give more freedom to the optimizer. Once you enable those flags, you're pretty much saying goodbye to reproducibility anyway, because different compilers will absolutely treat them differently.

Apologies for raising too many issues at once, but consider that another
potential subclass of relaxed-FP-math was recently discussed here:
https://reviews.llvm.org/D26602

Is a "distributive" transform different than an "associative" transform?

Probably not, but the name is misleading :slight_smile:

Cheers,
Nicolai

Make sense, I missed that we can’t *subtract* from fast at the IR level.

I wouldn’t be opposed to have something along the line of “aggr”, but there is a tradeoff: some transformation may be harder to guard with this model.

Maybe that could be a starting point: changing the “UnsafeAlgebra” bit in the FMF to be “aggr” you mention and replace all the query to FastMathFlags::UnsafeAlgebra() to return true if all the bits are set in the Flags. This alone should be nothing more than a mechanical change I believe.
The important part is then auditing all the users of UnsafeAlgebra() in the middle end and check if they can be “downgraded” to aggr safely: i.e. if they don’t need aggr *and* another flag.

Individual flags for various optimization classes make sense only if you don’t end up with a lot of very specialized new flags.
If a single “reassociate” flag could be enough to complete the existing and replace the “fast” that would be great.
But some auditing of all the users of “fast" would be needed first. For instance is "X * log2(0.5*Y) = X*log2(Y) - X” covered by “reassociation”? That seems a bit more than what people think about with reassociation at first.

So as I see it, if we want to disable only one of the other ones (like ‘arcp’, in my

case), there isn’t any way to express that with these IR flags defined this way. In

short, we cannot turn on all the flags besides ‘arcp’. To do that, what we want is

that somehow for the Clang switches:

‘-ffast-math -fno-reciprocal-math’

to ultimately result in LLVM IR that has the following flags on in appropriate FP ops:

‘aggr’ + ‘nnan’ + ‘ninf’ + ‘nsz’

Make sense, I missed that we can’t subtract from fast at the IR level.

I wouldn’t be opposed to have something along the line of “aggr”, but there is a tradeoff: some transformation may be harder to guard with this model.

Maybe that could be a starting point: changing the “UnsafeAlgebra” bit in the FMF to be “aggr” you mention and replace all the query to FastMathFlags::UnsafeAlgebra() to return true if all the bits are set in the Flags. This alone should be nothing more than a mechanical change I believe.

That sounds good. Thanks.

I just want to add that I view what Hal said as equivalent to my suggestion of removing the “umbrella” aspect, and renaming it so something like ‘aggr’. For reference here, Hal’s comments are:

I suspect that we want to start by getting rid of ‘fast’ on the IR level and replacing

it with individual flags for the various optimization classes - Do we have only

allowing reassociation and libm optimizations? Then we can readjust the Clang flags in

a straightforward way.

If I had suggested ‘reassoc_and_libm’ instead of ‘aggr’, it would be clearer that these are the same.

FTR, I didn’t give serious thought to the name ‘aggr’. I’m just trying to express the concept. Others may have a good suggestion for the name. My main point is that I want the flag that currently is called ‘fast’ to no longer imply the others. It seems like that’s a high level decision that the community needs to agree on, and I’m not sure if this thread is enough to give a green light to it. The approach you describe above of having FastMathFlags::UnsafeAlgebra() return true if all the bits are set sounds like the right way to go in doing this.

The important part is then auditing all the users of UnsafeAlgebra() in the middle end

and check if they can be “downgraded” to aggr safely: i.e. if they don’t need aggr

and another flag.

Yes, that makes perfect sense to me.

Thanks,

-Warren

From: "Mehdi Amini" <mehdi.amini@apple.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: llvm-dev@lists.llvm.org, "Warren Ristow" <warren.ristow@sony.com>
Sent: Wednesday, November 16, 2016 11:03:48 AM
Subject: Re: [llvm-dev] RFC: Consider changing the semantics of
'fast' flag implying all fast-math-flags

> > From: "Mehdi Amini via llvm-dev" < llvm-dev@lists.llvm.org >
>

> > To: "Warren Ristow" < warren.ristow@sony.com >
>

> > Cc: llvm-dev@lists.llvm.org
>

> > Sent: Tuesday, November 15, 2016 11:10:48 PM
>

> > Subject: Re: [llvm-dev] RFC: Consider changing the semantics of
> > 'fast' flag implying all fast-math-flags
>

> > Hi,
>

> >
>

> > > Hi all,
> >
>

> > > This is about https://reviews.llvm.org/D26708
> >
>

> > > Currently when the command-line switch '-ffast-math' is
> > > specified,
> > > the
> >
>

> > > IR-level fast-math-flag 'fast' gets attached to appropriate FP
> > > math
> >
>

> > > instructions. That flag acts as an "umbrella" to implicitly
> > > turn
> > > on
> > > all the
> >
>

> > > other fast-math-flags ('nnan', 'ninf', 'nsz' and 'arcp'):
> >
>

> > > http://llvm.org/docs/LangRef.html#fast-math-flags
> >
>

> > > This approach has the shortcoming that when there is a desire
> > > to
> > > disable one
> >
>

> > > of these fast-math-flags, if the 'fast' flag remains, it
> > > implicitly
> >
>

> > > re-enables the one being disabled. For example, compiling this
> > > test-case:
> >
>

> > > extern void use(float x, float y);
> >
>

> > > void test(float a, float b, float c) {
> >
>

> > > float q1 = a / c;
> >
>

> > > float q2 = b / c;
> >
>

> > > use(q1, q2);
> >
>

> > > }
> >
>

> > > at '-O2 -ffast-math' does a reciprocal-transformation, so only
> > > one
> > > division
> >
>

> > > is done (as desired with fast-math). Compiling it with:
> >
>

> > > -O2 -ffast-math -fno-reciprocal-math
> >
>

> > > should disable the reciprocal transformations (the flag
> > > 'arcp'),
> > > but
> > > leave
> >
>

> > > all the other fast-math transformations enabled. The current
> > > implementation
> >
>

> > > doesn't do that, since the 'fast' IR-level flag still gets set.
> >
>

> > > Motivation of this discussion:
> > > https://llvm.org/bugs/show_bug.cgi?id=27372#c2
> >
>

> > > As an aside, when '-ffast-math' is specified on the
> > > command-line,
> > > the
> >
>

> > > following six switches are all passed to cc1:
> >
>

> > > -menable-no-infs
> >
>

> > > -menable-no-nans
> >
>

> > > -fno-signed-zeros
> >
>

> > > -freciprocal-math
> >
>

> > > -fno-trapping-math
> >
>

> > > -ffp-contract=fast
> >
>

> > > and '-ffast-math' itself is also passed cc1 (the act of passing
> > > '-ffast-math'
> >
>

> > > to cc1 results in the macro '__FAST_MATH__' being defined).
> > > When
> > > (for
> >
>

> > > example) '-fno-reciprocal-math' is passed in addition to
> > > '-ffast-math', then
> >
>

> > > '-freciprocal-math' is no longer passed to cc1 (and the other
> > > five
> > > listed
> >
>

> > > above still are passed, along with '-ffast-math' still being
> > > passed).
> > > It
> >
>

> > > seems like the intention was that these individual switches
> > > were
> > > to
> > > enable
> >
>

> > > the individual floating-point transformations (and so the lack
> > > of
> > > any
> > > of
> >
>

> > > those switches would suppress the relevant transformations),
> > > but
> > > the
> >
>

> > > '-ffast-math' "umbrella" is over-riding the attempted
> > > suppression.
> >
>

> > Sure, this looks like a bug, disable an individual fast-math flag
> > on
> > the command line should be possible and override a prior
> > -ffast-math
> > (usually the last one on the command line “wins”/override).
>

> > The Cfe-dev mailing list would be more appropriate to discuss the
> > behavior of clang command line flags though.
>

> > > The change proposed at https://reviews.llvm.org/D26708 deals
> > > with
> > > this issue
> >
>

> > This patch seems to modify on LLVM, it does not deal at all with
> > the
> > issue you describe above.
>

> > I don’t see why the issue with the clang command line flags need
> > to
> > be dealt with at the LLVM level.
>

> > > just for the reciprocal-transformation case, but it changes the
> > > semantics of
> >
>

> > > the 'fast' IR-level flag so that it no longer implies all the
> > > others.
> >
>

> > The starting point for any change is:
> > http://llvm.org/docs/LangRef.html#fast-math-flags
>

> > You would need to write a new definition for what “fast” would
> > mean.
>

> > However I don’t need anything need to be changed here to address
> > the
> > use-case you want to fix.
>

> I suspect that we want to start by getting rid of 'fast' on the IR
> level and replacing it with individual flags for the various
> optimization classes - Do we have only allowing reassociation and
> libm optimizations? Then we can readjust the Clang flags in a
> straightforward way.

Individual flags for various optimization classes make sense only if
you don’t end up with a lot of very specialized new flags.
If a single “reassociate” flag could be enough to complete the
existing and replace the “fast” that would be great.
But some auditing of all the users of “fast" would be needed first.
For instance is "X * log2(0.5*Y) = X*log2(Y) - X” covered by
“reassociation”? That seems a bit more than what people think about
with reassociation at first.

I can only think of two flags we might need: One for reassociation and one for libm-function optimization. Your example with log2 might require both.

-Hal

Individual flags for various optimization classes make sense only if you
don’t end up with a lot of very specialized new flags.
If a single “reassociate” flag could be enough to complete the existing
and replace the “fast” that would be great.
But some auditing of all the users of “fast" would be needed first. For
instance is "X * log2(0.5*Y) = X*log2(Y) - X” covered by
“reassociation”? That seems a bit more than what people think about with
reassociation at first.

Why are there individual flags for different types of optimization _at all_?

The way I see it, if you started from a blank slate, then the flags should define what the application cares about. In other words, the following four flags would do:

nnan, ninf, nsz, inexact

The first three have the same meaning as today. The last one is "Allow algebraically equivalent transformations that may change the result due to different rounding only".

This includes what is today arcp, fp-contract=fast, and most of "unsafe math", but it *doesn't* by itself allow you to ignore inf or nan. I'm not sure about nsz; it seems like that could be implied by inexact.

I'd be really curious to know if there is anybody who really needs arcp without fp-contract=fast or vice versa, or who needs both of these but not the X*log2(0.5*Y) transform you mentioned, and so on.[1]

One could consider expressing inexact in terms of ulp, but since errors can accumulate through multiple transforms I doubt that that's too useful in general. (We do have a use case for ulp in AMDGPU since we want fast inexact reciprocals for some applications. We currently use metadata for that, and I see no need to change that.)

Cheers,
Nicolai

[1] One case I _can_ think of (and which may have been a reason for the proliferation of flags in the first place) is somebody who enables fast math, but then doesn't want their results to change when they update the compiler and get a new set of optimizations. But IMO that's a use case that should be explicitly rejected.

I don’t really like the idea of updating checks of UnsafeAlgebra() to depend on all of the other flags. It seems like it would be preferable to look at each optimization and figure out which flags it actually requires. I suspect that in many cases the “new” flag (i.e. allowing reassociation, etc.) will be what is actually needed anyway.

I would be inclined to agree with Niolai’s suggestion of combining all the flags related to value safety, except that Warren’s proposal that started this discussion seems to imply that he has a use case that requires reciprocals to be turned off separately.

-Andy

In my mind that’s mainly a way to avoid a “Big Bang” effect by staging the changes, allowing a much “NFC” changes as possible, then updating individual user of UnsafeAlgebra() to use the right set of flags, and killing this API when possible..

In my mind that’s mainly a way to avoid a “Big Bang” effect by staging the changes, allowing a much

“NFC” changes as possible, then updating individual user of UnsafeAlgebra() to use the right set of

flags, and killing this API when possible…

Yeah, I thought that was your motivation. Maybe I misunderstood your comment about auditing current users of UnsafeAlgebra(). I initially read it as implying that there might be some oddity in the current implementation that some code is relying on that we’d need to look for. Since the flag is distinct, the fact that it “implies” the others seems like a potential hazard so a problem like that wouldn’t surprise me.

If you were just suggesting that the changes to individual callers can be deferred, then I can live with that.

-Andy

Looking back at the discussion, I suggested this first to answer the original conceptual approach of having a flag that was “aggressive” to complement the existing individual flags with respect to “fast”.
If we want to go with “reassociate + libm” instead (which would be cleaner I think, even though “libm” needs to be clarified), it is more tricky: if we go progressively we may find at some point of the conversion a user for which a combination of reassociate and the existing flags is not enough.
But Hal seems confident that such users don’t exist.

… except that Warren’s proposal that started this discussion seems to imply that he

has a use case that requires reciprocals to be turned off separately.

Just to close this loose end, yes I have a use case.

Specifically, we have a customer that turns on ‘‑ffast‑math’, but was getting a runtime failure due to the reciprocal-transformation being done. They don’t want turn off fast‑math because they like the performance improvement, and can live with the imprecision in most cases. So they wanted to suppress just the reciprocal-transformation. I intended to tell them the solution was simple: use ‘‑ffast‑math ‑fno‑reciprocal‑math’. But on trying it myself, I ran into the issue here.

Thanks,

-Warren

Can you elaborate what kind of runtime failure is the reciprocal transformation triggering?

Can you elaborate what kind of runtime failure is the reciprocal transformation triggering?

Yes. It was along the lines of:

{

float x = a / c;

float y = b / c;

if (y == 1.0f) {

// do some processing for when ‘b’ and ‘c’ are equal

} else {

// do other processing

}

use(x, y);

}

Of course they understood they could easily change this code once they understood the issue.

But the fact that it “failed” for non-edge-case values of ‘c’, they were worried. As an example of the non-edge-case aspect, when ‘c’ is 41.0f (and so of course ‘b’ is 41.0f), intuitively they felt that this “would work precisely”, even with fast-math. Once they understood more, they agreed this was reasonable with fast-math, but they had the underlying concern that if they encountered one case where ‘num’ and ‘den’ were equal (and non-edge-case), yet ‘num / den’ wasn’t precisely 1.0f, then even if they fixed this situation where they encountered it, it might be lurking elsewhere in their code, and so they wanted to disable that transformation.

Thanks,

-Warren

Thanks for elaborating.

I’d be reluctant to call this situation a real use-case though.
Is the the distinction on reciprocal really make sense here? This user can have the same “surprising" anywhere in their code-base with reassociation as well:

void foo (float a, float b) {
  float x = a - b;
  if (x == 0)
     … // only if a == b
}

That would sound totally reasonable, unless foo is inlined and reassociation would lead to a non-zero value for x even when a and b passed in to foo "if it wasn’t inlined" would be identical!

(Reminds me somehow of a client that was bitten by nnan: their assumption was that as long as they didn’t introduce NaN in the program everything was fine. However with fast-math some transformations were introducing NaN where there wasn’t before and propagating to other computation that were transformed under the assumption that no NaN would show up, it also turns out that making the code safe against NaN and efficient at the same time is hard, especially when the code itself it compiled with fast-math).

Those are all good points. Your reassociation point in the context of inlining is particularly interesting.

FWIW, we also have a case where a customer wants ‘-fno-associative-math’ to suppress reassociation under ‘-ffastmath’. It would take me a while to find the specifics of the issue, but it was (if my memory is right) more of a real use-case. (That is to say, the code that was “failing” due to reassociation didn’t have an obvious fix like the reciprocal situation, here, other than to turn off fast-math.) In fact, the request to suppress reassociation was the motivation for creating PR27372 in the first place (which eventually fed into this thread). I have to say that on the reassociation point, my concern is that to really suppress that, we will have to suppress so much, that there will hardly be any point in using -ffast-math.

I’d say your comments here are very similar to what Nicolai said in another subthread of this discussion:

I’d be really curious to know if there is anybody who really needs arcp

without fp-contract=fast or vice versa, or who needs both of these but

not the Xlog2(0.5Y) transform you mentioned, and so on.[1]

[1] One case I can think of (and which may have been a reason for the

proliferation of flags in the first place) is somebody who enables fast

math, but then doesn’t want their results to change when they update the

compiler and get a new set of optimizations. But IMO that’s a use case

that should be explicitly rejected.

I think those are all really good points, and an argument can be made that when -ffast-math gives you results you don’t want, then you just have to turn it off. Essentially, the user can’t “have his cake and eat it too”.

All that said, I think we (the company I work for, Sony) will have to implement support for these switches. It comes down to GCC has these switches (e.g., -fno-reciprocal-math and -fno-associative-math), and they do suppress the transformations for our customers. They switch to Clang/LLVM, they use the same switches, and it doesn’t “work”. So as a practical matter, I think we will support them. Whether the LLVM community in general feels that that’s required, is another question. Until for your recent comments here, and Nicolai’s comments above, I would have thought the answer was clearly yes. But maybe that’s not the case.

In summary, irrespective of any (subjective?) assessment of how legitimate a particular use-case is, do we want switches like:

-ffast-math -fno-reciprocal-math

-ffast-math -fno-associative-math

to work?

For me, the answer is yes, because I have multiple customers that tell me they really want to leave -ffast-math on, but they want to be able to disable these sub-categories. I’ve been approaching this under the assumption that the answer is yes for the Clang/LLVM community in general.

Thanks,

-Warren