RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

nhaehnle · November 17, 2016, 9:31am

I feel your pain, but I'm not convinced yet that this is really the right approach.

It sounds like the customers (a) want fast-math in general but (b) have some specific parts of the code where it breaks things. What about having them disable fast-math on a more fine-grained scope, e.g. via something like an __attribute__((no_fast_math)) function attribute at the C++ source level?

Then the problematic piece of code might be slower (since all of fast-math is disabled), but the rest of the code would likely be faster (since it benefits from all of fast-math instead of just a subset).

Cheers,
Nicolai

rotateright · November 17, 2016, 4:03pm

This is suggesting source code changes to customers that are switching
compilers, but (as Warren hinted at) one of the stated goals of clang is
GCC compatibility:
http://clang.llvm.org/

If that's still true, it means (barring anything that we explicitly
document and choose not to support), we should support GCC's FP options:

-ffp-contract=style
-ffast-math
-fno-math-errno
-funsafe-math-optimizations
-fassociative-math
-freciprocal-math
-ffinite-math-only
-fno-signed-zeros
-fno-trapping-math
-frounding-math
-fsignaling-nans
-fsingle-precision-constant

etc, and the relevant negations of these options. We can't predict how
customers will choose to chain these together, so I think the LLVM
optimizer and backend designs should accommodate all possibilities derived
from those clang flags. This includes (because I've seen this requested)
using relaxed FP modes and simultaneously enabling some subset of FP
exceptions. (I know it sounds crazy... )

mehdi_amini · November 17, 2016, 4:30pm

We don’t aim at being bug-to-bug compatible though.
I believe we are compatible in terms of command line invocation, even if some gcc flags are no-op in clang.

I am not convinced: once IEEE compliance is disabled, we can’t even ensure that the result will be the same between two versions of clang (indeed it won’t in many/most real-world cases), so the claim that we are “GCC compatible” does not have much value here. The code can still break when built with clang and not when built with GCC, even when disabling fast-math.

mehdi_amini · November 17, 2016, 4:31pm

The last part should read “even when disabling reciprocal”

rotateright · November 17, 2016, 5:00pm

If we take this argument to its end: any one of those relaxed FP settings guarantees that we cannot ensure that the result will be the same between two versions of clang. Therefore, we can no-op all of them, and greatly simplify the optimizer.

I know that’s not what you’re advocating, but the suggestion that we remove ‘arcp’ is the first step on that path. We can’t do that. We must make a good faith effort to support these flags.

mehdi_amini · November 17, 2016, 5:10pm

If we take this argument to its end: any one of those relaxed FP settings guarantees that we cannot ensure that the result will be the same between two versions of clang. Therefore, we can no-op all of them, and greatly simplify the optimizer.

I don’t understand the logic here.

I know that’s not what you’re advocating, but the suggestion that we remove ‘arcp’ is the first step on that path. We can’t do that. We must make a good faith effort to support these flags.

I disagree: we can do it if we don’t see any perceived value.
Saying “gcc has this option” does not mean we have to mimic its behavior if it does not make sense to us.

Note: I am not in favor of removing arcp, even though I don’t believe Warren’s use-case is really compelling.

mehdi_amini · November 17, 2016, 5:43pm

I understand now, and I think I answered below: yes, we “can” no-op all of them; we don’t because they are valuable and we find them useful, not because GCC exposes them on its command line.

andykaylor · November 17, 2016, 6:54pm

All that said, I think we (the company I work for, Sony) will have to implement support for these switches. It comes down to GCC has these switches (e.g., -fno-reciprocal-math and -fno-associative-math), and they do suppress the transformations for our customers. They switch to Clang/LLVM, they use the same switches, and it doesn’t “work”. So as a practical matter, I think we will support them. Whether the LLVM community in general feels that that’s required, is another question. Until for your recent comments here, and Nicolai’s comments above, I would have thought the answer was clearly yes. But maybe that’s not the case.

I think this is a very good point. You (Sony) are not the only ones who are concerned with GCC-command line compatibility. It definitely should hold some weight. Given that this is something we could do with just a little more effort, I’m not sure mere simplicity is enough reason not to do it.

Also, on a slight tangent…

I’d be really curious to know if there is anybody who really needs arcp without fp-contract=fast or vice versa, or who needs both of these but not the X*log2(0.5*Y) transform you mentioned, and so on.[1]

I just wanted to mention that fp-contract relates to things like FMA and shouldn’t be confused with fast-math.

-Andy

mehdi_amini · November 17, 2016, 7:38pm

Those are all good points. Your reassociation point in the context of inlining is particularly interesting.

FWIW, we also have a case where a customer wants '-fno-associative-math' to suppress reassociation under '-ffast-math'. It would take me a while to find the specifics of the issue, but it was (if my memory is right) more of a real use-case. (That is to say, the code that was "failing" due to reassociation didn't have an obvious fix like the reciprocal situation, here, other than to turn off fast-math.) In fact, the request to suppress reassociation was the motivation for creating PR27372 in the first place (which eventually fed into this thread). I have to say that on the reassociation point, my concern is that to really suppress that, we will have to suppress so much, that there will hardly be any point in using -ffast-math.
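
One well-known example of this kind of code is compensated (Kahan) summation. It is an illustration only, not the customer case behind PR27372, and the function names are hypothetical. The correction term is algebraically zero, so a compiler that is allowed to reassociate may simplify it away, and there is no obvious source-level rewrite that keeps the algorithm intact once reassociation is permitted.

```c
#include <stddef.h>

/* Plain left-to-right summation. */
double naive_sum(const double *v, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += v[i];
    return sum;
}

/* Kahan (compensated) summation. Under strict IEEE evaluation, c tracks
 * the rounding error of the previous addition. Algebraically
 * (t - sum) - y == 0, so reassociation may fold c to zero and silently
 * turn this function into naive_sum. */
double kahan_sum(const double *v, size_t n) {
    double sum = 0.0;
    double c = 0.0;              /* running compensation */
    for (size_t i = 0; i < n; ++i) {
        double y = v[i] - c;     /* apply the correction to the next term */
        double t = sum + y;      /* low-order bits of y may be lost here */
        c = (t - sum) - y;       /* recover what was lost (negated) */
        sum = t;
    }
    return sum;
}
```

With strict floating-point settings, summing 1.0 followed by ten copies of 1e-16 gives exactly 1.0 from naive_sum and a value slightly above 1.0 from kahan_sum. What a fast-math build returns depends on the compiler and its version.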

I'd say your comments here are very similar to what Nicolai said in another subthread of this discussion:

>> I'd be really curious to know if there is anybody who really needs arcp
>> without fp-contract=fast or vice versa, or who needs both of these but
>> not the X*log2(0.5*Y) transform you mentioned, and so on.[1]
>> ...
>> [1] One case I _can_ think of (and which may have been a reason for the
>> proliferation of flags in the first place) is somebody who enables fast
>> math, but then doesn't want their results to change when they update the
>> compiler and get a new set of optimizations. But IMO that's a use case
>> that should be explicitly rejected.

I think those are all really good points, and an argument can be made that when -ffast-math gives you results you don't want, then you just have to turn it off. Essentially, the user can't "have his cake and eat it too".

All that said, I think we (the company I work for, Sony) will have to implement support for these switches. It comes down to GCC has these switches (e.g., -fno-reciprocal-math and -fno-associative-math), and they do suppress the transformations for our customers. They switch to Clang/LLVM, they use the same switches, and it doesn't "work". So as a practical matter, I think we will support them.

My point was that supporting these switches is not a guarantee for a fast-math user that their code will work, even if the same command-line flags are enough to make it work with GCC.
If you are providing these and telling your users that we are “compatible” with GCC, in the sense that their code will continue to work, that seems incorrect to me.
What are you going to tell them when they use such a flag but it isn’t enough for their code to work with clang, even though it works with GCC? (Possibly because reassociation messes up another part of the code that GCC didn’t, because of different inlining decisions for instance.)

  Whether the LLVM community in general feels that that's required, is another question. Until for your recent comments here, and Nicolai's comments above, I would have thought the answer was clearly yes. But maybe that's not the case.

In summary, irrespective of any (subjective?) assessment of how legitimate a particular use-case is, do we want switches like:

    -ffast-math -fno-reciprocal-math
    -ffast-math -fno-associative-math

to work?

For me, the answer is yes, because I have multiple customers that tell me they really want to leave -ffast-math on, but they want to be able to disable these sub-categories. I've been approaching this under the assumption that the answer is yes for the Clang/LLVM community in general.

The multiple customers may want a pony; we’re not going to try to give them one just because they ask. I’d push back on such a customer request for the reason I gave earlier.
If what they want does not make sense, or we can’t provide the guarantee they really want, it is also our job to *not* provide it and to guide them toward an alternative model that is more controlled, better understood, and solves the underlying problem they have.
As an example of a “pony” request: I had a customer who wanted their floating-point “conformance test” to pass with fast-math: “float test_div(float a, float b) { return a/b; }”; they didn’t see any reason why the compiler would do anything wrong on such a simple test (except that the HW didn’t have a division instruction…).

That being said, even though I’m not convinced by your “pony” use case, I don’t see any reason not to preserve the arcp flag in the IR at this point (Nicolai may disagree; let’s see his opinion), and it still makes sense to me to try to change the *fast* flag to “reassociation” (or similar) in the IR (provided that we don’t find clients of the API that want “more” than reassociation + a combination of the other flags).

This should be enough to provide these command line switches at the clang level, and it should spare you (Sony) from having to maintain any out-of-tree support for this.

Hope this clarifies where I see the direction going; even if you don’t agree with my reasoning, the conclusion should be satisfactory on your side.

nhaehnle · November 17, 2016, 8:35pm

All that said, I think we (the company I work for, Sony) will have to implement support
for these switches. It comes down to GCC has these switches (e.g., -fno-reciprocal-math
and -fno-associative-math), and they do suppress the transformations for our customers.
They switch to Clang/LLVM, they use the same switches, and it doesn't "work". So as a
practical matter, I think we will support them. Whether the LLVM community in general
feels that that's required, is another question. Until for your recent comments here, and
Nicolai's comments above, I would have thought the answer was clearly yes. But maybe
that's not the case.

I think this is a very good point. You (Sony) are not the only ones who
are concerned with GCC-command line compatibility. It definitely should
hold some weight. Given that this is something we could do with just a
little more effort, I’m not sure mere simplicity is enough reason not to
do it.

Right. I'm not fundamentally opposed to having these flags, as long as we can agree that the *only* reason for having them is slightly better GCC compatibility. The "slightly better" is important, too, because promising real compatibility with any kind of fast math-type setting would simply be a lie.

So (to answer Mehdi's question in a different part of the thread), I'd consider keeping arcp around a wart, but an acceptable one. I'm fine with: IR 'fast' becomes IR 'reassociation' (or similar; algebraically correct transforms that may change rounding), and reciprocal math becomes "this thing that should logically be enabled by 'reassociation', but instead requires 'arcp' for GCC-'compatibility' reasons".

And to be clear, 'reassociation' should _not_ by itself allow transforms like X * (Y + 1) --> X * Y + X which can change the NaN-ness of the result when Infs are among the arguments. That's what 'reassociation' + 'ninf' is for.
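
A small sketch of that case (the helper names are hypothetical, not anything in LLVM): with X = +Inf and Y = 0, the original expression evaluates to +Inf, while the distributed form computes Inf * 0 first, which is NaN.

```c
#include <math.h>

/* X * (Y + 1), evaluated as written. */
float factored(float x, float y) {
    return x * (y + 1.0f);
}

/* X * Y + X, the distributed form. */
float distributed(float x, float y) {
    return x * y + x;
}

/* Returns 1 when exactly one of the two forms yields NaN. */
int changes_nan_ness(float x, float y) {
    int a = isnan(factored(x, y)) != 0;
    int b = isnan(distributed(x, y)) != 0;
    return a != b;
}
```

Calling changes_nan_ness(INFINITY, 0.0f) reports a difference, while ordinary finite inputs such as (2.0f, 3.0f) do not, which is why the transform only becomes safe once infinities are ruled out.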

Also, on a slight tangent...

I'd be really curious to know if there is anybody who really needs arcp
without fp-contract=fast or vice versa, or who needs both of these but
not the X*log2(0.5*Y) transform you mentioned, and so on.[1]

I just wanted to mention that fp-contract relates to things like FMA and
shouldn’t be confused with fast-math.

It's conceptually the same type of thing though, isn't it? At least fp-contract=fast, which means "use FMA even when it changes floating point results (due to different rounding)". This is kind of like the 'fast' flag, which means "do all sorts of transformations even when they change floating point results (due to different rounding)". I don't know whether clang -ffast-math enables fp-contract=fast, but I'd say that in a clean, from-scratch design, fp-contract=fast shouldn't be a separate flag.

Cheers,
Nicolai

Finkel_Hal_J · November 17, 2016, 9:03pm

From: "Mehdi Amini via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Mehdi Amini" <mehdi.amini@apple.com>
Cc: llvm-dev@lists.llvm.org, "cfe-dev" <cfe-dev@lists.llvm.org>
Sent: Thursday, November 17, 2016 11:43:33 AM
Subject: Re: [llvm-dev] RFC: Consider changing the semantics of
'fast' flag implying all fast-math-flags

> > If we take this argument to its end: any one of those relaxed FP
> > settings *guarantees* that we cannot ensure that the result will be
> > the same between two versions of clang. Therefore, we can no-op all
> > of them, and greatly simplify the optimizer.
>
> I don’t understand the logic here.

I understand now, and I think I answered below: yes we “can” no-op
all of them, and no we don’t do it because they are valuable,
because we find them useful, not because GCC exposes them on its
command line.

I think this is exactly right. We should support these flags because they're useful to our users (which they are). We should support reasonable subsetting of fast-math because that's useful to our users (which it is to the extent that we currently support and will be more useful once we can separately toggle reassociation, etc.).

-Hal

Finkel_Hal_J · November 17, 2016, 9:09pm

From: "Nicolai Hähnle via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Andrew Kaylor" <andrew.kaylor@intel.com>, "Warren Ristow" <warren.ristow@sony.com>, "mehdi amini"
<mehdi.amini@apple.com>
Cc: llvm-dev@lists.llvm.org
Sent: Thursday, November 17, 2016 2:35:47 PM
Subject: Re: [llvm-dev] RFC: Consider changing the semantics of 'fast' flag implying all fast-math-flags

>> All that said, I think we (the company I work for, Sony) will have to
>> implement support for these switches. It comes down to GCC has these
>> switches (e.g., -fno-reciprocal-math and -fno-associative-math), and they
>> do suppress the transformations for our customers. They switch to
>> Clang/LLVM, they use the same switches, and it doesn't "work". So as a
>> practical matter, I think we will support them. Whether the LLVM community
>> in general feels that that's required, is another question. Until for your
>> recent comments here, and Nicolai's comments above, I would have thought
>> the answer was clearly yes. But maybe that's not the case.
>
> I think this is a very good point. You (Sony) are not the only ones who
> are concerned with GCC-command line compatibility. It definitely should
> hold some weight. Given that this is something we could do with just a
> little more effort, I’m not sure mere simplicity is enough reason not to
> do it.

Right. I'm not fundamentally opposed to having these flags, as long as
we can agree that the *only* reason for having them is slightly better
GCC compatibility. The "slightly better" is important, too, because
promising real compatibility with any kind of fast math-type setting
would simply be a lie.

So (to answer Mehdi's question in a different part of the thread), I'd
consider keeping arcp around a wart, but an acceptable one. I'm fine
with: IR 'fast' becomes IR 'reassociation' (or similar; algebraically
correct transforms that may change rounding), and reciprocal math
becomes "this thing that should logically be enabled by 'reassociation',
but instead requires 'arcp' for GCC-'compatibility' reasons".

And to be clear, 'reassociation' should _not_ by itself allow transforms
like X * (Y + 1) --> X * Y + X which can change the NaN-ness of the
result when Infs are among the arguments. That's what 'reassociation' +
'ninf' is for.

> Also, on a slight tangent...
>
>>> I'd be really curious to know if there is anybody who really needs arcp
>>> without fp-contract=fast or vice versa, or who needs both of these but
>>> not the X*log2(0.5*Y) transform you mentioned, and so on.[1]
>
> I just wanted to mention that fp-contract relates to things like FMA and
> shouldn’t be confused with fast-math.

It's conceptually the same type of thing though, isn't it? At least
fp-contract=fast, which means "use FMA even when it changes floating
point results (due to different rounding)".

Yes and no. For one thing, *all* FP contraction modes (i.e. FMA formation), including the one standardized by C, can change results due to different intermediate rounding properties. What makes it different from other "fast math" settings is that:

1. The result from individual contracted operations is always more accurate than the original operations, not less.
2. The FMA operation, which is the only combination that FP contraction enables, is a specific combination that is defined by the IEEE standard.

For these reasons, we differentiate it from the others, and users also consider it qualitatively different from other FP optimization flags.
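
The first point can be illustrated with a sketch (not from the thread; the function names are hypothetical). To avoid depending on FMA hardware, the single-rounding result is emulated in double precision. This relies on an assumption that holds for the inputs discussed below but not in general: the product of two floats is exactly representable in a double, and here the final sum is exact too, so the emulation returns what a true fused multiply-add would return for these inputs.

```c
/* a*b + c with the product rounded to float first (no contraction).
 * The volatile store keeps the compiler from fusing the two steps. */
float mul_then_add(float a, float b, float c) {
    volatile float p = a * b;   /* rounding #1 */
    return p + c;               /* rounding #2 */
}

/* a*b + c with a single rounding at the end, emulated via double.
 * Assumption: exact for the demo inputs, not a general fmaf replacement. */
float fused_emulated(float a, float b, float c) {
    double exact_product = (double)a * (double)b;  /* 48 bits fit in 53 */
    return (float)(exact_product + (double)c);
}
```

For a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact value of a*b + c is 2^-24. mul_then_add returns 0 because the low bit of the product is rounded away before the addition, while the single-rounding version returns 2^-24.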

-Hal

wristow · November 17, 2016, 9:24pm

On the plus side, I'm glad to see the conclusions of the last couple of posts.

From Mehdi:

Hope this clarify where I see the direction going, and even if you don’t agree with my
reasoning, the conclusion should be satisfactory on your side

I'd say that summarizes my thoughts on this well.

And from Nicolai:

Right. I'm not fundamentally opposed to having these flags, ...

I do agree with much of what you both say, but definitely not all of it. The philosophy of not providing what a customer requests and instead guiding them to a better alternative is something I agree with -- we don't just give them a pony. And I agree *strongly* that just because a program gets the answer a user wants with GCC (using fast-math) and we get an answer they view as "wrong", doesn't mean it's a bug of ours and that we need to change to get the same answer as GCC. That's not what our goal of GCC compatibility means to me.

But we do have a switch '-fno-reciprocal-math' that we accept, and even process/implement to some extent. But that implementation has a bug. Fixing that bug so that when a user says '-ffast-math -fno-reciprocal-math', we enable the fast-math transformations but explicitly disable the reciprocal transformations is, in my view, the right thing to do. Simply, that is a bug that we ought to fix -- unless we agree to abandon support of '-fno-reciprocal-math', which I think isn't under consideration at this stage. And FTR, I'd oppose that, not surprisingly.

I'm not at all trying to justify the "pony" use-case from this customer, but if we provide '-fno-reciprocal-math', I think we ought to fix bugs we find in our implementation of it. Fixing that bug doesn't guarantee we'll then get the same answers as GCC does on every program when compiled with '-ffast-math -fno-reciprocal-math', but IMO that isn't required for us to describe our behavior as "GCC compatibility" in this respect.

Fast-math is "unusual", in that the user is explicitly opening the door to allowing us to do non-compliant transformations. As compared with GCC, our implementation can have a subset or a superset of these non-compliant transformations, and we can still call that "GCC compatibility". As an analogous "not unusual" feature, both we and GCC do type based alias analysis. It's a perfectly standard-compliant thing to do optimizations based on conclusions from the tbaa. We both support the switch '-f[no-]strict-aliasing' to control this (and we both enable it by default). Referring to this as "GCC compatibility" is perfectly legitimate, in my view. But if a user program has an aliasing bug in it, and our tbaa directs us to aggressively optimize it, whereas GCC's doesn't (and so the user gets the answer they wanted with GCC, but not with us), this does not mean we have a bug, or that saying we're GCC compatible in terms of '-f[no-]strict-aliasing' is a "lie". We can do a superset or subset of the optimizations that GCC does in terms of alias analysis, and we can quite reasonably describe us a GCC compatible in terms of us providing this capability. A user insisting we have a bug in this tbaa situation is analogous to your "pony" request about "float test_div(float a, float b) { return a/b; }". And (unrelated to Clang/LLVM) I've had this sort of objection from users in tbaa situations in the past, where I've had to defend my point that just because GCC didn't optimize it as aggressively as the compiler I was providing, it wasn't a bug in our compiler. So I'm all for not giving everyone a pony.

But irrespective of how silly a test-case it may be to do:

    {
      float x = a / c;
      float y = b / c;

      if (y == 1.0f) {
        // do some processing for when 'b' and 'c' are equal
      } else {
        // do other processing
      }

      use(x, y);
    }

I cannot in good conscience tell the customer that it's OK for us to do:

      float tmp = 1.0f / c;
      float x = a * tmp;
      float y = b * tmp;

when they specified '-ffast-math -fno-reciprocal-math'. They can rightfully come back and say "what do you mean by '-fno-reciprocal-math'?" I have to call that a compiler-bug.
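
A concrete input for which the two forms disagree (a sketch added for illustration; the function names are hypothetical, and IEEE single precision without excess precision is assumed): with b = c = 41.0f, the division gives exactly 1.0f, while multiplying by the rounded reciprocal gives the next float below 1.0f, so the 'y == 1.0f' branch is not taken.

```c
/* y as the customer wrote it: a real division. */
int equal_by_division(float b, float c) {
    float y = b / c;
    return y == 1.0f;
}

/* y after the reciprocal transform: multiply by a shared 1/c. */
int equal_by_reciprocal(float b, float c) {
    volatile float tmp = 1.0f / c;   /* rounded reciprocal, pinned to float */
    float y = b * tmp;
    return y == 1.0f;
}
```

Many other values of c happen to round back to 1.0f (3.0f is one), which is part of why this kind of breakage tends to show up only on particular data.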

Thanks!
-Warren

mehdi_amini · November 17, 2016, 9:44pm

I agree with all you wrote above.
But I’d add that a legitimate fix could be for the clang driver to issue an error (or a warning) saying “-fno-reciprocal-math isn’t compatible with -ffast-math, disabling -fxxxxx” (with xxxxx being one or the other ;)).

mehdi_amini · November 17, 2016, 10:02pm

I don’t want to add confusion, I feel I’m doing a bad job here somehow: I’m not saying we should do this (rejecting in the driver). So let’s just fix it!

wristow · November 17, 2016, 10:24pm

Thanks for all that. I think we’re more in agreement here than it may have appeared initially.

So let’s just fix it!

Sounds good!

I have some other things on my plate at the moment, so I doubt I’ll get to working on this until after Thanksgiving (I don’t want my lack of activity to be interpreted as a loss of interest on my part to get this done).

Before work can be done to fix it, the details of precisely what changes we want to make in the fast-math-flags IR need to be decided. There has been some discussion in this thread on that point (‘aggr’, ‘reassoc’ + ‘libm’, something else?), but no clear spec. I’d be happy to propose something concrete, and I’d fully expect that it would evolve a bit after feedback. I’m also happy for others to propose specifics. In any case, I won’t work on taking this further until sometime after Thanksgiving.

-Warren

nhaehnle · November 17, 2016, 10:27pm

That makes sense, thank you for the clarification!

Nicolai

Kreitzer_David_L · November 18, 2016, 9:39pm

I just read through this thread, and I did not see a good definition of what exactly “fast + no-arcp” would mean. Clearly (1) would be disallowed, but what about the others?

(1) X / Y → X * (1.0 / Y)

(2) (X * Y) / Z → (X / Z) * Y

(3) (X / Z) * Y → X * (Y / Z)

(4) (X / Y) / Z → X / (Y * Z)

etc.

It is easy to write a unit test for each of (1)-(4) showing that gcc6.2 will apply the transform under “-ffast-math” but not under “-ffast-math -fno-reciprocal-math”. (It is also easy to write unit tests where gcc6.2 will perform these transforms in spite of -fno-reciprocal-math, but I assume those would be considered bugs.)
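
A sketch of why a rewrite such as (4) is observable at the source level (hypothetical function names; this compares the two expressions directly and says nothing about what gcc or clang actually emit): when Y * Z overflows in single precision, the rewritten form divides by infinity.

```c
/* (X / Y) / Z, as written. */
float divide_twice(float x, float y, float z) {
    return (x / y) / z;
}

/* X / (Y * Z), the form produced by transform (4). */
float divide_by_product(float x, float y, float z) {
    volatile float yz = y * z;   /* volatile pins the product to float */
    return x / yz;
}
```

With x = y = z = 1e30f, divide_twice returns a small positive value (about 1e-30), while divide_by_product returns 0 because 1e30f * 1e30f overflows to +Inf. For inputs where nothing overflows, such as (8, 2, 2), both return 2.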

I trust the intent is to update the language reference such that it is easy to reason about the correctness of these and other division-related transforms?

Thanks,

Dave

Finkel_Hal_J · November 18, 2016, 10:34pm