[RFC] FP Contract = fast?

Folks,

I've been asking around people about the state of FP contract, which
seems to be "on" but it's not really behaving like it, at least not as
I would expect:

int foo(float a, float b, float c) { return a*b+c; }

$ clang -target aarch64-linux-gnu -O2 -S fma.c -ffp-contract=on -o -
(...)
fmul s0, s0, s1
fadd s0, s0, s2
(...)

$ clang -target aarch64-linux-gnu -O2 -S fma.c -ffp-contract=fast -o -
(...)
fmadd s0, s0, s1, s2
(...)

I'm not sure this works in Fortran either, but defaulting to "on" when
(I believe) the language should allow contraction and not doing it is
not a good default.

i haven't worked out what would be necessary to make it work on a
case-by-case basis (what kinds of fusions does C allow?) to make sure
we don't do all or nothing, but if we don't want to start that
conversation now, then I'd recommend we just turn it all the way to 11
(like GCC) and let people turn it off if they really mean it.

The rationale is that:

* Contracted operations increase precision (less rounding steps)
* It performs equal or faster on all architectures I know (true everywhere?)
* Users already expect that (certainly, GCC users do)
* Makes us look good on benchmarks :slight_smile:

A recent SPEC2k6 comparison Linaro did for AArch64, enabling
-ffp-contract=fast took the edge of GCC in a number of cases and in
some of them made them comparable in performance. So, any reasons not
to?

If we go with it, we need to first finish the job that Sebastian was
dong on the test-suite, then just turn it on by default. A second
stage would be to add tests/benchmarks that explicitly test FP
precision, so that we have some extra guarantee that we're doing the
right thing.

Opinions?

cheers,
--renato

Folks,

I've been asking around people about the state of FP contract, which
seems to be "on" but it's not really behaving like it, at least not as
I would expect:

int foo(float a, float b, float c) { return a*b+c; }

$ clang -target aarch64-linux-gnu -O2 -S fma.c -ffp-contract=on -o -
(...)
fmul s0, s0, s1
fadd s0, s0, s2
(...)

When you reverted r282259 in 282289, you also reverted the functional fix to make the command-line option actually work. Right now it is broken. Regardless of what else we do, we should fix this (we should probably recommit r282259, with the default flipped, to pick up the fixes). If you were to change your source file to:

#pragma STDC FP_CONTRACT ON
int foo(float a, float b, float c) { return a*b+c; }

Then running:

$ clang -target aarch64-linux-gnu -O2 -S fma.c -ffp-contract=on -o -
(...)
fmadd s0, s0, s1, s2
(...)

$ clang -target aarch64-linux-gnu -O2 -S fma.c -ffp-contract=fast -o -
(...)
fmadd s0, s0, s1, s2
(...)

I'm not sure this works in Fortran either, but defaulting to "on" when
(I believe) the language should allow contraction and not doing it is
not a good default.

i haven't worked out what would be necessary to make it work on a
case-by-case basis (what kinds of fusions does C allow?) to make sure
we don't do all or nothing, but if we don't want to start that
conversation now, then I'd recommend we just turn it all the way to 11
(like GCC) and let people turn it off if they really mean it.

The rationale is that:

* Contracted operations increase precision (less rounding steps)
* It performs equal or faster on all architectures I know (true everywhere?)
* Users already expect that (certainly, GCC users do)
* Makes us look good on benchmarks :slight_smile:

A recent SPEC2k6 comparison Linaro did for AArch64, enabling
-ffp-contract=fast took the edge of GCC in a number of cases and in
some of them made them comparable in performance. So, any reasons not
to?

If we go with it, we need to first finish the job that Sebastian was
dong on the test-suite, then just turn it on by default. A second
stage would be to add tests/benchmarks that explicitly test FP
precision, so that we have some extra guarantee that we're doing the
right thing.

Opinions?

I'm certainly in favor of this plan. My users generally find our current defaults confusing because it differs from all of our other compilers (GCC and vendor compilers), plus it gives poor performance.

Thanks again,
Hal

Folks,

I've been asking around people about the state of FP contract, which
seems to be "on" but it's not really behaving like it, at least not as
I would expect:

int foo(float a, float b, float c) { return a*b+c; }

$ clang -target aarch64-linux-gnu -O2 -S fma.c -ffp-contract=on -o -
(...)
fmul s0, s0, s1
fadd s0, s0, s2
(...)

When you reverted r282259 in 282289, you also reverted the functional fix to make the command-line option actually work. Right now it is broken. Regardless of what else we do, we should fix this (we should probably recommit r282259, with the default flipped, to pick up the fixes). If you were to change your source file to:

#pragma STDC FP_CONTRACT ON
int foo(float a, float b, float c) { return a*b+c; }

I would like to see https://reviews.llvm.org/rL282259 re-enabled,
and the few miscompares left in the aarch64 run of the test-suite
fixed by adding the FP_CONTRACT pragma in the source code.

The commit log of r282259 states the problem that it fixed:

Clang has the default FP contraction setting of “-ffp-contract=on”, which
doesn't really mean “on” in the conventional sense of the word, but rather
really means “according to the per-statement effective value of the relevant
pragma”.

Thanks,
Sebastian

When you reverted r282259 in 282289, you also reverted the functional fix
to make the command-line option actually work. Right now it is broken.
Regardless of what else we do, we should fix this (we should probably
recommit r282259, with the default flipped, to pick up the fixes).

We should run check-all and the test-suite with it on/fast before
flipping the switch, and make sure that the behaviour Sebastian
encountered is dealt with before this going live, or we'd be breaking
too many test-suites and reverting and reapplying too often.

But I'm certainly in favour of the plan to make it on/fast by default.

If you were to change your source file to:

#pragma STDC FP_CONTRACT ON
int foo(float a, float b, float c) { return a*b+c; }

I wasn't aware you needed the pragma for -ffp-contract=on. I assumed
it would enable on all fp-math that the standard allowed (thus maybe
needing some annotation on the operations).

I thought that the pragma was to avoid using the command line argument...

I'm certainly in favor of this plan. My users generally find our current
defaults confusing because it differs from all of our other compilers (GCC
and vendor compilers), plus it gives poor performance.

Right, I also agree to follow the principle of least surprise, we
should default to "fast". Let's just make sure the infrastructure
isn't going to crumble and do it.

cheers,
--renato

Great! Do you have a list of the remaining cases?

cheers,
--renato

We should default to pragma STDC FP_CONTRACT ON, not ‘fast’.

‘on’ is the most aggressive mode that conforms to C11. ‘fast’ should be opt-in (like ‘fast-math’ is).

– Steve

Why? Other compilers default to ‘fast’, and ‘on’ does not play well with C++ code (where inlining is really important, and so the statement boundaries that restrict contraction encourage bad coding style). -Hal

When you reverted r282259 in 282289, you also reverted the functional fix
to make the command-line option actually work. Right now it is broken.
Regardless of what else we do, we should fix this (we should probably
recommit r282259, with the default flipped, to pick up the fixes).

We should run check-all and the test-suite with it on/fast before
flipping the switch, and make sure that the behaviour Sebastian
encountered is dealt with before this going live, or we'd be breaking
too many test-suites and reverting and reapplying too often.

But I'm certainly in favour of the plan to make it on/fast by default.

If you were to change your source file to:

#pragma STDC FP_CONTRACT ON
int foo(float a, float b, float c) { return a*b+c; }

I wasn't aware you needed the pragma for -ffp-contract=on. I assumed
it would enable on all fp-math that the standard allowed (thus maybe
needing some annotation on the operations).

I thought that the pragma was to avoid using the command line argument...

You shouldn't; this is a bug.

  -Hal

I’m fine with defaulting to fast in C++ (it’s allowed by the C++ arithmetic model IIRC).

Personally, I think the bar for not defaulting to the standards-conforming mode should be high. In particular, ‘fast’ doesn’t respect the pragma control, so it is unsafe—if someone copy-pastes code that uses the documented C semantics into a file compiled with ‘fast’ set, they will get incorrect behavior, even if they use the pragmas.

“Other compilers are doing it” is not convincing, at all, especially where hard-to-analyze floating-point rounding behavior is concerned. GCC defaulted to preserving extra precision on x87 for decades, and that wasn’t a good idea either. We can do better.

– Steve

Oh, good point. We can and should fix this now. All relevant IR instructions now support fast-math flags (arithmetic, function calls, etc.). We should have a flag to allow contractions and use it. This will allow the pragmas to work correctly. In general, I agree. I think, however, that in this case the other compilers are right. The statement-limited model is something that, in my experience, users find surprising and unhelpful. Nevertheless, I’m fine with having different defaults here for C and C++ (although somewhat sad) is we want to draw the line at having a confirming implementation by default. -Hal

Relevant to this discussion is http://bugs.llvm.org/show_bug.cgi?id=25721 (-ffp-contract=fast does not work with LTO). I am working on adding function attributes for fp-contract=fast which should fix this.

Also now that we have backend optimization remarks, I am planning to report missed optimization when we can’t fuse FMAs due “fast” not being on. This will show up in the opt-viewer. Then the user can opt in either with the command-line switch or the new function attribute.

Adam

Great! That seems useful. Thanks again, Hal

A function attribute would be a strict improvement over today: LLVM can’t do contraction today. But actually I’m not sure if it is the long term right choice: attributes don’t combine well with inlining for instance. You mentioned FMF earlier, why don’t we have a FMF to allow contraction?

Also, IIUC, the function attribute as well as a FMF wouldn’t apply to the “ON” setting but only to the “FAST” mode (no way to distinguish source level statement in llvm IR).

A function attribute would be a strict improvement over today: LLVM can’t do contraction today. But actually I’m not sure if it is the long term right choice: attributes don’t combine well with inlining for instance. You mentioned FMF earlier, why don’t we have a FMF to allow contraction?

OK, I thought that the prerequisite for that was a fast-math pragma which I don’t think is something we have (I want to be able to specify contract=fast on smaller granularity than module). But now that I think more about, we should be able to turn a user function attribute into FMF in the front-end which is the most flexible.

Also, IIUC, the function attribute as well as a FMF wouldn’t apply to the “ON” setting but only to the “FAST” mode (no way to distinguish source level statement in llvm IR).

Yes.

Adam

I agree, a FMF is the way to go and we can then control it with the pragma. We can use the STDC FP_CONTRACT pragma for contraction; I also think that having a “fast math” pragma is also a good idea (the fact that we can currently only specify fast-math settings on a translation-unit level is somewhat problematic). Right. We still have the existing fmuladd intrinsic method for dealing with the “ON” setting.

I agree, a FMF is the way to go and we can then control it with the pragma. We can use the STDC FP_CONTRACT pragma for contraction;

Just to confirm, do you mean to introduce a “fast” option to the pragma, e.g.:

#pragma STDC FP_CONTRACT FAST

Thanks,
Adam

That’s a good point. If we don’t add something like this, then we’d be able to turn the fast mode off with the pragma, but then not be able to turn it back on :wink: So, yes, except that I’m somewhat hesitant to invade the ‘STDC’ space with vendor extensions. If we generally introduce a pragma to control FMFs, maybe we should just use that instead? I don’t have a clear idea on the syntax, but for example, if we had some pragma that let us do #pragma clang fast_math or #pragma clang fast_math nnan(off) contract(on) or whatever then we could use that. What do you think? -Hal

That’s a good point. If we don’t add something like this, then we’d be able to turn the fast mode off with the pragma, but then not be able to turn it back on :wink: So, yes, except that I’m somewhat hesitant to invade the ‘STDC’ space with vendor extensions. If we generally introduce a pragma to control FMFs, maybe we should just use that instead? I don’t have a clear idea on the syntax, but for example, if we had some pragma that let us do #pragma clang fast_math or #pragma clang fast_math nnan(off) contract(on) or whatever then we could use that. What do you think?

That looks great; it nicely matches the internal representation. Let me take a stab at this.

Adam

That’s a good point. If we don’t add something like this, then we’d be able to turn the fast mode off with the pragma, but then not be able to turn it back on ;)So, yes, except that I’m somewhat hesitant to invade the ‘STDC’ space with vendor extensions. If we generally introduce a pragma to control FMFs, maybe we should just use that instead? I don’t have a clear idea on the syntax, but for example, if we had some pragma that let us do#pragma clang fast_math or #pragma clang fast_math nnan(off) contract(on) or whatever then we could use that. What do you think?

That looks great; it nicely matches the internal representation. Let me take a stab at this.

Thinking more about this, I am back to thinking that a function attribute is the better solution for this than FMF, at least before inlining.

Consider the standard example where we want this to trigger: an overloaded addition and multiplier operator for a vector class.

In this case, we want the fadd and the fmul in the inlined functions to have the FMF as well but we don’t necessarily want to mark the overloaded operators with the pragma; we may be only comfortable contracting at this call site.

You don’t have this problem if you mark the containing function FP-contractable. Effectively what we want is to outline the block within the pragma into a function and tag it with the attribute.

During inlining we can still transform the function attribute into FMF.

So I think I am going back to implementing fp-contract=fast as a function attribute as the first step unless there are any objections.

Adam

Are you saying this works because we don’t block inlining when functions attributes don’t match or update the function attributes to be more conservative when inlining? This is specifically one of the issues we were avoiding by using FMFs. Frankly, the same issue comes up with other fast-math properties, and I don’t see why we should handle this differently. I think that I’d prefer you stick with the new flag. -Hal