From: "Stephen Canon via cfe-dev" <cfe-dev@lists.llvm.org>
To: "Ana Pazos" <apazos@codeaurora.org>
Cc: cfe-dev@lists.llvm.org
Sent: Saturday, September 19, 2015 3:00:53 PM
Subject: Re: [cfe-dev] question about fused multiply add and Clang GNU modes
Hi Ana,
It would change the behavior of a lot of existing software in subtle
ways to let -std=gnu11 license fp-contract=fast. I’m honestly rather
surprised that GCC made that choice.
I’m not sure what you mean by "We know the instruction produces
results with higher precision and compliant to IEEE 754 standard."
FMA produces *different* results than FMUL + FADD, but they are not
always more accurate. The classical example of naive FMA formation
gone wrong is multiplying a complex number by its conjugate. The
imaginary part *should* be zero, but when FMA formation is licensed,
one generally gets a small non-zero imaginary part.
IEEE doesn’t actually license fma formation. I’m not sure where you
got the idea that it does. It doesn’t expressly forbid it either.
Rather it makes the following recommendations:
"A language standard should require that by default, when no
optimizations are enabled and no alternate exception handling is
enabled, language implementations preserve the literal meaning of
the source code."
This means that *by default* an implementation should not transform
FMUL + FADD into FMADD. It encourages this transform to be available
as an option, however:
"A language standard should also define, and require implementations
to provide, attributes that allow and disallow value-changing
optimizations, separately or collectively, for a block. These
optimizations might include, but are not limited to:
― Applying the associative or distributive laws.
― Synthesis of a fusedMultiplyAdd operation from a multiplication and
an addition.
― Synthesis of a formatOf operation from an operation and a
conversion of the result of the operation.
― Use of wider intermediate results in expression evaluation."
Note that the other transforms that IEEE-754 groups in with FMA
formation here are all things that we license only under fast-math.
Now, it so happens that fma formation makes results more accurate
more often than it makes them less accurate. It is *usually* a good
thing, so the case isn’t quite as cut and dried as I’m presenting it to
be. It’s also quite beneficial for performance on many platforms
(but rather detrimental to performance on some other platforms with
hardware FMA support, so again the case is not terribly clear).
It should also be noted that -ffp-contract=fast goes beyond what is
allowed by the C rules for #pragma STDC FP_CONTRACT ON (which allows
fma formation only within an expression):
[... ]
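A sketch of the "within an expression" distinction, with hypothetical function names. Under the C rules for FP_CONTRACT ON only the first function may be contracted; a compiler running with -ffp-contract=fast may fuse the second one as well. Note that some compilers ignore this pragma, possibly with a warning.

```c
#pragma STDC FP_CONTRACT ON

/* Multiply and add in a single expression: eligible for contraction
   into one fused multiply-add under FP_CONTRACT ON. */
double mul_add_one_expr(double a, double b, double c)
{
    return a * b + c;
}

/* Multiply and add in separate statements: the product is rounded when
   it is assigned to t, so the C rules do not permit fusing it with the
   addition. -ffp-contract=fast does not honor this boundary. */
double mul_add_two_stmts(double a, double b, double c)
{
    double t = a * b;
    return t + c;
}
```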
Now, it *does* appear to me that we do not default to having STDC
FP_CONTRACT ON, which is inhibiting fma formation *even within an
expression*. Given that we support STDC FP_CONTRACT OFF, we could
certainly choose to make ON the default, and I would encourage doing
so.
I agree. Also, our behavior here appears somewhat buggy. Not only do we not set -ffp-contract=on by default (as I recall had been our intention), but -ffp-contract=on does not even work correctly. The code in lib/Frontend/CompilerInvocation.cpp does call Opts.setFPContractMode(CodeGenOptions::FPC_On) when passed -ffp-contract=on, but only in OpenCL mode do we set Opts.DefaultFPContract = 1. Setting CodeGenOptions::FPC_On does pass the right flag to the backend, and does enable generating @llvm.fmuladd when an operation is tagged as 'FPContractable', but...
1. The STDC FP_CONTRACT pragma's DEFAULT option always resets to getLangOpts().DefaultFPContract, and thus is unaffected by the -ffp-contract flag (because that's always 0 except in OpenCL mode).
2. FPFeatures.fp_contract is initialized to 0 in include/clang/Basic/LangOptions.h, and this is never changed (except by the STDC FP_CONTRACT pragma handlers). When we create BinaryOperator AST nodes (etc.) we use the current state of FPFeatures.fp_contract to set the node's FPContractable flag, and because this always defaults to 0, regardless of how -ffp-contract is set (except setting it to fast which bypasses all of this), none of the AST nodes are marked as contractible, and we don't generate FMAs at all.
I think that the first step here is fixing all of this so that -ffp-contract=on actually works.
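To make the interaction in the two points above easier to follow, here is a toy model in C. It is not Clang code: every name is hypothetical, and it only mirrors the behavior as described in this message (the flag sets a mode, DefaultFPContract is set only for OpenCL, and the per-node flag starts at 0).

```c
#include <stdbool.h>

enum toy_fpc_mode { TOY_FPC_OFF, TOY_FPC_ON, TOY_FPC_FAST };

struct toy_state {
    enum toy_fpc_mode mode;    /* stands in for the -ffp-contract mode */
    bool default_fp_contract;  /* stands in for DefaultFPContract */
    bool fp_contract;          /* stands in for FPFeatures.fp_contract */
};

/* Option parsing as described: the flag sets the mode, but the default
   is 1 only in OpenCL mode and fp_contract always starts at 0. */
struct toy_state toy_init(enum toy_fpc_mode mode, bool opencl)
{
    struct toy_state s;
    s.mode = mode;
    s.default_fp_contract = opencl;
    s.fp_contract = false;
    return s;
}

/* #pragma STDC FP_CONTRACT ON */
void toy_pragma_on(struct toy_state *s)
{
    s->fp_contract = true;
}

/* #pragma STDC FP_CONTRACT DEFAULT: resets to the language default,
   which ignores the -ffp-contract flag. */
void toy_pragma_default(struct toy_state *s)
{
    s->fp_contract = s->default_fp_contract;
}

/* Would a multiply-add created now be eligible for fusion? */
bool toy_forms_fma(const struct toy_state *s)
{
    if (s->mode == TOY_FPC_FAST)
        return true;                      /* fast bypasses the node flag */
    bool contractable = s->fp_contract;   /* copied onto the new node */
    return s->mode == TOY_FPC_ON && contractable;
}
```

In this model, toy_init(TOY_FPC_ON, false) never forms an FMA until toy_pragma_on is called, and toy_pragma_default switches it off again, which is the behavior the two points describe.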
-Hal
– Steve
Hi folks,
GNU GCC allows fused multiply add instruction generation in -std=gnu*
modes (default mode in GCC) on both ARM 32-bit and 64-bit targets.
See outputs below.
Clang 3.8 defaults to gnu11 for C programs, according to
http://clang.llvm.org/docs/UsersManual.html#c-language-features and
function CompilerInvocation::setLangDefaults in
./lib/Frontend/CompilerInvocation.cpp in the Clang source code.
So why is fp-contract=fast not the default in Clang, as it is in
GNU GCC?
Just trying to understand the rationale behind this decision. We know
the instruction produces results with higher precision and compliant
to IEEE 754 standard.
This difference in default behavior in Clang/LLVM compared to GNU GCC
is a performance disadvantage.
Thanks!
Ana.
$ cat t.c
double f(double a, double b)
{
    return b*b+a*a;
}
$ gcc-linaro-4.9-2015.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -S -O3 -o- -std=c99 t.c
.cpu generic+fp+simd
.file "t.c"
.text
.align 2
.global f
.type f, %function
f:
fmul d1, d1, d1
fmul d0, d0, d0
fadd d0, d1, d0
ret
.size f, .-f
.ident "GCC: (Linaro GCC 4.9-2015.05) 4.9.3 20150413 (prerelease)"
.section .note.GNU-stack,"",%progbits
$ gcc-linaro-4.9-2015.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -S -O3 -o- -std=gnu99 t.c
.cpu generic+fp+simd
.file "t.c"
.text
.align 2
.global f
.type f, %function
f:
fmul d0, d0, d0
fmadd d0, d1, d1, d0
ret
.size f, .-f
.ident "GCC: (Linaro GCC 4.9-2015.05) 4.9.3 20150413 (prerelease)"
.section .note.GNU-stack,"",%progbits
$ gcc-linaro-4.9-2015.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -S -O3 -o- t.c
.cpu generic+fp+simd
.file "t.c"
.text
.align 2
.global f
.type f, %function
f:
fmul d0, d0, d0
fmadd d0, d1, d1, d0
ret
.size f, .-f
.ident "GCC: (Linaro GCC 4.9-2015.05) 4.9.3 20150413 (prerelease)"
.section .note.GNU-stack,"",%progbits
Ana Pazos
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
_______________________________________________
cfe-dev mailing list
cfe-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory