[test-suite] making the test-suite succeed with "-Ofast" and "-ffp-contract=on"

Hi,

I would like to summarize the different proposals on how to fix the
test-suite so that it succeeds when the extra CFLAGS "-Ofast" and
"-ffp-contract=on" are specified. I would like to expose the issue and
the proposed fixes to other potential reviewers who could provide
extra feedback. We also need to decide which proposal (or combination
thereof) to implement and commit.

Proposal 1: https://reviews.llvm.org/D25277
modify the CMake files to compile and run each of these benchmarks
twice: once as configured by the user, and once with -ffp-contract=off
added to the CFLAGS. Record the full output of both runs on disk and
compare them within FP_TOLERANCE. Hash the output of the
-ffp-contract=off run and match it exactly against the reference
output.

The good for Proposal 1:
- changes contained in the build system: no change to the code of the benchmarks
- runs benchmarks under an extra configuration with CFLAGS += -ffp-contract=off

The bad for Proposal 1:
- compilation time will double
- running time on the device will double
- build system is more complex
- the build directory grows from 300M to 1.2G due to the extra
reference outputs recorded under -ffp-contract=off
- when running the test-suite on small devices, it costs 1G of extra
transfer over the network.

Proposal 2: https://reviews.llvm.org/D25346
like Proposal 1, except that no extra files are written to disk (or
transferred over the network from the device to the host that runs
fpcmp and the hashing): the outputs of both the normal compilation and
of the kernel compiled under "#pragma STDC FP_CONTRACT OFF" are
computed and compared on the device running the benchmark. The output
of the -ffp-contract=off version is written to disk and, as currently
done in the test-suite, hashed and matched exactly against the
reference output.

The good for Proposal 2:
- no modifications to CMake and Makefiles
- no extra space to store the extra reference output
- tests both the user-specified CFLAGS mode (e.g., fast-math) and
-ffp-contract=off

The bad for Proposal 2:
- compilation time will double: e.g., Polly will optimize both kernels,
- memory requirements on the device will almost double: one extra
output array is added (input arrays are not modified, so there is no
need to duplicate them)
- compute time on the device will more than double: the kernel runs
twice, plus an extra loop over both outputs to compare within
FP_TOLERANCE
- requires modifications to the code of the benchmarks: some
benchmarks may not be easily modified and will need to be run only
under -ffp-contract=off (as in Proposal 3)

Proposal 3: https://reviews.llvm.org/D25351
modify the Makefiles and CMake files to explicitly specify the flags
under which the results will match the recorded reference output.

The good for Proposal 3:
- no modifications to the benchmarks
- minimal modifications to the build system

The bad for Proposal 3:
- these benchmarks will not be tested with -ffp-contract=on: exact
matching of the reference output requires -ffp-contract=off
- it forgoes the extra test coverage of Proposals 1 and 2, which
would actually be a good thing for the test-suite

I would like to invite other people to review the above proposals and
suggest a way forward on fixing the current state of the test-suite
when running under CFLAGS="-Ofast" and "-ffp-contract=on". Once
consensus is reached, I am willing to implement the chosen approach
and follow up on all the reviews necessary to commit the change to
the test-suite.

Thank you,
Sebastian

From: "Sebastian Pop" <sebpop.llvm@gmail.com>
To: "Renato Golin" <renato.golin@linaro.org>
Cc: "Kristof Beyls" <Kristof.Beyls@arm.com>, "Sebastian Paul Pop" <s.pop@samsung.com>, "llvm-dev"
<llvm-dev@lists.llvm.org>, "nd" <nd@arm.com>, "Abe Skolnik" <a.skolnik@samsung.com>, "Clang Dev"
<cfe-dev@lists.llvm.org>, "Hal Finkel" <hfinkel@anl.gov>, "Stephen Canon" <scanon@apple.com>, "Matthias Braun"
<matze@braunis.de>
Sent: Friday, October 7, 2016 7:34:40 PM
Subject: [test-suite] making the test-suite succeed with "-Ofast" and "-ffp-contract=on"

I prefer proposal 1 (although, to be fair, it was something I suggested). Being in the business of trying to heavily modify every benchmark that does floating-point computation, as in proposal 2, does not seem to scale well, and can't always be done regardless.

We can make some effort to reduce the size of the problems being computed by some of the benchmarks (e.g. polybench); I think that is reasonable and will help with the extra space requirements. That having been said, functionally speaking, our test suite is at least an order of magnitude too small, and so my sympathy is somewhat limited. We're going to have to find a way to execute the test suite in stages on smaller devices to limit the peak usage, if not because of this, then because we will have added a lot more test applications and benchmarks in the future.

-Hal

  • First: I don’t think we can find a 100% solution for the -ffp-contract=on differences; fpcmp with tolerances won’t work on the output of oggenc. Luckily this seems to be the only problematic benchmark today. But at least for that one I see no better solution than adding the -ffp-contract=off switch.

  • We should consider Polybench to be the problem here! Benchmarks that just run for a few seconds and produce hundreds of megabytes of output are useless as compiler/CPU benchmarks (the time is really spent in libc, the kernel, and waiting for disks). In the case of well-behaved benchmarks Proposal 1 is unnecessary: we can just ship the reference results together with the benchmark and use fpcmp with tolerances, as we do with most other benchmarks today. We just don't really want to do that in the case of Polybench because the output is so huge, so instead we went for shipping just an md5sum of the output, which has now failed in combination with floating-point accuracy swings, starting this whole discussion…

  • Because of the nature of Polybench I'd rather see Proposal 2 implemented. Compilation time of Polybench is small compared to many of the other benchmarks. If we run into memory issues we can reduce the size of the arrays to normalize runtimes (so far I have no reason to believe we will, though; looking at some random Polybench kernels, it seemed that by default they create a 1024*1024 array of doubles, which should only be 8 MB per array). And hey, if we modify the benchmarks anyway, we could also add some checksumming to the code (maybe bitcasting the doubles to integers, adjusting for endianness, and XOR'ing them together is enough?) and avoid all the I/O.

  • I personally could live with Proposal 3 on the grounds of just declaring Polybench a problematic benchmark, so -ffp-contract=off is fine as a stopgap measure, relying on the fact that we have several other benchmarks that have smaller reference outputs and use fpcmp correctly. Of course Proposal 2 is the saner solution here.

- Matthias

From: "Matthias Braun" <matze@braunis.de>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Sebastian Pop" <sebpop.llvm@gmail.com>, "Sebastian Paul Pop"
<s.pop@samsung.com>, "llvm-dev" <llvm-dev@lists.llvm.org>, "Clang
Dev" <cfe-dev@lists.llvm.org>, "nd" <nd@arm.com>, "Abe Skolnik"
<a.skolnik@samsung.com>
Sent: Friday, October 7, 2016 8:28:09 PM
Subject: Re: [llvm-dev] [test-suite] making the test-suite succeed
with "-Ofast" and "-ffp-contract=on"

> - First: I don't think we can find a 100% solution for the
> -ffp-contract=on differences; fpcmp with tolerances won't work on
> the output of oggenc. Luckily this seems to be the only problematic
> benchmark today. But at least for that one I see no better solution
> than adding the -ffp-contract=off switch.

I agree. With application benchmarks like oggenc, I don't see any better solution.

> - We should consider Polybench to be the problem here! Benchmarks
> that just run for a few seconds and produce hundreds of megabytes
> of output are useless as compiler/CPU benchmarks (the time is
> really spent in libc, the kernel, and waiting for disks). In the
> case of well-behaved benchmarks Proposal 1 is unnecessary: we can
> just ship the reference results together with the benchmark and use
> fpcmp with tolerances, as we do with most other benchmarks today.
> We just don't really want to do that in the case of Polybench
> because the output is so huge, so instead we went for shipping just
> an md5sum of the output, which has now failed in combination with
> floating-point accuracy swings, starting this whole discussion...

> - Because of the nature of Polybench I'd rather see Proposal 2
> implemented. Compilation time of Polybench is small compared to
> many of the other benchmarks. If we run into memory issues we can
> reduce the size of the arrays to normalize runtimes (so far I have
> no reason to believe we will, though; looking at some random
> Polybench kernels, it seemed that by default they create a
> 1024*1024 array of doubles, which should only be 8 MB per array).
> And hey, if we modify the benchmarks anyway, we could also add some
> checksumming to the code (maybe bitcasting the doubles to integers,
> adjusting for endianness, and XOR'ing them together is enough?) and
> avoid all the I/O.

I agree that, regardless, the polybench benchmarks should be modified to do less I/O.

> - I personally could live with Proposal 3 on the grounds of just
> declaring Polybench a problematic benchmark, so -ffp-contract=off
> is fine as a stopgap measure, relying on the fact that we have
> several other benchmarks that have smaller reference outputs and
> use fpcmp correctly. Of course Proposal 2 is the saner solution
> here.

I agree with Renato that there is a danger in the method suggested by the third proposal. In the very circumstances where the FP-contraction logic is known to be active, we'd be disabling it. We've had bugs in this logic in the past, both in the frontend and in the backend, that have at least caused crashes. We don't want the test suite to lose sensitivity to these (coverage is already not great). If -ffp-contract=on becomes the default compiler mode, then we're more likely to run into problems in less convenient ways than test-suite failures.

-Hal

From: "Hal Finkel via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Sebastian Pop" <sebpop.llvm@gmail.com>
Cc: "Sebastian Paul Pop" <s.pop@samsung.com>, "llvm-dev" <llvm-dev@lists.llvm.org>, "Matthias Braun"
<matze@braunis.de>, "Clang Dev" <cfe-dev@lists.llvm.org>, "nd" <nd@arm.com>, "Abe Skolnik" <a.skolnik@samsung.com>
Sent: Friday, October 7, 2016 7:56:53 PM
Subject: Re: [llvm-dev] [test-suite] making the test-suite succeed with "-Ofast" and "-ffp-contract=on"

Another aspect to this is that we should have this kind of infrastructure for other purposes as well. We have a similar lack of testing for -ffast-math. We don't even do a good job of running the test suite at -O1 (i.e., we lack good buildbot coverage for it); -O2 and -O3 are much better tested. More regular testing at -O0 (especially with -g, to pick up crashes in our debug-info generation logic) is needed as well.

-Hal

I think we should rather just set up more build jobs that run different configurations, rather than modifying the test-suite to compile multiple configurations in a single run, as that keeps things simpler. Chris and I also started collecting typical configurations in the test-suite/cmake/caches/ directory; I hope this will become popular with more bot owners, as it nicely documents what people are doing, IMO, and makes it easy to describe the configuration used in bug reports or when reproducing issues on a personal machine.

- Matthias

From: "Matthias Braun" <matze@braunis.de>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Sebastian Paul Pop" <s.pop@samsung.com>, "llvm-dev"
<llvm-dev@lists.llvm.org>, "Clang Dev" <cfe-dev@lists.llvm.org>,
"nd" <nd@arm.com>, "Abe Skolnik" <a.skolnik@samsung.com>
Sent: Friday, October 7, 2016 8:53:49 PM
Subject: Re: [llvm-dev] [test-suite] making the test-suite succeed
with "-Ofast" and "-ffp-contract=on"

Fair enough; however, for things like FP contraction and fast-math optimizations, which can change the output, we need to figure out how to test these regularly, at least looking for compile-time failures (just disabling the tests for which these settings cause verification errors is unsatisfying, because it is these very tests where we know the logic we want to test is actually firing).

-Hal

Proposal 4:

Investigate each problematic benchmark and apply the best solution for
each one of them, independently. For oggenc we may need something
different.

While investigating povray in a similar case (very small FP
differences over very few points of the output), I noticed we were
emitting NEON instructions as if they were IEEE compliant (they're
not). That led me to fix a bad compiler bug.

Are we sure all the -ffp-contract=on differences are *just* due to
fusions? If so, then let's look at the benchmarks and make them output
less garbage without resorting to hashes. I've done that to a number
of tests and benchmarks already. It's quite boring, yes, but it's
necessary if we want them to be meaningful.

Proposal 2 is actually good for Polybench, at least for the one case
Sebastian has implemented. Yes, it doubles run time, but it's
validation run time, which is part of the test, and it doesn't bloat
disk/memory. Benchmark bots nowadays only run for a few iterations
anyway, and even the ARM bot (the slowest) now takes only 2 hours per
build.

My proposal is to go through all 50 cases and propose the lowest
number of solutions possible for all of them. I'm guessing this will
be between 2 and 4 different cases.

cheers,
--renato

> Proposal 4:
>
> Investigate each problematic benchmark and apply the best solution
> for each one of them, independently. For oggenc we may need
> something different.
>
> [...]
>
> My proposal is to go through all 50 cases and propose the lowest
> number of solutions possible for all of them. I'm guessing this
> will be between 2 and 4 different cases.

I like Proposal 4: we need different patches for different problems.
I am sure we do not yet understand all the problems in the 50
currently failing benchmarks, so we will need to analyze each one.

> The proposal 2 is actually good for Polybench, at least for the one
> case where Sebastian has implemented. Yes, it doubles run time, but
> it's validation run time, which is part of the test, and it doesn't
> bloat disk/memory.

I see that handling Polybench separately, as in Proposal 2, also
falls under Proposal 4, since it handles that benchmark separately
from the other ones, which may have different problems.

A separate follow-up patch can link a hashing algorithm into each
test of Polybench and output the hashed result to reduce I/O.

If everybody agrees on starting by fixing Polybench as described in
Proposal 2, I will complete the implementation of that patch and
follow up with the hashing of the output.

Thanks,
Sebastian

I'm not a big fan of hashing outputs, but if we make sure that the
comparison is done internally (like your polybench proposal) and we
can easily disable the hash (for debugging), then it should be ok.

cheers,
--renato

From: "Sebastian Pop via cfe-dev" <cfe-dev@lists.llvm.org>
To: "Renato Golin" <renato.golin@linaro.org>
Cc: "Sebastian Paul Pop" <s.pop@samsung.com>, "llvm-dev" <llvm-dev@lists.llvm.org>, "Matthias Braun"
<matze@braunis.de>, "Clang Dev" <cfe-dev@lists.llvm.org>, "nd" <nd@arm.com>, "Abe Skolnik" <a.skolnik@samsung.com>
Sent: Saturday, October 8, 2016 8:25:49 AM
Subject: Re: [cfe-dev] [llvm-dev] [test-suite] making the test-suite succeed with "-Ofast" and "-ffp-contract=on"

Yes, please fix polybench. I'm still not sure how much we should generalize this solution to other benchmarks, but polybench needs fixing somehow anyway, and this will be a big help.

-Hal