LNT BenchmarkGame

Hi folks,

I’m investigating the LNT failures on our bot and found that I cannot reproduce a BenchmarkGame pass.

I’ve compiled it with both GCC and Clang, on both ARM and x86_64, with -O3 or with the arguments that the test-suite passes to it, and all I can get is the result below:

Found duplicate: 420094
Found duplicate: 341335
Found duplicate: 150397
Found duplicate: 157527
Found duplicate: 269724

But not the one that is on the reference output:

Found duplicate: 4
Found duplicate: 485365
Found duplicate: 417267
Found duplicate: 436989
Found duplicate: 60067

If I run LNT on my machine (x86_64), that test fails, and if I change the reference output to the one above, it passes.

On the ARM buildbot I’m also getting the same results, so I’m really surprised that the x86_64 LNT buildbot is passing. PowerPC is also failing, and I suspect for the same reason.

Is there any chance that the results are not being checked correctly? Any other ideas? I’m tempted to just change the reference output and see what happens with the other bots…

thanks,
–renato

Is there any chance that the results are not being checked correctly? Any
other ideas?

I think I vaguely convinced myself that the infrastructure didn't
actually check whether tests it classified as benchmarks passed or
failed. Not sure I had any good evidence for it other than things like
you're seeing.

I'm tempted to just change the reference output and see what
happens with the other bots...

Could be worth a try. But if that thing really is generating random
numbers I'm not sure replacing one genuine cast-iron random number
with another is the best solution long-term.

Tim.

The test is initializing srand(1), so in theory, it shouldn't be different
between compilers, since Clang is using the same libraries.

Also, if the "native" result is generated by GCC, then all the problems go
away, since the result will be target-dependent (or rather, library-
dependent). Is there a way to turn on dynamic generation of the native
file instead of copying it from the reference_output?

cheers,
--renato

Hi Renato,

The test is initializing srand(1), so in theory, it shouldn't be different
between compilers, since Clang is using the same libraries.

If Clang and GCC disagree on the same source, same machine and with
the same libraries, that certainly is odd. But it doesn't make
checking against the output of a particular libc's RNG any better an
idea in general.

Cheers.

Tim.

They don't. That's the odd bit. GCC and Clang agree on the output on both
ARM and x86_64, and neither agrees with the reference_output file.

What could be happening is that the version of the libraries on that
buildbot is old, and both ARM and x86_64 have been updated.

I'm not suggesting we should keep replacing the "golden" file with the new
value, but that we should disable checking against the reference_output
entirely and rely on a GCC vs. Clang comparison.

I agree that the comparison is no better than a reference file (since it,
too, could be wrong), but comparing both outputs eliminates any library
mismatch, and it's less likely that GCC and Clang will both be wrong about
exactly the same thing at the same time.

Is there a way to turn off the check against the reference_output and make
it check against a GCC executable output?

cheers,
--renato

I agree; I'm pretty sure that the only guarantee is that for a given implementation of srand/rand, if you initialize it with the same seed, you get the same sequence.

There is no "correct" sequence.

-- Marshall

Marshall Clow Idio Software <mailto:mclow.lists@gmail.com>

A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait).
        -- Yu Suzuki

I'm not suggesting a correct sequence, I'm just looking for a way to turn
off the verification against the reference output and force LNT to run GCC
for the "native" output.

cheers,
--renato

From: "Renato Golin" <renato.golin@linaro.org>
To: "Marshall Clow" <mclow.lists@gmail.com>
Cc: "LLVM Dev" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, March 12, 2013 10:22:41 AM
Subject: Re: [LLVMdev] LNT BenchmarkGame

I agree; I'm pretty sure that the only guarantee is that for a given
implementation of srand/rand, if you initialize it with the same seed,
you get the same sequence.

There is no "correct" sequence.

I'm not suggesting a correct sequence, I'm just looking for a way to
turn off the verification against the reference output and force LNT
to run GCC for the "native" output.

Can't we just paste in a RNG so that we'll get the same output on all systems (and can still use the reference output)?

-Hal

We can, though other tests suffer from the same issue. It would be good to
have a single solution for all of them rather than pasting the same code into each one.

I really thought that the native output was always generated by the "native
compiler" which is normally GCC. Removing the reference output doesn't
work, since it just creates an empty file instead. The Makefile is too
simple to mean anything, but maybe there's some environment variable that
needs setting to make LNT get the result from a GCC run...

--renato

Hi Renato,

That was my initial assumption, too. But if I just run that test, the
Makefile doesn't use GCC at all and only copies the reference_output to the
out-nat file.

I then copied a "good" output to the reference_output, and the test passed.
I'm intrigued... :wink:

Attached is a test.log of a local run. The buildbots' logs are pretty
similar on BenchmarkGame.

cheers,
--renato

test.log (41.7 KB)

Hi Renato,

This is probably a platform specific dependency where the Linux output file differs from the Darwin one. I fixed up a lot of those in the past but the random number issue blocks some others. For reference see LLVM r111522.

On my machine I get output that matches the reference output:

> Is there any chance that the results are not being checked correctly? Any
> other ideas?

I think I vaguely convinced myself that the infrastructure didn't
actually check whether tests it classified as benchmarks passed or
failed. Not sure I had any good evidence for it other than things like
you're seeing.

This is false.

Every test gets compared against some kind of expected output file (which
includes the exit code). The correct output is either:
a. a reference output file, or
b. the output from a natively run executable,
depending on some of the test parameters.

- Daniel

Hi Renato,

    IIRC the reference output is not used by default. You have to put
       USE_REFERENCE_OUTPUT := 1
    in the Makefile in order to make use of the reference output. As
    BenchmarkGame doesn't have this, are you sure the reference output
    is causing the problem?

That was my initial assumption, too. But if I just run that test, the Makefile
doesn't use GCC at all and only copies the reference_output to the out-nat file.

if you look at the first line of your log

2013-03-12 15:19:41: running: "make" "-k" "TARGET_LLVMGCC=/home/rengolin/devel/llvm/build/bin/clang" "TARGET_CXX=None" "LLI_OPTFLAGS=-O3" "TARGET_CC=None" "TARGET_LLVMGXX=/home/rengolin/devel/llvm/build/bin/clang++" "TEST=simple" "CC_UNDER_TEST_IS_CLANG=1" "ENABLE_PARALLEL_REPORT=1" "TARGET_FLAGS=" "USE_REFERENCE_OUTPUT=1" "CC_UNDER_TEST_TARGET_IS_X86_64=1" "OPTFLAGS=-O3" "LLC_OPTFLAGS=-O3" "ENABLE_OPTIMIZED=1" "ARCH=x86_64" "ENABLE_HASHED_PROGRAM_OUTPUT=1" "DISABLE_JIT=1" "-C" "SingleSource/Benchmarks/BenchmarkGame" "-j" "8" "report" "report.simple.csv"

then you see that it forces USE_REFERENCE_OUTPUT=1. Maybe LNT does that?

Ciao, Duncan.

Can't we just paste in a RNG so that we'll get the same output on all
systems (and can still use the reference output)?

We can, though other tests suffer from the same issue. It would be good to
have a single solution for all of them rather than pasting the same code into each one.

I really thought that the native output was always generated by the
"native compiler" which is normally GCC. Removing the reference output
doesn't work, since it just creates an empty file instead. The Makefile is
too simple to mean anything, but maybe there's some environment variable
that needs setting to make LNT get the result from a GCC run...

The test suite supports multiple modes: one in which the native output
is generated by an executable built by the native compiler, and another in
which the output is compared against a reference output file.

The former mode is historically what the test suite did; the latter is
substantially faster (and independent of bugs in the native CC).

- Daniel

Ha! Well spotted, thanks! :wink:

I think we should force it to zero on random tests...

--renato

Yes, I agree this is better for many cases, but not for all. Implementing
an RNG that is good enough for the tests' purposes, fast enough not to steal
the benchmarks' hot spots, and that uses no target/library-specific code is
not trivial. I think that, in this particular case, having bugs in GCC is
far less problematic than assuming fixed outputs.

I've tried USE_REFERENCE_OUTPUT := 0 in the Makefile, but the test.log
still shows it as 1 (and the test fails).

cheers,
--renato

The former mode is historically what the test suite did, the latter mode
is substantially faster (and independent of bugs in the native CC).

Yes, I agree this is better for many cases, but not for all. Implementing
an RNG that is good enough for the tests' purposes, fast enough not to steal
the benchmarks' hot spots, and that uses no target/library-specific code is
not trivial.

This is not true; all one needs to do is replace the existing srand()/rand()
calls with some specific platform's version (and those are usually very simple
RNGs). If the code is already using srand()/rand(), then there is no reason
to assume the benchmark is somehow worse if it always used, say, the FreeBSD
one as opposed to a platform-specific one.

- Daniel

I think that, in this particular case, having bugs in GCC is far less
problematic than assuming fixed outputs.

I'm not convinced that running GCC on library-specific tests will be worse
than pasting library code inside each test that has a library problem.

In theory, it should just work if we manage to disable USE_REFERENCE_OUTPUT
for those particular tests.

cheers,
--renato

From: "Daniel Dunbar" <daniel@zuster.org>
To: "Renato Golin" <renato.golin@linaro.org>
Cc: "Hal Finkel" <hfinkel@anl.gov>, "Marshall Clow" <mclow.lists@gmail.com>, "LLVM Dev" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, March 12, 2013 12:30:12 PM
Subject: Re: [LLVMdev] LNT BenchmarkGame

The former mode is historically what the test suite did, the latter
mode is substantially faster (and independent of bugs in the native
CC).

Yes, I agree this is better for many cases, but not for all.
Implementing RNG that is good enough for the tests' purposes, fast
enough not to steal the benchmarks' hot spots and does not use
target/library-specific code is not trivial.

This is not true; all one needs to do is replace the existing
srand()/rand() calls with some specific platform's version (and those are
usually very simple RNGs). If the code is already using srand()/rand(),
then there is no reason to assume the benchmark is somehow worse if it
always used, say, the FreeBSD one as opposed to a platform-specific
one.

+1

There are a couple of example implementations here which are only a few lines long:
http://wiki.osdev.org/Random_Number_Generator

-Hal