[RFC] Benchmarking subset of the test suite

At the LLVM Developers' Meeting in November, I promised to work on isolating a subset of the current test suite that is useful for benchmarking. Having looked at this in more detail, I've found that most of the applications and benchmarks in the test suite are useful for benchmarking, so I think a better way of phrasing it is that we should construct a list of programs in the test suite that are not useful for benchmarking.

My proposed exclusion list is provided below. I constructed this exclusion list primarily based on the following experiment: I ran the test suite 10 times in three configurations: 1) on an IBM POWER7 (P7) with -O3 -mvsx, 2) on a P7 at -O0, and 3) on an Intel Xeon E5430 with -O3, all using make -j6. I then used the ministat utility (which performs a t-test) to compare the timings of the two P7 configurations against each other and against the Xeon configuration, requiring a detectable difference at 99.5% confidence, and looked for tests that showed no significant difference in all three comparisons. The running configuration here is purposefully noisy; the idea is to eliminate those tests that are significantly sensitive to startup time, file I/O time, memory bandwidth, etc., or are just too short, and by running many tests in parallel (non-deterministically), my hope is to eliminate those tests that cannot usefully serve as benchmarks in a "normal" environment.
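For illustration, the selection amounts to something like the following sketch (this is not the actual script I ran, and the timing samples are made up; scipy's Welch-variant t-test stands in for ministat's):

from itertools import combinations
from scipy import stats

def differs(a, b, alpha=0.005):
    # Two-sample t-test at 99.5% confidence.
    _, p = stats.ttest_ind(a, b, equal_var=False)
    return p < alpha

# timings[test][config] -> wall-clock samples from repeated runs (made up).
timings = {
    'SingleSource/Benchmarks/Shootout/hello': {
        'p7-o3':   [0.012, 0.011, 0.013, 0.012],
        'p7-o0':   [0.012, 0.013, 0.011, 0.012],
        'xeon-o3': [0.011, 0.012, 0.012, 0.013],
    },
}

for test, cfgs in timings.items():
    # A test lands on the exclusion list if no pair of configurations
    # can be statistically distinguished.
    if not any(differs(a, b) for a, b in combinations(cfgs.values(), 2)):
        print(test)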

I'll admit to being somewhat surprised that so many of the Prolangs and Shootout "benchmarks" seemingly fail to serve as useful benchmarks; perhaps someone can look into increasing the problem sizes, etc., of these.

Without further ado, I propose that a test-suite configuration designed for benchmarking exclude the following:

MultiSource/Applications/Burg/burg
MultiSource/Applications/treecc/treecc
MultiSource/Benchmarks/MiBench/office-ispell/office-ispell
MultiSource/Benchmarks/MiBench/security-blowfish/security-blowfish
MultiSource/Benchmarks/Prolangs-C/allroots/allroots
MultiSource/Benchmarks/Prolangs-C/archie-client/archie
MultiSource/Benchmarks/Prolangs-C/assembler/assembler
MultiSource/Benchmarks/Prolangs-C/compiler/compiler
MultiSource/Benchmarks/Prolangs-C++/deriv1/deriv1
MultiSource/Benchmarks/Prolangs-C++/deriv2/deriv2
MultiSource/Benchmarks/Prolangs-C++/family/family
MultiSource/Benchmarks/Prolangs-C/fixoutput/fixoutput
MultiSource/Benchmarks/Prolangs-C/football/football
MultiSource/Benchmarks/Prolangs-C++/fsm/fsm
MultiSource/Benchmarks/Prolangs-C++/garage/garage
MultiSource/Benchmarks/Prolangs-C++/NP/np
MultiSource/Benchmarks/Prolangs-C++/objects/objects
MultiSource/Benchmarks/Prolangs-C++/office/office
MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig
MultiSource/Benchmarks/Prolangs-C++/shapes/shapes
MultiSource/Benchmarks/Prolangs-C/simulator/simulator
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc
MultiSource/Benchmarks/Prolangs-C++/trees/trees
MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail
MultiSource/Benchmarks/Prolangs-C/unix-tbl/unix-tbl
MultiSource/Benchmarks/Prolangs-C++/vcirc/vcirc
SingleSource/Benchmarks/McGill/exptree
SingleSource/Benchmarks/Shootout-C++/hello
SingleSource/Benchmarks/Shootout-C++/reversefile
SingleSource/Benchmarks/Shootout-C++/spellcheck
SingleSource/Benchmarks/Shootout-C++/sumcol
SingleSource/Benchmarks/Shootout-C++/wc
SingleSource/Benchmarks/Shootout-C++/wordfreq
SingleSource/Benchmarks/Shootout/hello
SingleSource/Regression/C++/2003-05-14-array-init
SingleSource/Regression/C++/2003-05-14-expr_stmt
SingleSource/Regression/C/2003-05-14-initialize-string
SingleSource/Regression/C/2003-05-21-BitfieldHandling
SingleSource/Regression/C/2003-05-21-UnionBitfields
SingleSource/Regression/C/2003-05-21-UnionTest
SingleSource/Regression/C/2003-05-22-LocalTypeTest
SingleSource/Regression/C/2003-05-22-VarSizeArray
SingleSource/Regression/C/2003-05-23-TransparentUnion
SingleSource/Regression/C++/2003-06-08-BaseType
SingleSource/Regression/C++/2003-06-08-VirtualFunctions
SingleSource/Regression/C++/2003-06-13-Crasher
SingleSource/Regression/C/2003-06-16-InvalidInitializer
SingleSource/Regression/C/2003-06-16-VolatileError
SingleSource/Regression/C++/2003-08-20-EnumSizeProblem
SingleSource/Regression/C++/2003-09-29-NonPODsByValue
SingleSource/Regression/C/2003-10-12-GlobalVarInitializers
SingleSource/Regression/C/2004-02-03-AggregateCopy
SingleSource/Regression/C/2004-03-15-IndirectGoto
SingleSource/Regression/C/2005-05-06-LongLongSignedShift
SingleSource/Regression/C/2008-01-07-LongDouble
SingleSource/Regression/C++/2008-01-29-ParamAliasesReturn
SingleSource/Regression/C++/2011-03-28-Bitfield
SingleSource/Regression/C/badidx
SingleSource/Regression/C/bigstack
SingleSource/Regression/C++/BuiltinTypeInfo
SingleSource/Regression/C/callargs
SingleSource/Regression/C/casts
SingleSource/Regression/C/compare
SingleSource/Regression/C/ConstructorDestructorAttributes
SingleSource/Regression/C/DuffsDevice
SingleSource/Regression/C++/EH/class_hierarchy
SingleSource/Regression/C++/EH/ConditionalExpr
SingleSource/Regression/C++/EH/ctor_dtor_count
SingleSource/Regression/C++/EH/ctor_dtor_count-2
SingleSource/Regression/C++/EH/dead_try_block
SingleSource/Regression/C++/EH/exception_spec_test
SingleSource/Regression/C++/EH/function_try_block
SingleSource/Regression/C++/EH/inlined_cleanup
SingleSource/Regression/C++/EH/recursive-throw
SingleSource/Regression/C++/EH/simple_rethrow
SingleSource/Regression/C++/EH/simple_throw
SingleSource/Regression/C++/EH/throw_rethrow_test
SingleSource/Regression/C++/fixups
SingleSource/Regression/C++/global_ctor
SingleSource/Regression/C/globalrefs
SingleSource/Regression/C++/global_type
SingleSource/Regression/C++/ofstream_ctor
SingleSource/Regression/C/pointer_arithmetic
SingleSource/Regression/C++/pointer_member
SingleSource/Regression/C++/pointer_method
SingleSource/Regression/C++/pointer_method2
SingleSource/Regression/C/PR10189
SingleSource/Regression/C/PR1386
SingleSource/Regression/C/PR491
SingleSource/Regression/C/PR640
SingleSource/Regression/C++/short_circuit_dtor
SingleSource/Regression/C/sumarray
SingleSource/Regression/C/sumarraymalloc
SingleSource/Regression/C/testtrace
SingleSource/UnitTests/2002-04-17-PrintfChar
SingleSource/UnitTests/2002-05-02-ArgumentTest
SingleSource/UnitTests/2002-05-02-CastTest
SingleSource/UnitTests/2002-05-02-CastTest1
SingleSource/UnitTests/2002-05-02-CastTest2
SingleSource/UnitTests/2002-05-02-CastTest3
SingleSource/UnitTests/2002-05-02-ManyArguments
SingleSource/UnitTests/2002-05-03-NotTest
SingleSource/UnitTests/2002-05-19-DivTest
SingleSource/UnitTests/2002-08-02-CastTest
SingleSource/UnitTests/2002-08-02-CastTest2
SingleSource/UnitTests/2002-08-19-CodegenBug
SingleSource/UnitTests/2002-10-09-ArrayResolution
SingleSource/UnitTests/2002-10-12-StructureArgs
SingleSource/UnitTests/2002-10-12-StructureArgsSimple
SingleSource/UnitTests/2002-10-13-BadLoad
SingleSource/UnitTests/2002-12-13-MishaTest
SingleSource/UnitTests/2003-04-22-Switch
SingleSource/UnitTests/2003-05-02-DependentPHI
SingleSource/UnitTests/2003-05-07-VarArgs
SingleSource/UnitTests/2003-05-12-MinIntProblem
SingleSource/UnitTests/2003-05-14-AtExit
SingleSource/UnitTests/2003-05-26-Shorts
SingleSource/UnitTests/2003-05-31-CastToBool
SingleSource/UnitTests/2003-05-31-LongShifts
SingleSource/UnitTests/2003-07-06-IntOverflow
SingleSource/UnitTests/2003-07-08-BitOpsTest
SingleSource/UnitTests/2003-07-09-LoadShorts
SingleSource/UnitTests/2003-07-09-SignedArgs
SingleSource/UnitTests/2003-07-10-SignConversions
SingleSource/UnitTests/2003-08-05-CastFPToUint
SingleSource/UnitTests/2003-08-11-VaListArg
SingleSource/UnitTests/2003-08-20-FoldBug
SingleSource/UnitTests/2003-09-18-BitFieldTest
SingleSource/UnitTests/2003-10-13-SwitchTest
SingleSource/UnitTests/2003-10-29-ScalarReplBug
SingleSource/UnitTests/2004-02-02-NegativeZero
SingleSource/UnitTests/2004-06-20-StaticBitfieldInit
SingleSource/UnitTests/2004-11-28-GlobalBoolLayout
SingleSource/UnitTests/2005-05-11-Popcount-ffs-fls
SingleSource/UnitTests/2005-05-12-Int64ToFP
SingleSource/UnitTests/2005-05-13-SDivTwo
SingleSource/UnitTests/2005-07-17-INT-To-FP
SingleSource/UnitTests/2005-11-29-LongSwitch
SingleSource/UnitTests/2006-01-29-SimpleIndirectCall
SingleSource/UnitTests/2006-02-04-DivRem
SingleSource/UnitTests/2006-12-01-float_varg
SingleSource/UnitTests/2006-12-04-DynAllocAndRestore
SingleSource/UnitTests/2006-12-07-Compare64BitConstant
SingleSource/UnitTests/2006-12-11-LoadConstants
SingleSource/UnitTests/2007-01-04-KNR-Args
SingleSource/UnitTests/2007-03-02-VaCopy
SingleSource/UnitTests/2007-04-25-weak
SingleSource/UnitTests/2008-04-18-LoopBug
SingleSource/UnitTests/2008-04-20-LoopBug2
SingleSource/UnitTests/2008-07-13-InlineSetjmp
SingleSource/UnitTests/2009-04-16-BitfieldInitialization
SingleSource/UnitTests/2009-12-07-StructReturn
SingleSource/UnitTests/2010-05-24-BitfieldTest
SingleSource/UnitTests/AtomicOps
SingleSource/UnitTests/block-byref-cxxobj-test
SingleSource/UnitTests/block-byref-test
SingleSource/UnitTests/block-call-r7674133
SingleSource/UnitTests/block-copied-in-cxxobj
SingleSource/UnitTests/block-copied-in-cxxobj-1
SingleSource/UnitTests/blockstret
SingleSource/UnitTests/byval-alignment
SingleSource/UnitTests/conditional-gnu-ext
SingleSource/UnitTests/conditional-gnu-ext-cxx
SingleSource/UnitTests/DefaultInitDynArrays
SingleSource/UnitTests/FloatPrecision
SingleSource/UnitTests/initp1
SingleSource/UnitTests/member-function-pointers
SingleSource/UnitTests/printargs
SingleSource/UnitTests/SignlessTypes/cast2
SingleSource/UnitTests/SignlessTypes/cast-bug
SingleSource/UnitTests/SignlessTypes/ccc
SingleSource/UnitTests/SignlessTypes/div
SingleSource/UnitTests/SignlessTypes/factor
SingleSource/UnitTests/SignlessTypes/Large/cast
SingleSource/UnitTests/SignlessTypes/shr
SingleSource/UnitTests/stmtexpr
SingleSource/UnitTests/StructModifyTest
SingleSource/UnitTests/TestLoop
SingleSource/UnitTests/Threads/2010-12-08-tls
SingleSource/UnitTests/Threads/tls
SingleSource/UnitTests/Vector/build
SingleSource/UnitTests/Vector/divides
SingleSource/UnitTests/Vector/sumarray
SingleSource/UnitTests/Vector/sumarray-dbl
SingleSource/UnitTests/vla

SingleSource/UnitTests/Vector/Altivec/2007-01-07-lvsl-lvsr-Regression
SingleSource/UnitTests/Vector/Altivec/alti.sdot
SingleSource/UnitTests/Vector/Altivec/casts
SingleSource/UnitTests/Vector/Altivec/test1

SingleSource/UnitTests/ms_struct-bitfield
SingleSource/UnitTests/ms_struct-bitfield-1
SingleSource/UnitTests/ms_struct-bitfield-init
SingleSource/UnitTests/ms_struct-bitfield-init-1
SingleSource/UnitTests/ms_struct_pack_layout
SingleSource/UnitTests/ms_struct_pack_layout-1

If anyone has any proposed modifications to this list, please let me know. Otherwise, I'll construct a patch to the test-suite makefiles and mail it for review soon.

Thanks in advance,
Hal

Hi everyone!

I'm trying to implement Lisp's funcall function, which, roughly, calls a function whose name
is known only at runtime.
I know that the LLVM IR 'call' instruction can accept function pointers, so the question is:
is there a simple way to get a function pointer from a function name (represented as, e.g., an i8*)?

By the way, I'm using llvmpy to generate the LLVM IR, so if there is a way to do this using llvmpy's tools, that would also do.

Yours sincerely,
Alexandr Popolitov

Don’t you have the same problem with other atoms, e.g. variable names?
This sounds like something that should be implemented in the language’s runtime library.

Well, the only place I need to get a variable pointer (a pointer to a symbol) from its name is in the intern function.
And symbols are implemented as Python classes, so to LLVM IR they appear as PyObjects or, more precisely, as i8*'s.
But all the elementary functions that operate on symbols (e.g. cons) are written with this in mind.

Then, for functions, I could have done just that: write a wrapper around the get_function_by_name method of the Module object.
However, this would give me a PyObject*, not a pointer to an actual function, which I suppose is what should be fed to the call instruction.

So maybe the question can be rephrased as: "how do I obtain a function pointer from llvmpy's Function object?"
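Roughly, what I want is something like this (a sketch only; function_address is a name I made up, and I haven't verified these llvmpy calls myself):

def function_address(module, engine, name):
    # module is an llvm.core.Module, engine an llvm.ee.ExecutionEngine.
    # Look the function up by name in the Module...
    func = module.get_function_named(name)
    # ...and ask the ExecutionEngine for the address of its JITed code.
    return engine.get_pointer_to_function(func)

The open question is then how to turn that raw address into something the generated code can call through.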

> Without further ado, I propose that a test-suite configuration designed for benchmarking exclude the following:

Unrelated to your point, I think we should just outright remove the bulk of the SingleSource/Regression / SingleSource/UnitTests tests. They largely date back to before Clang existed and aren't providing much value anymore.

Any of them that are useful (e.g. the EH tests?) could be saved or moved somewhere else.

-Chris

On 04/05/2014 14:39, Hal Finkel wrote:

> At the LLVM Developers' Meeting in November, I promised to work on isolating a subset of the current test suite that is useful for benchmarking. [...]
>
> Without further ado, I propose that a test-suite configuration designed for benchmarking exclude the following:

Hi Hal,

thanks for putting in the effort! I think the systematic approach you have taken is very sensible.

I went through your list and looked at a couple of interesting cases. For the Shootout benchmarks I looked at the results and the history my LNT -O3 builder shows (long history, always 10 samples per run, http://llvm.org/perf/db_default/v4/nts/25326).

Some observations from my side:

## Many benchmarks from your list have a runtime of zero seconds reported in my tester

## For some of the benchmarks you propose, manually looking at the
   historic samples allows a human to spot certain trends:

> MultiSource/Benchmarks/Prolangs-C/football/football

http://llvm.org/perf/db_default/v4/nts/graph?show_all_points=yes&moving_window_size=10&plot.237=34.237.3&submit=Update

> MultiSource/Benchmarks/Prolangs-C/simulator/simulator

http://llvm.org/perf/db_default/v4/nts/graph?show_all_points=yes&moving_window_size=10&plot.314=34.314.3&submit=Update

## Some other benchmarks with zero seconds execution time are not contained in your list. E.g.:

SingleSource/Benchmarks/Shootout/objinst
SingleSource/Benchmarks/Shootout-C++/objinst

## Some benchmarks on your list are really not benchmarks at all:

Shootout hello:

#include <stdio.h>

int main() {
     puts("hello world\n");
     return(0);
}

Shootout sumcol:

int main(int argc, char * * argv) {
     char line[MAXLINELEN];
     int sum = 0;
     char buff[4096];
     cin.rdbuf()->pubsetbuf(buff, 4096); // enable buffering

     while (cin.getline(line, MAXLINELEN)) {
         sum += atoi(line);
     }
     cout << sum << '\n';
}

To sum up, I believe this list might benefit from some improvements, but it seems to be a really good start. If someone wants to do a more extensive analysis, we can always analyze the historic data available in my -O3 performance buildbot. It should give us a very good idea of how noisy certain benchmarks are.
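To illustrate the kind of analysis I mean, one could rank benchmarks by relative noise roughly like this (a sketch with fabricated data, not LNT's actual API):

import statistics

def noise(samples):
    # Coefficient of variation; an all-zero timing is treated as pure noise.
    mean = statistics.mean(samples)
    return float('inf') if mean == 0 else statistics.stdev(samples) / mean

# history[test] -> execution-time samples over many runs (fabricated).
history = {
    'MultiSource/Benchmarks/Prolangs-C/football/football': [0.31, 0.42, 0.29, 0.40],
    'SingleSource/Benchmarks/Shootout/hello': [0.0, 0.0, 0.0, 0.0],
}

for test in sorted(history, key=lambda t: noise(history[t]), reverse=True):
    print('%8.4f  %s' % (noise(history[test]), test))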

Cheers,
Tobias

OK, I've managed to get a pointer to a function via the ptr attribute of the Function object.

Now,
(defun foo (x) x)
(defun bar (x) (funcall 'foo x))

generates the following LLVM IR:

define i8* @foo(i8* %x) {
entry:
ret i8* %x
}

define i8* @bar(i8* %x) {
entry:
%0 = alloca [4 x i8], align 1
store [4 x i8] c"foo\00", [4 x i8]* %0, align 1
%calltmp = call i8* @intern([4 x i8]* %0)
%ptrfind = call i8* @find_llvm_function(i8* %calltmp)
%cast = bitcast i8* %ptrfind to i8* (i8*)*
%calltmp1 = call i8* %cast(i8* %x)
ret i8* %calltmp1
}

except that the last two lines (with %calltmp1 and ret) do not actually get generated, because generation of the call instruction fails with the
error 'argument is neither a function nor a function pointer'.

I thought that the bitcast instruction before it should do the trick of transforming the return value of @find_llvm_function
(which is a wrapper around Python code that searches in the Module object) into a function pointer.
There is also some possibility that I got the guts of @find_llvm_function wrong; however, in that case I'd expect
that the code would still be generated and problems would manifest only at runtime.

Any ideas about what I'm doing wrong?

Hi,

> define i8* @bar(i8* %x) {
> entry:
>   %0 = alloca [4 x i8], align 1
>   store [4 x i8] c"foo\00", [4 x i8]* %0, align 1
>   %calltmp = call i8* @intern([4 x i8]* %0)
>   %ptrfind = call i8* @find_llvm_function(i8* %calltmp)
>   %cast = bitcast i8* %ptrfind to i8* (i8*)*
>   %calltmp1 = call i8* %cast(i8* %x)
>   ret i8* %calltmp1
> }

> except that the last two lines (with %calltmp1 and ret) do not actually
> get generated, because generation of the call instruction fails with the
> error 'argument is neither a function nor a function pointer'.

Since this function works when put into a text file and run through
llc manually (& opt & ....), it looks like the problem is that you're
not creating the IR you think you are.

What gets output when you call "dump" on the callee parameter to
CreateCall (or whatever the Python version you're using is)?
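In llvmpy terms I'd expect something like this to show it (I'm guessing at the exact spelling; 'callee' stands for whatever value you hand to the call builder):

from sys import stderr

print >> stderr, str(callee)   # llvmpy values print their IR via __str__
print >> stderr, callee.type   # should be the pointer type i8* (i8*)*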

> Also, there is some possibility that I got the guts of @find_llvm_function
> wrong; however, in that case I'd expect that the code would still be
> generated and problems would manifest only at runtime.

Agreed.

Cheers.

Tim.

The EH part is somewhat useful for early warnings on silly stuff.
Especially now that we're introducing EHABI support in the compiler
and related libraries, I'd like to keep them for a while. The EH tests
rely on many parts, and moving them to the LIT tests is not possible
(or not suitable), but we could have a better batch of tests.

If anyone knows of any better ones (the ones we have are *really* silly),
feel free to replace them.

Thanks,
--renato

From: "Chris Lattner" <clattner@apple.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Sunday, May 4, 2014 1:37:42 PM
Subject: Re: [LLVMdev] [RFC] Benchmarking subset of the test suite

>> Without further ado, I propose that a test-suite configuration
>> designed for benchmarking exclude the following:
>
> Unrelated to your point, I think we should just outright remove the
> bulk of the SingleSource/Regression / SingleSource/UnitTests tests.
> They largely date back to before Clang existed and aren't providing
> much value anymore.
>
> Any of them that are useful (e.g. the EH tests?) could be saved or
> moved somewhere else.

I have mixed feelings about this. On the one hand, the regression and unit tests don't belong with the others, and if we start running applications multiple times for benchmarking, we'd likely want to exclude these correctness-only tests. On the other hand, I've found some of these tests quite useful in catching miscompiles (the EH tests, the bitfield tests, vector intrinsics tests and the va_args tests specifically come to mind). It might be useful to separate out some set of 'ABI' tests, even if we remove the others.

-Hal

I don't know whether this is equivalent to what dump() does in C++,
but in this snippet:

ptrvoid = G_LLVM_BUILDER.call(ptrfinder, [funcallee], 'ptrfind')
print >> stderr, ptrvoid.__str__()
ptr = G_LLVM_BUILDER.bitcast(ptrvoid, funct_type, name='cast')
print >> stderr, ptr.__str__()
res = G_LLVM_BUILDER.call(ptr, [x], 'calltmp1')  # args truncated in the original mail; '[x]' is a guess
print >> stderr, res.__str__()

precisely the following gets printed to stderr

%ptrfind = call i8* @find_llvm_function(i8* %calltmp)
%cast = bitcast i8* %ptrfind to i8* (i8*)*

and then there's an error, so the third line never gets printed.

From: "Tobias Grosser" <tobias@grosser.es>
To: "Hal Finkel" <hfinkel@anl.gov>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Sunday, May 4, 2014 1:40:52 PM
Subject: Re: [LLVMdev] [RFC] Benchmarking subset of the test suite

On 04/05/2014 14:39, Hal Finkel wrote:
> At the LLVM Developers' Meeting in November, I promised to work on
> isolating a subset of the current test suite that is useful for
> benchmarking. [...]
>
> Without further ado, I propose that a test-suite configuration
> designed for benchmarking exclude the following:

> Hi Hal,
>
> thanks for putting in the effort! I think the systematic approach you
> have taken is very sensible.
>
> I went through your list and looked at a couple of interesting cases.

Thanks! -- I figured you'd have something to add to this endeavor ;)

> For the Shootout benchmarks I looked at the results and the history my
> LNT -O3 builder shows (long history, always 10 samples per run,
> http://llvm.org/perf/db_default/v4/nts/25326).
>
> Some observations from my side:
>
> ## Many benchmarks from your list have a runtime of zero seconds
> reported in my tester

This is true of my data as well.

> ## For some of the benchmarks you propose, manually looking at the
>    historic samples allows a human to spot certain trends:
>
>> MultiSource/Benchmarks/Prolangs-C/football/football
>
> http://llvm.org/perf/db_default/v4/nts/graph?show_all_points=yes&moving_window_size=10&plot.237=34.237.3&submit=Update
>
>> MultiSource/Benchmarks/Prolangs-C/simulator/simulator
>
> http://llvm.org/perf/db_default/v4/nts/graph?show_all_points=yes&moving_window_size=10&plot.314=34.314.3&submit=Update

Are these plots of compile time or execution time? Both of these say, "Type: compile_time". I did not consider compile time in my analysis, and I think that is a separate issue.

> ## Some other benchmarks with zero seconds execution time are not
> contained in your list. E.g.:
>
> SingleSource/Benchmarks/Shootout/objinst
> SingleSource/Benchmarks/Shootout-C++/objinst

Interestingly, on my x86 machines this also executes for zero time, but at -O0 it takes a significant amount of time (and on PPC, even at -O3, it runs for about 0.0008s). So I think it is still useful to keep these.

> ## Some benchmarks on your list are really not benchmarks at all:
>
> Shootout hello:
>
> #include <stdio.h>
>
> int main() {
>      puts("hello world\n");
>      return(0);
> }
>
> Shootout sumcol:
>
> int main(int argc, char * * argv) {
>      char line[MAXLINELEN];
>      int sum = 0;
>      char buff[4096];
>      cin.rdbuf()->pubsetbuf(buff, 4096); // enable buffering
>
>      while (cin.getline(line, MAXLINELEN)) {
>          sum += atoi(line);
>      }
>      cout << sum << '\n';
> }

Indeed.

> To sum up, I believe this list might benefit from some improvements,
> but it seems to be a really good start. If someone wants to do a more
> extensive analysis, we can always analyze the historic data available
> in my -O3 performance buildbot. It should give us a very good idea of
> how noisy certain benchmarks are.

Sounds good to me.

-Hal

Good catch. I got it wrong. They also have zero-second execution times, so they can probably be easily removed as well.

Tobias

>> Without further ado, I propose that a test-suite configuration designed for benchmarking exclude the following:
>
> Unrelated to your point, I think we should just outright remove the bulk of the SingleSource/Regression / SingleSource/UnitTests tests.

I agree that we should not use these programs for benchmarking the compiler, but I actually like having these tests around for checking the correctness of the compiler. When developing new optimizations, it is really convenient to debug with a single-file test case.

OK, after all, it seems to be entirely an llvmpy issue, since the Python types of %ptrfind and %cast
are StructType. The error message is also generated by llvmpy's Python code, not by any C library it uses.
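For the record, building the cast target type properly with llvm.core looks roughly like this (a sketch; the variable names match my earlier snippets, and I haven't re-verified every call):

from llvm.core import Type

i8p = Type.pointer(Type.int(8))
# The callee has type i8* (i8*), so the bitcast target must be a
# pointer to that function type, i.e. i8* (i8*)*.
funct_type = Type.pointer(Type.function(i8p, [i8p]))

# With funct_type built this way, the earlier builder calls go through:
# ptr = G_LLVM_BUILDER.bitcast(ptrvoid, funct_type, name='cast')
# res = G_LLVM_BUILDER.call(ptr, [x], 'calltmp1')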

+1 for small, single-source ABI execution tests. There's definitely a
place for them.

>> Without further ado, I propose that a test-suite configuration designed for benchmarking exclude the following:
>
> Unrelated to your point, I think we should just outright remove the bulk of the SingleSource/Regression / SingleSource/UnitTests tests. They largely date back to before Clang existed and aren't providing much value anymore.
>
> Any of them that are useful (e.g. the EH tests?) could be saved or moved somewhere else.
>
> -Chris

I think that all of the test-suite is useful.

Occasionally, bugs show up in unexpected places and for unexpected reasons, and are caught by some mild-mannered member of the test-suite.

Sorry that I'm late to this. I scanned the list you provided and it makes sense to me. But this one caught me by surprise. I remember we put quite a bit of effort into tuning the scheduler to improve the codegen quality of blowfish. Are you sure this particular benchmark isn't serving its purpose?

Evan

I think that blowfish was in John the Ripper, yes? This is a different
blowfish benchmark. That said, I guess it would depend on how this one
is written.

-eric

From: "Evan Cheng" <evan.cheng@apple.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Monday, May 19, 2014 3:35:45 PM
Subject: Re: [LLVMdev] [RFC] Benchmarking subset of the test suite

>> security-blowfish
>
> Sorry that I'm late to this. I scanned the list you provided and it
> makes sense to me. But this one caught me by surprise. I remember we
> put quite a bit of effort into tuning the scheduler to improve the
> codegen quality of blowfish. Are you sure this particular benchmark
> isn't serving its purpose?

Yes. As currently configured, the running time is much too short. On PPC64, the running time is <= 0.001s, and -O3 and -O0 cannot be statistically distinguished. On x86-64, the running time is even smaller (timeit reports 0.0000 for -O3 builds).

-Hal