LLVM benchmarks against GCC

Hi all,

I was thinking that this question was not appropriate
right after release 1.0, but now perhaps it is OK...
if not, then I am sorry.

So, what is the current status of the benchmarks?

I mean the comparison to GCC.

I have looked at
http://llvm.cs.uiuc.edu/testresults/X86/

Unfortunately the graph lines are hard to read,
but the tables are OK.

I give my own interpretation of the April 26, 2004 benchmarking
tests; please don't beat me if I am wrong, but correct me.
The focus of my attention is the execution time of the programs,
i.e., the GCC/CBE and GCC/LLC fields.

To me it looks like the following.

I was thinking that this question was not appropriate
right after release 1.0, but now perhaps it is OK...
if not, then I am sorry.

You could always ask, it's just that the answer changes over time. :slight_smile:

So, what is the current status of the benchmarks?
I mean the comparison to GCC.

It's slowly getting better. :slight_smile:

I have looked at
http://llvm.cs.uiuc.edu/testresults/X86/

Unfortunately the graph lines are hard to read,
but the tables are OK.

Note that there is often a bit of noise in those numbers. In particular,
the programs are only run once and "real" time is reported. The nightly
tester runs in the middle of the night so the machine is still unloaded,
but noise is an issue.

I give my own interpretation of the April 26, 2004 benchmarking tests;
please don't beat me if I am wrong, but correct me. The focus of my
attention is the execution time of the programs, i.e., the GCC/CBE
and GCC/LLC fields.

Yup, that's a good idea. You might also be interested in LLC-LS, which is
the X86 backend with the global register allocator. Not surprisingly, it
can make a substantial impact over the local allocator: it generates code
that is about twice as fast as LLC for programs like 254.gap and
256.bzip2.

To me it looks like the following.

----------------------------------------------------
1. Programs/External:

a) CBE code is already comparable with GCC code
    (some tests are slower, but some are quicker).
b) LLC code is still rather slower than GCC code

This is about right. With the CBE, we are *consistently* faster on
179.art (a 2-2.5x speedup), 252.eon (~20% speedup), 255.vortex (~15%
speedup), and 130.li (~20% speedup). On some of the other benchmarks we lag
behind; others are extremely noisy.

LLC generates code that is generally pretty slow compared to the CBE on
X86. This is largely due to the lack of a global register allocator for
floating point (even with linear scan), and some of the other issues
described here:

http://mail.cs.uiuc.edu/pipermail/llvmdev/2004-April/001020.html

2. Programs/MultiSource

a) CBE code is already rather quicker than GCC code;
    some tests are still (moderately) slower,
    but some are much quicker (up to 5 times).
b) LLC code is still rather slower than GCC code.
    However, some tests show up to a 5x speedup.

Be careful comparing these numbers. I see that we have a 23x speedup
today over GCC on the "burg" test, but we go from 0.093 -> 0.004s. :slight_smile:
The shorter the test runs get, the more noisy they get, so unfortunately
we're not getting a realistic 23x speedup here. :wink:
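
(To put rough numbers on that: 0.093/0.004 is indeed about 23, but if the
timer is only good to about 0.003 seconds, the 0.004s figure could really
be anything from roughly 0.001s to 0.007s, which would put the "speedup"
anywhere between about 13x and 90x. The exact ratio carries almost no
information.)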

That said, there are quite a few 20% and 40% speedups here, and even an 85% one.

3. Programs/SingleSource
a) CBE code is rather quicker than GCC code;
    some tests are still (moderately) slower,
    but some are much quicker (up to 6 times).

These are even more dubious. In particular, only the first 6 rows contain
programs with reasonable runtimes. This means that the 7x speedups for
going from 0.021 -> 0.003 don't really count. :slight_smile:

That said, we are still getting 1.88x and 2.32x speedups on the int/fp
dhrystones and a 1.82x speedup on whetstone.

Overall impression:
1) CBE code is already rather quicker than GCC
2) LLC code is rather slower than CBE, but comparable to GCC

LLC code is only really comparable on testcases where the LLVM optimizer
is doing really good things, such as C++ programs. Right now with the
linear scan allocator on the X86, I would say that LLC generates code that
is 20-50% slower than the C backend.

BTW, guys, why not focus more attention on slow
tests like UnitTests/2002-10-09-ArrayResolution?

This one is just noise; if you look today, it's 1.0's straight across the
board. Also note that the test runs for 0.003 seconds, which is the
resolution of the time command on the system the program is being run on:
this is not a good test for checking performance. :slight_smile:

SPEC/CFP2000/179.art/179.art or

Hrm, you're not happy with the 2.25x speedup we get now? With GCC it
takes 9.639s to run the test, with LLVM-CBE it takes 4.3s, and with
LLVM-LLC-LS it takes 4.963s. I think these are pretty good numbers. :slight_smile:
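
(For anyone checking the arithmetic: 9.639/4.3 is roughly 2.2x for the CBE,
and 9.639/4.963 is roughly 1.9x for LLC-LS.)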

or maybe it is already being actively worked on? :slight_smile:

Actually we spend *VERY* little time tuning and tweaking the optimizer for
performance. Something that would be *INCREDIBLY* useful would be for
someone to pick some benchmark or other program we do poorly on (e.g.
Ptrdist-ks), and find out *WHAT* we could be doing to improve it. A good
way to do this is to take the program, run it in a profiler (llvm-prof or
gprof), find the hot spots, see what code we're generating for them,
and suggest ways that it could be improved. If something performs well
with the CBE but not with LLC-LS, then compare the native machine code
generated; if it performs poorly with both, then it's probably an LLVM
optimization issue.
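
To make that concrete, here is the sort of reduced example the process
might end with. Everything in it is invented for illustration (hot_loop()
is not taken from Ptrdist-ks or any real benchmark); it just shows the
shape of a standalone file whose GCC, CBE, and LLC-LS output can be
compared in isolation:

  #include <cstdio>

  // hot_loop() is a made-up stand-in for whatever function the profiler
  // points at (e.g. after building with gcc -pg and reading the gprof
  // output); it is not taken from any real benchmark.
  static double hot_loop(const double *a, const double *b, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i)   // the kind of inner loop whose generated
      sum += a[i] * b[i];         // code is worth inspecting
    return sum;
  }

  int main() {
    static double a[4096], b[4096];
    for (int i = 0; i < 4096; ++i) {
      a[i] = i;
      b[i] = 4096 - i;
    }
    double s = 0.0;
    for (int iter = 0; iter < 10000; ++iter)   // run long enough to time reliably
      s += hot_loop(a, b, 4096);
    std::printf("%f\n", s);   // keep the result live so the work is not deleted
    return 0;
  }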

At this point there are a huge number of possibilities for improvement.
We have very little in the way of loop optimizations, and we don't
actually use an interprocedural pointer analysis (I hope to rectify this
for 1.3; it should make a huge difference). Even if you're not into
hacking on LLVM optimizations, identifying code that we could improve
(and reducing them down to small examples of code we compile poorly) is
incredibly useful.

Consider this a small plea for help. :slight_smile: Once we know what to fix, it's
usually pretty easy to do so, but identifying the problems takes time, and
we have plenty of other things we need to be doing as well.

-Chris

Be careful comparing these numbers. I see that we have a 23x speedup
today over GCC on the "burg" test, but we go from 0.093 -> 0.004s. :slight_smile:
The shorter the test runs get, the more noisy they get, so unfortunately
we're not getting a realistic 23x speedup here. :wink:
[...]
These are even more dubious. In particular, only the first 6 rows contain
programs with reasonable runtimes. This means that the 7x speedups for
going from 0.021 -> 0.003 don't really count. :slight_smile:
[...]
This one is just noise; if you look today, it's 1.0's straight across the
board. Also note that the test runs for 0.003 seconds, which is the
resolution of the time command on the system the program is being run on:
this is not a good test for checking performance. :slight_smile:

Yes, I saw that the numbers are quite discrete; that is just another
sign of the same thing. But who really needs unreliable numbers?
Maybe it would be a good idea to change the test a bit, or run the
same test in a loop? Otherwise it makes no sense to include them
in the benchmarking...
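
For illustration, here is a minimal sketch of the kind of loop wrapper I
mean (nothing from the LLVM tree; run_test_once() is a made-up stand-in
for the body of an existing unit test):

  #include <cstdio>

  // Made-up stand-in for the body of an existing unit test; in a real
  // wrapper this would be the unmodified test code.
  static int run_test_once() {
    int x = 0;
    for (int i = 0; i < 1000; ++i)
      x += i;
    return x;
  }

  int main() {
    unsigned long total = 0;
    // Repeat the test enough times that the total runtime is well above
    // the ~0.003s resolution of the time command; the count would be
    // tuned per test so the run takes a few seconds.
    for (int i = 0; i < 1000000; ++i)
      total += run_test_once();
    std::printf("%lu\n", total);   // use the result so the loop is not dead code
    return 0;
  }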

Hrm, you're not happy with the 2.25x speedup we get now?

Oops, sorry, my mistake.

With GCC it
takes 9.639s to run the test, with LLVM-CBE it takes 4.3s, and with
LLVM-LLC-LS it takes 4.963s. I think these are pretty good numbers. :slight_smile:

Oh, you are right, I like those numbers. :slight_smile:

Actually we spend *VERY* little time tuning and tweaking the optimizer for
performance. Something that would be *INCREDIBLY* useful would be for
someone to pick some benchmark or other program we do poorly on (e.g.
Ptrdist-ks), and find out *WHAT* we could be doing to improve it. A good
way to do this is to take the program, run it in a profiler (llvm-prof or
gprof), find the hot spots, see what code we're generating for them,
and suggest ways that it could be improved. If something performs well
with the CBE but not with LLC-LS, then compare the native machine code
generated; if it performs poorly with both, then it's probably an LLVM
optimization issue.

At this point there are a huge number of possibilities for improvement.
We have very little in the way of loop optimizations, and we don't
actually use an interprocedural pointer analysis (I hope to rectify this
for 1.3; it should make a huge difference). Even if you're not into
hacking on LLVM optimizations, identifying code that we could improve
(and reducing them down to small examples of code we compile poorly) is
incredibly useful.

Consider this a small plea for help. :slight_smile: Once we know what to fix, it's
usually pretty easy to do so, but identifying the problems takes time, and
we have plenty of other things we need to be doing as well.

Well, 90+ % of my spare time is dedicated to SuSE 9 and win32/cygwin.
Neither of the two is good for LLVM today. But I and others always
have a web client! Would it be a big deal to add a small feature
to the http://llvm.cs.uiuc.edu/demo/ page? I mean, why not add
assembler output there? Then everyone could try to help you from
anywhere. If I remember right, Misha was the one who prepared
this nice page; could we ask Misha for that great add-on?

with best regards,
Valery.

> This one is just noise; if you look today, it's 1.0's straight across the
> board. Also note that the test runs for 0.003 seconds, which is the
> resolution of the time command on the system the program is being run on:
> this is not a good test for checking performance. :slight_smile:

Yes, I saw that the numbers are quite discrete; that is just another
sign of the same thing. But who really needs unreliable numbers?
Maybe it would be a good idea to change the test a bit, or run the
same test in a loop? Otherwise it makes no sense to include them
in the benchmarking...

The nightly tester is used for two purposes: making sure that nothing
breaks (the unit tests) and keeping tabs on how well performance is doing
(the spec and most multisource tests). It's not a reliable way to do
serious benchmarking, but can give good insights into where things can be
improved.

> With GCC it
> takes 9.639s to run the test, with LLVM-CBE it takes 4.3s, and with
> LLVM-LLC-LS it takes 4.963s. I think these are pretty good numbers. :slight_smile:

Oh, you are right, I like those numbers. :slight_smile:

:slight_smile:

> Consider this a small plea for help. :slight_smile: Once we know what to fix, it's
> usually pretty easy to do so, but identifying the problems takes time, and
> we have plenty of other things we need to be doing as well.

Well, 90+ % of my spare time is dedicated to SuSE 9 and win32/cygwin.
Neither of the two is good for LLVM today.

What is wrong with SuSE 9? Is it the weird GCC ICE bug? If so, I believe
that it only affects one .cpp file in the LLVM sources and can probably be
worked around. I don't have a copy of the buggy compiler handy, but if
you were to look into it, it would probably be a pretty easy problem to
solve.

But I and others always have a web client! Would it be a big deal to
add a small feature to the http://llvm.cs.uiuc.edu/demo/ page? I
mean, why not add assembler output there? Then everyone could try
to help you from anywhere. If I remember right, Misha was the one who
prepared this nice page; could we ask Misha for that great add-on?

It would certainly be possible, but the point of the demo page is to show
how the C front-end and optimizer transform programs to LLVM code. I
might be able to con Brian or Misha into adding support for native code
generation, but we'll have to see (they're all really busy with
end-of-semester stuff).

-Chris

The nightly tester is used for two purposes: making sure that nothing
breaks (the unit tests) and keeping tabs on how well performance is doing
(the spec and most multisource tests). It's not a reliable way to do
serious benchmarking, but can give good insights into where things can be
improved.

Hm, one day great benchmark results will be a reason to use LLVM,
so I think a bit more benchmarking info could be fruitful.
If a test were wrapped in a loop so that it runs for a few seconds,
it would hardly make the nightly testing much slower,
but it would bring a lot of interesting info.

What is wrong with SuSE 9? Is it the weird GCC ICE bug? If so, I believe
that it only affects one .cpp file in the LLVM sources and can probably be
worked around. I don't have a copy of the buggy compiler handy, but if
you were to look into it, it would probably be a pretty easy problem to
solve.

Yesterday I got the new SuSE 9.1 DVD, so I am going to step into this
river again. Perhaps this time all will be fine.

It would certainly be possible, but the point of the demo page is to show
how the C front-end and optimizer transform programs to LLVM code. I
might be able to con Brian or Misha into adding support for native code
generation, but we'll have to see (they're all really busy with
end-of-semester stuff).

Such an add-on would be really nice.
BTW, my best wishes to them for the end of the semester.

> The nightly tester is used for two purposes: making sure that nothing
> breaks (the unit tests) and keeping tabs on how well performance is doing
> (the spec and most multisource tests). It's not a reliable way to do
> serious benchmarking, but can give good insights into where things can be
> improved.

Hm, one day great benchmark results will be a reason to use LLVM,
so I think a bit more benchmarking info could be fruitful.
If a test were wrapped in a loop so that it runs for a few seconds,
it would hardly make the nightly testing much slower,
but it would bring a lot of interesting info.

Definitely, it just takes time to do all of these things. Besides, most
of the unit tests are entirely artificial and uninteresting for
benchmarking purposes. Here are some examples:

http://llvm.cs.uiuc.edu/cvsweb/cvsweb.cgi/llvm/test/Programs/SingleSource/UnitTests/2003-05-31-CastToBool.c?rev=1.3&content-type=text/x-cvsweb-markup
http://llvm.cs.uiuc.edu/cvsweb/cvsweb.cgi/llvm/test/Programs/SingleSource/UnitTests/2004-02-02-NegativeZero.c?rev=1.1&content-type=text/x-cvsweb-markup

Doing more performance tuning is certainly important though, it's just
that other projects have taken priority. There are *so* many good things
that can be done in a compiler, especially in one as multifaceted as LLVM.
:slight_smile:

> What is wrong with SuSE 9? Is it the weird GCC ICE bug? If so, I believe
> that it only affects one .cpp file in the LLVM sources and can probably be
> worked around. I don't have a copy of the buggy compiler handy, but if
> you were to look into it, it would probably be a pretty easy problem to
> solve.

Yesterday I got the new SuSE 9.1 DVD, so I am going to step into this
river again. Perhaps this time all will be fine.

Sounds great, please let me know how it goes.

> It would certainly be possible, but the point of the demo page is to show
> how the C front-end and optimizer transform programs to LLVM code. I
> might be able to con Brian or Misha into adding support for native code
> generation, but we'll have to see (they're all really busy with
> end-of-semester stuff).

Such an add-on would be really nice.

Yes, it would. On the other hand, we also want to get people looking at
the LLVM code as well: it's really easy to understand and follow once you
get the hang of it, much more so than machine code.

-Chris

> Yesterday I got the new SuSE 9.1 DVD, so I am going to step into this
> river again. Perhaps this time all will be fine.

Sounds great, please let me know how it goes.

SuSE 9.1 is running OK.
After 30 minutes of compilation I got the first errors:

Please apply this patch to your tree:

Index: Makefile

This is fixed in CVS. You might consider upgrading to LLVM CVS,
especially if you are interested in looking at performance issues: several
important performance related patches have gone in since LLVM 1.2.

> This is fixed in CVS. You might consider upgrading to LLVM CVS,
> especially if you are interested in looking at performance issues: several
> important performance related patches have gone in since LLVM 1.2.

OK, now I've really upgraded to CVS, but it looks like
the state of the sources is "not compilable" :frowning:

indeed:
***************************
Linking llc release executable
/pool/tmp/ssrc/llvm/lib/Release/sparcv9.o(.text+0x2e343): In function `_GLOBAL__I__ZN4llvm16SparcV9SchedInfoC2ERKNS_13TargetMachineE':
: undefined reference to `llvm::CPUResource::CPUResource[in-charge](std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)'
[...]
***************************

Ugh, that's been fixed. It looks like a partial checkin got in. :frowning:

In fact, the file ./include/llvm/Target/TargetSchedInfo.h has
just the declaration of the "CPUResource" constructor.
I can't find the constructor definition with "find-grep".

So, I've changed:
  CPUResource(const std::string& resourceName, int maxUsers)
to:

  CPUResource(const std::string& resourceName, int maxUsers)
    : rname(resourceName), maxNumUsers(maxUsers) {};

That's fine. :slight_smile: When/if you update your tree again, you'll get an
out-of-line definition in lib/Target/TargetSchedInfo.cpp, so you'll want
to remove the one in the header, but that definition should work.
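
For anyone following along, here is a self-contained toy version of the
situation (this is not the actual LLVM source; the class is stripped down
to just the two members named above). It shows the declaration plus
out-of-line definition shape that TargetSchedInfo.cpp presumably provides;
defining the constructor inline in the class, as above, is the equivalent
alternative:

  #include <iostream>
  #include <string>

  struct CPUResource {                 // toy stand-in, not LLVM's class
    std::string rname;
    int maxNumUsers;
    // Declaration only: without a definition somewhere, every use of the
    // constructor produces an "undefined reference" link error like the
    // one quoted earlier.
    CPUResource(const std::string &resourceName, int maxUsers);
  };

  // Out-of-line definition, the analogue of what the .cpp file supplies.
  CPUResource::CPUResource(const std::string &resourceName, int maxUsers)
      : rname(resourceName), maxNumUsers(maxUsers) {}

  int main() {
    CPUResource r("ALU", 2);
    std::cout << r.rname << " " << r.maxNumUsers << "\n";
    return 0;
  }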

Another thing is:
********************
[...]
Flexing Lexer.l
Lexer.l:31:27: StackerParser.h: No such file or directory
Bisoning StackerParser.y
[...]
********************
which did not stop the compilation though (probably because the sub-make is invoked with -k)

That's a spurious error caused by the dependency mechanism, and can be
ignored.

And now the compilation is finished, wow!
:slight_smile:

Great! Welcome to the world of LLVM. :slight_smile:

-Chris