Parsing benchmark: LibTomMath

I did a little parsing benchmark with Clang.

You can download LibTomMath 0.41 from http://libtom.org/

Intel(R) Core(TM)2 CPU T7200
LLVM and Clang SVN trunk r44196, optimized build
LibTomMath 0.41
Debian GNU/Linux Sid up-to-date 2007-11-17
GCC (Debian 4.2.2-3)

Result:

tinuviel@debian:~/src$ tar jxf ltm-0.41.tar.bz2
tinuviel@debian:~/src$ cd libtommath-0.41

tinuviel@debian:~/src/libtommath-0.41$ time gcc -fsyntax-only -Wall -I. *.c
real 0m3.174s
user 0m2.820s
sys 0m0.360s
tinuviel@debian:~/src/libtommath-0.41$ time clang -fsyntax-only -I. *.c
real 0m1.015s
user 0m1.004s
sys 0m0.012s

tinuviel@debian:~/src/libtommath-0.41$ cat *.c > all.c

tinuviel@debian:~/src/libtommath-0.41$ time gcc -fsyntax-only -I. all.c
real 0m0.118s
user 0m0.112s
sys 0m0.008s
tinuviel@debian:~/src/libtommath-0.41$ time clang -fsyntax-only -I. all.c
real 0m0.024s
user 0m0.020s
sys 0m0.004s
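
For reference, the speedups implied by the "real" times above can be worked out with a short shell sketch. The helper name speedup is made up for illustration; only the four timings come from the transcript.

```shell
#!/bin/sh
# speedup BASE CAND: ratio of two wall-clock times, to two decimals.
# (Hypothetical helper; not part of the original benchmark.)
speedup() {
    awk -v base="$1" -v cand="$2" 'BEGIN { printf "%.2f\n", base / cand }'
}

# "real" times from the transcript, in seconds (gcc first, clang second).
echo "separate files: $(speedup 3.174 1.015)x"
echo "all.c:          $(speedup 0.118 0.024)x"
```

That works out to roughly 3.1x for the separate files and roughly 4.9x for the concatenated all.c.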

Sanghyeon Seo wrote:-

I did a little parsing benchmark with Clang.

You can download LibTomMath 0.41 from http://libtom.org/

Intel(R) Core(TM)2 CPU T7200
LLVM and Clang SVN trunk r44196, optimized build
LibTomMath 0.41
Debian GNU/Linux Sid up-to-date 2007-11-17
GCC (Debian 4.2.2-3)

Result:

tinuviel@debian:~/src$ tar jxf ltm-0.41.tar.bz2
tinuviel@debian:~/src$ cd libtommath-0.41

tinuviel@debian:~/src/libtommath-0.41$ time gcc -fsyntax-only -Wall -I. *.c
real 0m3.174s
user 0m2.820s
sys 0m0.360s
tinuviel@debian:~/src/libtommath-0.41$ time clang -fsyntax-only -I. *.c
real 0m1.015s
user 0m1.004s
sys 0m0.012s

tinuviel@debian:~/src/libtommath-0.41$ cat *.c > all.c

tinuviel@debian:~/src/libtommath-0.41$ time gcc -fsyntax-only -I. all.c
real 0m0.118s
user 0m0.112s
sys 0m0.008s
tinuviel@debian:~/src/libtommath-0.41$ time clang -fsyntax-only -I. all.c
real 0m0.024s
user 0m0.020s
sys 0m0.004s

Are there any reasons this may not be a fair comparison?

Neil.

It would be better to run the gcc test 3 times then the clang test 3 times and take the minimum of each. That said, we should be significantly faster... 3x is awesome!

-Chris

http://nondot.org/sabre
http://llvm.org

I did take the minimum of 3 runs. I just forgot to mention that.
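
The minimum-of-N protocol discussed above can be sketched as a small shell helper. This is only an illustration: the name min_time is hypothetical, and the sketch assumes GNU date (for the %N nanosecond format) rather than the shell's own time keyword.

```shell
#!/bin/sh
# min_time RUNS CMD [ARGS...]
# Run CMD RUNS times and print the smallest wall-clock time in seconds.
# Assumes GNU date; %N is not available in every date implementation.
min_time() (
    runs=$1; shift
    best=
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s.%N)
        "$@" >/dev/null 2>&1
        end=$(date +%s.%N)
        # Keep the smaller of the previous best and this run's elapsed time.
        best=$(awk -v s="$start" -v e="$end" -v b="$best" 'BEGIN {
            t = e - s
            if (b == "" || t < b) b = t
            printf "%.3f\n", b
        }')
        i=$((i + 1))
    done
    echo "$best"
)
```

Usage would look like "min_time 3 gcc -fsyntax-only -I. all.c" followed by "min_time 3 clang -fsyntax-only -I. all.c". Taking the minimum rather than the mean reduces the influence of cold caches and background load on the first runs.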

Chris Lattner wrote:-

It would be better to run the gcc test 3 times then the clang test 3
times and take the minimum of each. That said, we should be
significantly faster... 3x is awesome!

I don't want to spoil the party :), but the comparison is not really
fair.

cc1 should be being invoked to avoid an extra fork/exec that
penalizes GCC. GCC handles character sets to some extent, which
is a non-trivial cost, which clang does not. Clang doesn't implement
all the language yet; the extra semantic checking will likely slow
it down slightly. I believe GCC does lowering of some kind in
addition to semantic analysis with -fsyntax-only, but I guess that
can be viewed as an implementation weakness.

Of course, once complete, clang will probably be further improved
speed-wise, and having said all that I'm sure clang will end up much
faster than GCC.

As another data point, here are timings on my (slow) machine,
fastest of three runs, for compiling the preprocessor files of my
own C front end. Like clang, cfe is stand-alone, and doesn't handle
charsets, so to that extent it is a fairer comparison. Like clang
there hasn't really been any time spent optimizing for speed.
However unlike clang it does semantically analyze and diagnose
pretty much the whole language. clang/llvm was compiled with gmake
MAKE_OPTIMIZED=1 -- I believe that's how to get an optimized
executable.

$ time for file in cpp/*.c; do /usr/libexec/cc1 -quiet -fsyntax-only -I. $file ;done

real 0m1.098s
user 0m0.843s
sys 0m0.190s

$ time for file in cpp/*.c; do ~/src/nobackup/llvm/Release/bin/clang -fsyntax-only -I. $file ;done
cpp/macro.c:686:15: error: variable has incomplete type 'char '
  static char compile_date = "\"Mmm dd yyyy\"";
              ^
cpp/macro.c:687:15: error: variable has incomplete type 'char '
  static char compile_time = "\"hh:mm:ss\"";
              ^
2 diagnostics generated.

real 0m0.456s
user 0m0.281s
sys 0m0.147s

$ time for file in cpp/*.c; do ~/src/cfe/cfe -I. $file ;done

real 0m0.205s
user 0m0.118s
sys 0m0.083s

So there is still room for improvement, curing the C++ bloat and all
that. 8)

Neil.
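
The cost of the driver's extra fork/exec mentioned above can be measured by timing the gcc driver against the cc1 binary it runs. A minimal sketch follows, assuming a gcc driver on PATH; -print-prog-name is a documented GCC driver option, while the helper name cc1_path is hypothetical.

```shell
#!/bin/sh
# cc1_path [DRIVER]
# Print the path of the cc1 binary that the given gcc driver would execute.
# If the driver cannot locate cc1 it prints just the bare name "cc1".
cc1_path() {
    "${1:-gcc}" -print-prog-name=cc1
}

# Example comparison (not executed here; all.c as in the first message):
#   time gcc -fsyntax-only -I. all.c
#   time "$(cc1_path)" -quiet -fsyntax-only -I. all.c
```

The difference between the two timings is an estimate of the driver overhead alone, which lets it be reported separately instead of being folded into the front-end comparison.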

Do you think it'll make a suitable compiler codegen benchmark? I'd like to add it to llvm-test test suite.

Thanks,

Evan

Chris Lattner wrote:-

It would be better to run the gcc test 3 times then the clang test 3
times and take the minimum of each. That said, we should be
significantly faster... 3x is awesome!

I don't want to spoil the party :), but the comparison is not really
fair.

cc1 should be being invoked to avoid an extra fork/exec that
penalizes GCC. GCC handles character sets to some extent, which
is a non-trivial cost, which clang does not. Clang doesn't implement
all the language yet; the extra semantic checking will likely slow
it down slightly. I believe GCC does lowering of some kind in
addition to semantic analysis with -fsyntax-only, but I guess that
can be viewed as an implementation weakness.

While everything you say is accurate, I don't think these implementation level issues should be "over analyzed" (particularly at this point in the project). For example, most developers don't invoke cc1 directly (so doing so in a benchmark seems like cheating).

While you can pick apart any benchmark, front-end performance is something that needs to be tracked early and often. Compile-time performance wasn't a gcc goal for many years (which led to some of gcc's bloat and compile-time woes).

I'd like to emphasize something (my own form of over analysis:-)...

Unlike gcc, clang is being developed as a set of reusable components (with the goal of supporting a diverse set of needs). From my perspective, striking the right balance between abstraction and performance is an "art". It's hard to do, and hasn't been a part of the C compiler development culture over the years (making it difficult to find people that respect/understand this idiom).

That said, the fact that clang is very competitive with gcc on "traditional" batch mode processing is very encouraging.

snaroff

Use ENABLE_OPTIMIZED=1 to get optimized executable.

Neil, you can also configure with --enable-optimized in order to avoid having to specify that to make every(?) time. Also, an 'svn export' will build optimized by default. (configure checks whether the .svn or CVS directory is present.)

— Gordon

Chris Lattner wrote:-

It would be better to run the gcc test 3 times then the clang test 3
times and take the minimum of each. That said, we should be
significantly faster... 3x is awesome!

I don't want to spoil the party :), but the comparison is not really
fair.

Sure, but it's a good indicator.

cc1 should be being invoked to avoid an extra fork/exec that
penalizes GCC.

Why? On darwin, "gcc" is actually a driver-driver that invokes another "gcc" which is a driver, which invokes "cc1". It doesn't matter why the architecture is like this, the actual effect for the user is that all this is done for every compile.

GCC handles character sets to some extent, which
is a non-trivial cost, which clang does not. Clang doesn't implement
all the language yet; the extra semantic checking will likely slow
it down slightly.

Yep, this is true.

I believe GCC does lowering of some kind in
addition to semantic analysis with -fsyntax-only, but I guess that
can be viewed as an implementation weakness.

Yep.

Of course, once complete, clang will probably be further improved
speed-wise, and having said all that I'm sure clang will end up much
faster than GCC.

Of course. :)

As another data point, here are timings on my (slow) machine,
fastest of three runs, for compiling the preprocessor files of my
own C front end. Like clang, cfe is stand-alone, and doesn't handle
charsets, so to that extent it is a fairer comparison. Like clang
there hasn't really been any time spent optimizing for speed.
However unlike clang it does semantically analyze and diagnose
pretty much the whole language. clang/llvm was compiled with gmake
MAKE_OPTIMIZED=1 -- I believe that's how to get an optimized
executable.

Cool.

$ time for file in cpp/*.c; do /usr/libexec/cc1 -quiet -fsyntax-only -I. $file ;done

real 0m1.098s
user 0m0.843s
sys 0m0.190s

$ time for file in cpp/*.c; do ~/src/nobackup/llvm/Release/bin/clang -fsyntax-only -I. $file ;done
cpp/macro.c:686:15: error: variable has incomplete type 'char '
  static char compile_date = "\"Mmm dd yyyy\"";
              ^
cpp/macro.c:687:15: error: variable has incomplete type 'char '
  static char compile_time = "\"hh:mm:ss\"";
              ^
2 diagnostics generated.

real 0m0.456s
user 0m0.281s
sys 0m0.147s

$ time for file in cpp/*.c; do ~/src/cfe/cfe -I. $file ;done

real 0m0.205s
user 0m0.118s
sys 0m0.083s

So there is still room for improvement, curing the C++ bloat and all
that. 8)

Wow, you have significantly less system time as well as user time. Offhand, I am not sure why you would have significantly less system time (do you have a trick that you are using?) other than the fact that we build in several system -I search paths that are probably being queried but ignored.

For user time, it is hard to guess what the difference is. Is this source code available to use for timings? I'd be interested to see where clang is spending all its time: your cfe is almost 2x faster, which points to us doing something pretty braindead :)

-Chris

For user time, it is hard to guess what the difference is. Is this
source code available to use for timings? I'd be interested to see
where clang is spending all its time: your cfe is almost 2x faster,
which points to us doing something pretty braindead :)

Chris,

I think it would be great to learn from cfe, but it needs to be in harmony with our goals.

Without knowing the architecture of Neil's cfe, it's really hard to say we're doing anything braindead (though it's always great to find "low-hanging fruit" :-). For example, the "Tiny C Compiler" is reported to compile/assemble/link 9x faster than GCC. While this is great, it's clearly not an architecture that is interesting to us. clang is the most highly "layered" front-end I've ever worked on (clang is the 5th C-based front-end I've had the pleasure to work on). While layering opens up a world of possibilities, it has a cost.

snaroff

Don't worry Steve, I agree with you :). In my travels, I've found that if you have the right architecture, proper layering doesn't impose a huge performance penalty. I believe that we can be very performant and still have the nice architecture we desire.

Whether I'm right or wrong, the place to start is to find out where the time is going. My guess is that the overhead doesn't have anything to do with the layering. If it does, then we can either figure out a way to make the layering be less costly, or accept it as the cost we pay for a nice architecture.

-Chris

Devang Patel wrote:-

>clang/llvm was compiled with gmake
>MAKE_OPTIMIZED=1 -- I believe that's how to get an optimized
>executable.

Use ENABLE_OPTIMIZED=1 to get optimized executable.

Apologies, that's what I did, I just emailed it wrong.

Neil.

Steve Naroff wrote:-

I'd like to emphasize something (my own form of over analysis:-)...

Unlike gcc, clang is being developed as a set of reusable components
(with the goal of supporting a diverse set of needs). From my
perspective, striking the right balance between abstraction and
performance is an "art". It's hard to do, and hasn't been a part of
the C compiler development culture over the years (making it difficult
to find people that respect/understand this idiom).

I think you're absolutely right; the library / interface idiom is
the right approach and will have huge payoff in many directions. I
prefer the architecture of clang to my own front end for this reason,
and am attempting to reorganize it in a similar direction.

I agree with Chris and doubt the abstraction costs much either; I'd
be surprised if it exceeded 10% or so. I think NetBSD is a good example
(in C) of how abstracted code can be cleaner and just as efficient.

I have no idea why clang is slower than cfe; the gap surprised me too.
I'm not doing anything magic; it's just straight-forward C code like
APFloat would look like if you removed the class syntactic sugar. I
build a fairly complete internal representation; there's no corner
being cut.

Neil.

Steve Naroff wrote:-

I'd like to emphasize something (my own form of over analysis:-)...

Unlike gcc, clang is being developed as a set of reusable components
(with the goal of supporting a diverse set of needs). From my
perspective, striking the right balance between abstraction and
performance is an "art". It's hard to do, and hasn't been a part of
the C compiler development culture over the years (making it difficult
to find people that respect/understand this idiom).

I think you're absolutely right; the library / interface idiom is
the right approach and will have huge payoff in many directions. I
prefer the architecture of clang to my own front end for this reason,
and am attempting to reorganize it in a similar direction.

Interesting (and good to hear).

I agree with Chris and doubt the abstraction costs much either; I'd
be surprised if it exceeded 10% or so. I think NetBSD is a good example
(in C) of how abstracted code can be cleaner and just as efficient.

I also agree with Chris - I certainly wasn't trying to say the abstraction penalty is our problem.

My only point was it's always going to be hard to do an "apple to apple" comparison. I brought up "Tiny C" as a way to dramatize my point.

I think Chris said it best when he characterized the data as a "good indicator" (of performance).

I have no idea why clang is slower than cfe; the gap surprised me too.
I'm not doing anything magic; it's just straight-forward C code like
APFloat would look like if you removed the class syntactic sugar. I
build a fairly complete internal representation; there's no corner
being cut.

For "header rich" projects (like those @ Apple), we spend most of our time preprocessing and malloc'ing (building the ASTs).

The breakdown is roughly 60/30/10 (preprocessor, AST building/semantic analysis, parsing).

I am not familiar with LibTomMath - I would guess it isn't header rich.

If your cfe has a "-E" switch, it might be interesting to see how it compares with clang. If there isn't a big difference in preprocessor performance, we know it is likely to be in the AST building and semantic analysis.

Curious,

snaroff
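
The -E comparison suggested above could be run with a loop like the following sketch. The helper name preprocess_all is hypothetical; it assumes the front end under test accepts the conventional -E and -I. options, as gcc and clang do.

```shell
#!/bin/sh
# preprocess_all CC FILE...
# Run only the preprocessor (-E) of front end CC over each FILE,
# discarding the preprocessed output. Stops at the first failure.
preprocess_all() (
    cc=$1; shift
    for f in "$@"; do
        "$cc" -E -I. "$f" >/dev/null || exit 1
    done
)
```

In bash, "time preprocess_all clang cpp/*.c" and "time preprocess_all gcc cpp/*.c" can then be compared with the corresponding -fsyntax-only timings. If the -E times are close while the -fsyntax-only times are not, the gap is more likely in AST building and semantic analysis than in preprocessing.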

Steve Naroff wrote:-

My only point was it's always going to be hard to do an "apple to
apple" comparison. I brought up "Tiny C" as a way to dramatize my point.

As you note, TCC is quite uninteresting as it makes essentially no
attempt to conform to the standard, and doing so would require a
major rewrite and slow it down considerably.

I am not familiar with LibTomMath - I would guess it isn't header rich.

If your cfe has a "-E" switch, it might be interesting to see how it
compares with clang. If there isn't a big difference in preprocessor
performance, we know it is likely to be in the AST building and
semantic analysis.

Heh, embarrassingly it seems to have "regressed" in the -E department,
with a segfault or two :) Notably the only area that isn't regtested.

I'm doubtful of the value of the comparison anyway, though, as
e.g. clang takes care to avoid pastes, I do not, and I take care to
preserve the form of whitespace, whereas clang does not. I've always
felt -E timings are more representative of buffering strategy than
anything else.

Neil.