The release of a new code generator in Mono 2.2 prompted me to benchmark the
performance of various VMs using the SciMark2 benchmark on an 8x 2.1GHz
64-bit Opteron. I have published the results here:
Mono was up to 12x slower than LLVM before and is now only 2.2x slower on
average. Interestingly, the JVM scores slightly higher than LLVM on this
benchmark on average and beats LLVM on two of the five individual tests.
The individual scores are particularly enlightening. Specifically:
. LLVM outperforms all other VMs by a significant margin on FFT, Monte Carlo
and sparse matrix multiply.
. LLVM is beaten by the JVM on successive over-relaxation (SOR) and LU
decomposition.
In the context of the SOR test, I suspect the JVM is using alias information
to perform optimizations that LLVM and llvm-gcc probably do not do.
I am not sure what causes the performance discrepancy on LU. Perhaps the JVM
is generating SSE instructions. Does llvm-gcc generate SSE instructions under
any circumstances?
Interesting, but can you add plain C compiled with good old-fashioned GCC or similar to serve as a point of reference as well?
This is not quite a fair comparison. The other virtual machines must
do garbage collection, while LLVM, since it is compiling C code,
benefits from manual memory allocation.
Here is a run of scimark2 with verbose GC enabled. You'll see that there are
two garbage collection cycles, for a total of around 0.003 seconds.
It should also be noted that these GCs happened before the timer started
running. There is almost no dynamic memory allocation in this code. Modern
garbage collectors are also very efficient (sometimes better than hand
deallocation).
That is an insignificant advantage in this particular case (SciMark2): the
memory for each test is preallocated and not part of the measurement, and
the heap and stack are both tiny during the computations, so there is little
to traverse.
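To illustrate the point about preallocation, here is a minimal sketch of a
timing harness where setup and teardown sit outside the timed region. This is
not the SciMark2 source; scale_and_sum and time_kernel are hypothetical names
made up for the example, and clock() is just one possible timer.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for one benchmark kernel: scales a vector in place
   and returns the sum of its elements. It allocates nothing. */
static double scale_and_sum(double *v, size_t n, double factor)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        v[i] *= factor;
        sum += v[i];
    }
    return sum;
}

/* Allocates and initialises the working set, then times only the kernel.
   Returns elapsed CPU seconds (or -1.0 if allocation fails); the kernel's
   result is written to *result. */
static double time_kernel(size_t n, double factor, double *result)
{
    assert(n > 0 && result != NULL);

    double *v = malloc(n * sizeof *v);   /* setup: outside the timed region */
    if (v == NULL)
        return -1.0;
    for (size_t i = 0; i < n; i++)
        v[i] = 1.0;

    clock_t start = clock();             /* timer starts after allocation */
    *result = scale_and_sum(v, n, factor);
    clock_t stop = clock();

    free(v);                             /* teardown: also untimed */
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

With this shape, the cost of malloc/free (or of a GC cycle triggered by
allocation in a managed VM) never shows up in the reported time.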
I am interested in the comparative results for LLVM because I consider it to
represent how fast my LLVM-based VM might be compared to other garbage
collected VMs.
However, LLVM has a serious disadvantage compared to the other VMs here
because it does not have aliasing assurances. For example, it does not know
about array aliasing, e.g. that the subarrays in the successive
over-relaxation test cannot overlap.
The LLVM 2.1 release notes say that llvm-gcc gained alias analysis and
understands the "restrict" keyword, but when I add it to the C code for
SciMark2 it makes no difference. Can anyone else get this to work?
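For anyone who wants to try, this is roughly the kind of change I mean. It is
a sketch from memory of the shape of the SOR inner loop, not a copy of the
SciMark2 source: sor_sweeps is a hypothetical name, and the restrict
qualifiers are only valid on the assumption that every row of G is a
separate, non-overlapping array.

```c
#include <assert.h>

/* Sketch of a successive over-relaxation sweep over an M x N grid stored as
   an array of row pointers. The restrict qualifiers promise the compiler
   that the three rows touched in one pass of the i loop do not overlap. */
static void sor_sweeps(int M, int N, double omega, double **G, int iterations)
{
    assert(M >= 3 && N >= 3);

    const double omega_over_four = omega * 0.25;
    const double one_minus_omega = 1.0 - omega;

    for (int p = 0; p < iterations; p++) {
        for (int i = 1; i < M - 1; i++) {
            double *restrict Gi = G[i];               /* row being updated */
            const double *restrict Gim1 = G[i - 1];   /* row above (read-only) */
            const double *restrict Gip1 = G[i + 1];   /* row below (read-only) */

            for (int j = 1; j < N - 1; j++) {
                Gi[j] = omega_over_four *
                            (Gim1[j] + Gip1[j] + Gi[j - 1] + Gi[j + 1]) +
                        one_minus_omega * Gi[j];
            }
        }
    }
}
```

Note that Gi[j] still depends on Gi[j - 1] from the previous iteration of the
j loop, so restrict can at most let the compiler keep the neighbouring rows'
loads out of the way of the stores; it does not make the loop independent.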