Hi, I have a question about x86 code quality.
I have run a few benchmarks and compared the
running time of executables created by LLVM to
executables created by gcc.
It appears that code generated by LLVM is 1.5x to 3x
slower than code generated by gcc, for the x86.
For some of the benchmarks the linear scan regalloc
works. When it does, results are in the 1.0x to 1.5x
range. Unfortunately, the linear scan allocator breaks
on most of my code.
1) Do my observations fit your general experience ?
Yes, they do. I assume you are working with LLVM 1.2?
I haven't looked into the details of the generated
x86 code. I have the following observation, though:
When using gcc as a backend (compiling to the 'c' target
and then recompiling with gcc) results are generally a lot
better than just using the LLVM->x86 backend. This
indicates that the performance difference is mostly
localized in the LLVM->x86 backend. Further, for those
of my codes where the new allocator works, results are
much better. Whether this is due to the allocator, or
some interaction between it and codegen, I do not know.
The LLVM 1.2 X86 code quality problems are due to a couple of serious
issues:
1. The default register allocator is a purely local algorithm, which
cannot hold (e.g.) the counter of a loop in a register across the loop.
This is *clearly* bad, and switching to the new allocator obviously
makes a big difference.
2. Even with the new allocator, we are not able to globally allocate
floating point registers (yet), due to some interaction with the X86
floating point stack. This is just something that needs to be worked
on, but unfortunately no one has had time to do the work recently.
3. When compiling with the native X86 backend, very little additional
optimization is performed. When compiling with the C backend & GCC,
GCC does its own optimizations that can make a big difference. For
example, LLVM 1.2 could only index into arrays with 64-bit integers
(the getelementptr only accepted a 'long' operand). This could cause
huge performance problems on the X86, which the GCC optimizer happily
stomped out. (this issue has been fixed in LLVM CVS:
4. In LLVM 1.2, several LLVM->LLVM optimizations were doing very obviously
silly things, and have subsequently been fixed. See the "1.3" release
notes for information: http://llvm.cs.uiuc.edu/docs/ReleaseNotes.html
5. One of our goals for LLVM 1.3 is to get one of the scalable pointer
analyses that I have been working on turned on by default in the
optimizing linker. This should have a pretty noticeable performance
impact.
Currently, I am just playing with LLVM, but the long-term
plan is to build a new backend for a new machine. It won't
be register starved as the x86 is.
Of the above, #1 would directly affect your target, #2 is X86 specific, #3
would have affected your target if it's 32-bit or smaller, #4 would have
hurt your target, and #5 will almost certainly help your target.
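
To make #1 and #3 a bit more concrete, here is a minimal C sketch. It is
only an illustration: the function names and the array-summing workload
are made up, and it is not taken from the benchmarks discussed here. Both
functions compute the same sum; they differ only in the width of the
loop counter used to index the array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum an array using a 32-bit loop counter.
   On a 32-bit x86, `i` fits in a single general-purpose register.
   Point #1: an allocator that works only within a basic block cannot
   keep `i` (or `total`) in a register across the loop. */
int32_t sum_index32(const int32_t *data, int32_t n) {
    int32_t total = 0;
    for (int32_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}

/* Hypothetical example: the same loop with a 64-bit counter, loosely
   analogous to indexing through a 'long' (64-bit) operand as in #3.
   On a 32-bit x86, a 64-bit integer occupies two 32-bit registers,
   so every increment and comparison of `i` costs extra instructions
   unless an optimizer narrows the index. */
int32_t sum_index64(const int32_t *data, int64_t n) {
    int32_t total = 0;
    for (int64_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}
```

Called on the same small array, both functions return the same value.
Any difference would show up only in the generated code, for instance
when comparing the assembly each backend emits for the two loops.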
2) Is there a similar performance differential between
LLVM->sparc and gcc on sparc, or are they much closer
because the sparc has more registers and thus should
be less dependent on good register allocation ?
I truly have no idea. I don't use the Sparc target very much, and I don't
know if anyone has looked into the actual performance of it. One of the
problems is that the LLVM Sparc backend doesn't share much code with the
target-independent code generator, so it's very hard to compare. Our
long-term goal is to merge the sparc code generator into the
target-independent code paths.
3) What is the expected timeframe for the new regalloc to
become stable ?
I am hoping/planning for the new allocator to be in LLVM 1.3 as the
default allocator. From what I understand there is one bug left related
to spill code insertion, but Alkis has been very busy with other projects
(it's nearing the end of the semester already :). If he doesn't get to
it by 1.3, I will.
... or perhaps I should ask a more general
question: what is the perceived status in terms of performance
for the two compiler backends and for the compiler backend
part of the infrastructure ?
At this point we haven't actually spent a lot of time evaluating and
measuring code quality. In fact, if you notice a piece of code that is not
being optimized or code generated well, please file a bug (with a
suggestion on what the code should have been compiled to). Generally we
separate optimizations into the categories of LLVM->LLVM or codegen
optimizations, but both are important.
Finally I think LLVM looks *very* nice and appears to be a substantial
contribution to the world of open source compiler infrastructure.
Thanks! If you have any more questions, please feel free to ask.