A faster instruction selector?

Hi everyone,

llvm is great!

But there is one exception :wink:
llvm components are generally fast, but instruction selection is slooow.

Let me explain.
I am developing a toolkit for building virtual machines which can automatically
generate a JIT compiler using the interpreter specification.
llvm does the hard work of machine code generation. (Thanks to you all)

I discovered that JIT compilation is taking much longer than had hoped.
Some profiling showed that is almost entirely due
to the llvm instruction selector (The register allocator and optimisation passes are fast)

The split in execution time for JIT compilation being approximately:
(My) IR code generation: 1
(llvm) Register allocation: 1
(llvm) Instruction selection: 12

The ladyvm JVM paper noted the same problem:
http://vmkit.llvm.org/ladyvm.html

So I did a quick experiment with llvm and lcc (lcc, A Retargetable Compiler for ANSI C),
in order to see how much faster instruction selection could be.
lcc uses a BURG-type (called lburg) instruction-selector.

The following is for x86/linux (ubuntu)
I am interested in JIT performance so I have only counted user+sys time.

Compiling a small test program (the dhrystone benchmark):
lcc (Fraser and Hanson) (after pre-processing): 4ms.
llc : 16ms.
Incidentally, optimisation is respectably fast
opt -O3 : 16ms.

(My machine is quite slow)

Using -time-passes shows that almost all of the time spent by llc is in the DAG-to-DAG
instruction selection.

lcc does lexing, parsing, type-checking, IR code-generation, register allocation
AND final code generation in much less time that llvm spends doing instruction selection.
Also, there is no way that instruction selection should take as long as -O3 optimisation.

It would seem that lcc's lburg instruction selector is at least ten times faster than
than llvm, possibly 20 times faster.

Does a fast instruction selector for llvm exist?

A BURG based instruction selector would improve the JIT compilation capabilities of llvm hugely,
and benefit the static compiler as well.
A fast compiler is really useful in the compile-test-debug cycle.

Sadly, I do not have the time to implement such an instruction selector myself.

Cheers,
Mark.

Hi Mark,

I don't know the internals of the instruction selection in LLVM (vburg and stuff), but I do know that work has been made to improve it. The "-fast" command line argument in lli and llc generates less optimized machine code quickly. You can also use the local register allocator which is faster than the default linear scan.

I did some benchmarks in vmkit (the project that encompass ladyvm) and compilation time is roughly 50% faster with both fast and local register allocator.

Nicolas

Mark Shannon wrote:

I discovered that JIT compilation is taking much longer than had hoped.
Some profiling showed that is almost entirely due
to the llvm instruction selector (The register allocator and optimisation passes are fast)

The split in execution time for JIT compilation being approximately:
(My) IR code generation: 1
(llvm) Register allocation: 1
(llvm) Instruction selection: 12

Interesting. What version of LLVM are you using here?

The ladyvm JVM paper noted the same problem:
http://vmkit.llvm.org/ladyvm.html

So I did a quick experiment with llvm and lcc (lcc, A Retargetable Compiler for ANSI C),
in order to see how much faster instruction selection could be.
lcc uses a BURG-type (called lburg) instruction-selector.

The following is for x86/linux (ubuntu)
I am interested in JIT performance so I have only counted user+sys time.

Compiling a small test program (the dhrystone benchmark):
lcc (Fraser and Hanson) (after pre-processing): 4ms.
llc : 16ms.
Incidentally, optimisation is respectably fast
opt -O3 : 16ms.

(My machine is quite slow)

Using -time-passes shows that almost all of the time spent by llc is in the DAG-to-DAG
instruction selection.

lcc does lexing, parsing, type-checking, IR code-generation, register allocation
AND final code generation in much less time that llvm spends doing instruction selection.
Also, there is no way that instruction selection should take as long as -O3 optimisation.

Note that these proportions can vary greatly depending on the nature of the
input, and dhrystone is a very small amount of code.

It would seem that lcc's lburg instruction selector is at least ten times faster than
than llvm, possibly 20 times faster.

Does a fast instruction selector for llvm exist?

The -fast option directs llc to generate code quickly, at the expense
of the quality of the generated code. If your JIT is designed such that
JIT compilation time is the dominating factor, this can be a very
valuable tradeoff. A lot of work was done in this area for
LLVM 2.4.

A BURG based instruction selector would improve the JIT compilation capabilities of llvm hugely,
and benefit the static compiler as well.

This may be true, though in the data I've seen currently the selection
algorithm itself isn't the most significant bottleneck.

FWIW, what -time-passes calls "instruction selection" includes a lot more
work than what its name suggests. A fair amount of pre-selection optimization
and lowering happens during this phase, as well as scheduling.

A fast compiler is really useful in the compile-test-debug cycle.

Yep.

Sadly, I do not have the time to implement such an instruction selector myself.

Do you have time to do more detailed profiling? It might be interesting
to see which parts of codegen are hot in your use cases.

Dan