LLVM-based JVM JIT for libgcj

I recently wrote an LLVM-based JIT plugin for libgcj and I thought
it'd be worthwhile to mention it here.

It is in cvs on sourceforge, but afaics anonymous cvs there is pretty
broken at the moment... so if you want a copy, ask and I will email it
to you.

Basically I hacked libgcj to (optionally) dynamically load a JIT module
at startup. If a JIT is loaded then bytecode is passed to it rather
than to the libgcj bytecode interpreter.
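
A minimal sketch of that dispatch, for readers who want something concrete. All names here (Runtime, registerJit, interpret) are hypothetical and are not libgcj's real interface; the actual plugin loading is left out, and only the "JIT if present, interpreter otherwise" decision is modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types; not taken from libgcj.
using Bytecode = std::vector<std::uint8_t>;
using Executor = std::function<std::string(const Bytecode &)>;

// Stand-in for the built-in bytecode interpreter.
inline std::string interpret(const Bytecode &code) {
  return "interpreted " + std::to_string(code.size()) + " bytes";
}

class Runtime {
  Executor jit_;  // stays empty unless a JIT module registers itself
public:
  // A loaded JIT module would call this once at startup.
  void registerJit(Executor jit) {
    assert(jit && "JIT executor must be callable");
    jit_ = std::move(jit);
  }

  // Bytecode goes to the JIT when one is registered, else to the interpreter.
  std::string run(const Bytecode &code) const {
    return jit_ ? jit_(code) : interpret(code);
  }
};
```

With no JIT registered, run() falls through to interpret(); after registerJit() every call is routed to the registered executor instead.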

The LLVM JIT is pretty raw at the moment. It can run "hello world"
and a few microbenchmarks (empty loops, method calls, that sort of
thing). I haven't tested it seriously yet. On my little benchmarks
it is 5x-6x faster than our interpreter.

Exception handling definitely does not work, I didn't even try to
implement it yet. I've been thinking about having some kind of simple
bridge between the LLVM and GCC worlds here -- very inefficient, but
at least I could get it working rather quickly. Long term I'm hoping
someone else will be solving this problem... :slight_smile:

FWIW I actually did this work twice, once for libjit and once for
LLVM. I'm happy to provide a comparison, from a jit-writing
perspective, if you're interested.

Thanks for writing LLVM. It is awesome to be able to add a JIT to
libgcj this easily.

Tom

I recently wrote an LLVM-based JIT plugin for libgcj and I thought
it'd be worthwhile to mention it here.

Cool!

Exception handling definitely does not work, I didn't even try to
implement it yet. I've been thinking about having some kind of simple
bridge between the LLVM and GCC worlds here -- very inefficient, but
at least I could get it working rather quickly. Long term I'm hoping
someone else will be solving this problem... :slight_smile:

If I had to speculate, I would guess that LLVM will support DWARF-2 style zero-cost exceptions in the next 2-3 months.

FWIW I actually did this work twice, once for libjit and once for
LLVM. I'm happy to provide a comparison, from a jit-writing
perspective, if you're interested.

Given your experience with both, I'd be very interested in any thoughts you have on how we can make LLVM better. :slight_smile:

Thanks for writing LLVM. It is awesome to be able to add a JIT to
libgcj this easily.

:slight_smile:

-Chris

FWIW I actually did this work twice, once for libjit and once for
LLVM. I'm happy to provide a comparison, from a jit-writing
perspective, if you're interested.

Given your experience with both, I'd be very interested in any
thoughts you have on how we can make LLVM better. :slight_smile:

libjit has a few advantages over LLVM in terms of the "gloss" -- how
it is packaged and what JIT development with it is like:

* The API documentation is better.

  libjit's documentation is not perfectly complete, but for my
  purposes it was generally more complete and better organized than
  LLVM's. With LLVM I ended up reading the header files to figure
  everything out; with libjit I didn't.

  Also libjit uses texinfo... sometimes I think I'm the last remaining
  person who likes using info in Emacs, but this did make my life
  simpler, so I thought I'd mention it. (Obviously this is a
  subjective thing... can you tell I'm defensive about it? :slight_smile:)

  Not to belabor this too much, but I've always found doxygen output
  borderline unreadable... libjit also does comment extraction from
  the source for its documentation, but puts it into a more-or-less
  nicely structured context.

* libjit is a lot smaller. Of course this is both a plus and a minus
  (in the sense that small usually means things are missing).
  However, in terms of development productivity, libjit is a win here:
  a rebuild and relink of my libjit-based code takes under a minute.
  I think it takes 20 minutes or more to link my LLVM-based JIT on my
  laptop.

* Likewise, libjit installs very simply: it is a couple of shared
  libraries (one for the library and an extra one containing the C++
  API). At least with the default install, LLVM is a weird (to me)
  mix of static libraries and object files.
  llvm-config saved the day here, in terms of the Makefile
  hacking.

  I only saw today in the mail archives that there is a way to build
  LLVM as shared libraries -- I haven't tried it yet, so apologies if
  this is just my ignorance.

* One oddity with LLVM came because a BasicBlock is a Value. I passed
  it as the wrong argument to an AllocaInst constructor... oops.
  (libjit's API is much simpler ... no names for instructions, new
  instructions are implicitly linked into the current block, etc.
  This has both plusses and minuses. I did wonder how much it costs
  to have names everywhere...)

I think libjit only has one technical idea that is missing from LLVM.
In libjit you can create a new function and get a pointer to it, but
set things up so that the IR for the function is also created lazily.
As I understand it, right now in LLVM you can make the IR and lazily
compile it, but not lazily make the IR. This seems pretty handy, at
least for my situation. It also looks pretty easy to add to LLVM :slight_smile:
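
A rough sketch of the lazy-IR idea, independent of either library. LazyFunction and its build callback are invented for illustration and the "IR" is just a string; the only point is that a caller can hold the function before its IR exists, and that the IR is built once, on first use.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Toy stand-in for a function whose IR is produced on demand.
// None of these names come from libjit or LLVM.
class LazyFunction {
  std::function<std::string()> buildIR_;  // user callback that creates the IR
  std::string ir_;                        // empty until first use
  bool built_ = false;
  int buildCount_ = 0;
public:
  explicit LazyFunction(std::function<std::string()> buildIR)
      : buildIR_(std::move(buildIR)) {
    assert(buildIR_ && "a build callback is required");
  }

  // The object can be handed out before any IR has been created.
  const std::string &ir() {
    if (!built_) {  // first use: build the IR now
      ir_ = buildIR_();
      built_ = true;
      ++buildCount_;
    }
    return ir_;
  }

  int buildCount() const { return buildCount_; }
};
```

Constructing a LazyFunction never runs the callback; the first call to ir() does, and later calls reuse the cached result.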

I don't want to get you down or anything. LLVM has many advantages
over libjit as well, which is why I chose to translate the JIT from
libjit to LLVM in the first place:

* LLVM has a friendlier license
* LLVM has a *much* more active community
* LLVM is much further ahead in every technical aspect: more ports,
  more optimizations, etc.

I hope this helps. And, thanks again.

Tom

FWIW I actually did this work twice, once for libjit and once for
LLVM. I'm happy to provide a comparison, from a jit-writing
perspective, if you're interested.

> Given your experience with both, I'd be very interested in any
> thoughts you have on how we can make LLVM better. :slight_smile:

Nice writeup, thanks for taking the time to do it.

libjit has a few advantages over LLVM in terms of the "gloss" -- how
it is packaged, JIT development using it:

* The API documentation is better.

libjit's documentation is not perfectly complete, but for my
purposes it was generally more complete and better organized than
LLVM's. With LLVM I ended up reading the header files to figure
everything out; with libjit I didn't.

Also libjit uses texinfo... sometimes I think I'm the last remaining
person who likes using info in Emacs, but this did make my life
simpler, so I thought I'd mention it. (Obviously this is a
subjective thing... can you tell I'm defensive about it? :slight_smile:)

Not to belabor this too much, but I've always found doxygen output
borderline unreadable... libjit also does comment extraction from
the source for its documentation, but puts it into a more-or-less
nicely structured context.

Understood. It certainly would be nice to have a "how to use the JIT" document that is concise and targeted for this. Also, unfortunately, most of the docs for the LLVM API are still in the headers, which sucks. :frowning:

Perhaps after the release I can help improve this situation.

* libjit is a lot smaller. Of course this is both a plus and a minus
(in the sense that small usually means things are missing).
However, in terms of development productivity, libjit is a win here:
a rebuild and relink of my libjit-based code takes under a minute.
I think it takes 20 minutes or more to link my LLVM-based JIT on my
laptop.

Are you using a debug or a release build? A release build (built with make ENABLE_OPTIMIZED=1) is often 10x to 20x smaller than a debug build, and links correspondingly faster. On some machines, a release build builds *faster* than a debug build because the debug symbols are so huge. The only thing you lose with a release build is the ability to step into LLVM libraries in a debugger.

* Likewise, libjit installs very simply: it is a couple of shared
libraries (one for the library and an extra one containing the C++
API). At least with the default install, LLVM is a weird (to me)
mix of static libraries and object files.
llvm-config saved the day here, in terms of the Makefile
hacking.

Yup, go llvm-config! :slight_smile:

I only saw today in the mail archives that there is a way to build
LLVM as shared libraries -- I haven't tried it yet, so apologies if
this is just my ignorance.

I'd suggest sticking with llvm-config and not using shared libraries.

* One oddity with LLVM came because a BasicBlock is a Value. I passed
it as the wrong argument to an AllocaInst constructor... oops.
(libjit's API is much simpler ... no names for instructions, new
instructions are implicitly linked into the current block, etc.
This has both plusses and minuses. I did wonder how much it costs
to have names everywhere...)

Instruction/BB names are completely optional (you can pass in "" for everything, and everything will still work fine) but are quite handy when trying to read the LLVM code.

It would be straightforward to add a new "easy" interface for creating LLVM instructions. Would something like this work well for you?

class InstructionCreator {
   BasicBlock *CurBB;
public:
   void setCurrentBlock(BasicBlock *);

   Value *createAdd(Value *LHS, Value *RHS, const std::string &Name = "");
   Value *createSub(Value *LHS, Value *RHS, const std::string &Name = "");
   ...
};

Given this, use would be much more implicit:

InstructionCreator IC;
IC.setCurrentBlock(FalseBB);
Value *A = IC.createAdd(LHS, RHS);
Value *B = IC.createSetEQ(A, RHS);
IC.createBr(B, TrueBB, FalseBB);
IC.setCurrentBlock(TrueBB);
...

If so, I can add this. Do you have a suggestion for a name better than "InstructionCreator"?
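
To make the "current block" idea concrete without depending on LLVM, here is a self-contained mock-up. Block is an invented struct, values are plain strings, and the textual instructions are made up; only the shape of the interface (a creator that remembers where to append) follows the proposal above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mock type for illustration only; this is not llvm::BasicBlock.
struct Block {
  std::string name;
  std::vector<std::string> insts;  // instructions in an invented textual form
};

class InstructionCreator {
  Block *cur_ = nullptr;  // block that new instructions are appended to
  int nextId_ = 0;        // numbers results that were given no name

  // Append "result = op lhs, rhs" to the current block and return the result.
  std::string emit(const std::string &op, const std::string &lhs,
                   const std::string &rhs, const std::string &name) {
    assert(cur_ && "setCurrentBlock must be called first");
    std::string result =
        name.empty() ? "%t" + std::to_string(nextId_++) : "%" + name;
    cur_->insts.push_back(result + " = " + op + " " + lhs + ", " + rhs);
    return result;
  }

public:
  void setCurrentBlock(Block *b) { cur_ = b; }

  std::string createAdd(const std::string &lhs, const std::string &rhs,
                        const std::string &name = "") {
    return emit("add", lhs, rhs, name);
  }
  std::string createSub(const std::string &lhs, const std::string &rhs,
                        const std::string &name = "") {
    return emit("sub", lhs, rhs, name);
  }
};
```

The caller never says where an instruction goes; it only switches blocks with setCurrentBlock(). Names stay optional, as in the proposal: an empty name yields a numbered temporary.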

I think libjit only has one technical idea that is missing from LLVM.
In libjit you can create a new function and get a pointer to it, but
set things up so that the IR for the function is also created lazily.
As I understand it, right now in LLVM you can make the IR and lazily
compile it, but not lazily make the IR. This seems pretty handy, at
least for my situation. It also looks pretty easy to add to LLVM :slight_smile:

Yup, it would be great to have this. :slight_smile:

I don't want to get you down or anything.

Heh, no problem. If we couldn't admit that improvement is possible, we probably wouldn't improve. :slight_smile:

-Chris

hi Tom,

I am really glad that someone has found time to step into that :-).

Tom Tromey wrote:

I recently wrote an LLVM-based JIT plugin for libgcj and I thought
it'd be worthwhile to mention it here.

It is in cvs on sourceforge, but afaics anonymous cvs there is pretty
broken at the moment... so if you want a copy, ask and I will email it
to you.

wow. that is really nice.
I was looking for a contractor to do more work on LLVM, but sadly I
have not succeeded so far.
I would definitely like to look into it.

Basically I hacked libgcj to (optionally) dynamically load a JIT module
at startup. If a JIT is loaded then bytecode is passed to it rather
than to the libgcj bytecode interpreter.

The LLVM JIT is pretty raw at the moment. It can run "hello world"
and a few microbenchmarks (empty loops, method calls, that sort of
thing). I haven't tested it seriously yet. On my little benchmarks
it is 5x-6x faster than our interpreter.

Looks promising.

Exception handling definitely does not work, I didn't even try to
implement it yet. I've been thinking about having some kind of simple
bridge between the LLVM and GCC worlds here -- very inefficient, but
at least I could get it working rather quickly. Long term I'm hoping
someone else will be solving this problem... :slight_smile:

I would offer some help. But as I have always said, for lack of
funding I have only my very scarce free time.

FWIW I actually did this work twice, once for libjit and once for
LLVM. I'm happy to provide a comparison, from a jit-writing
perspective, if you're interested.

Yes, very much! How did you find writing it directly in SSA form? What is
the footprint of libjit compared to the heavier LLVM JIT? How about
recompiling in libjit? Does it have support for CFG and dataflow analysis?

Thanks for writing LLVM. It is awesome to be able to add a JIT to
libgcj this easily.

Yes kudos to the LLVM people!

-- Jakob

I would definitely like to look into it.

I'll send it in private email.

Yes very much! How did you find writing it directly in
SSA-form?

Actually I used what Chris called "the alloca trick"... the JIT
doesn't really generate SSA form but instead uses alloca to allocate
space for the stack and locals (and temporaries where needed) and then
emits explicit loads and stores everywhere. On irc Chris showed me
how to invoke the LLVM pass to turn this back into something sane :slight_smile:
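
For anyone unfamiliar with the trick, here is a toy version of the translation step. The stack bytecode, the opcode names, and the textual output are all invented for this sketch and are not the real JIT's code or real LLVM syntax. Every local and every operand-stack position gets its own alloca'd slot, and each bytecode turns into loads and stores on those slots. In LLVM the cleanup afterwards is the job of a promotion pass (mem2reg is the usual one, and presumably the pass meant above).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stack bytecode, invented for this sketch.
enum class Op { Const, Load, Store, Add };
struct Insn { Op op; int arg; };

// Emit load/store code over memory slots instead of building SSA directly.
std::vector<std::string> translate(const std::vector<Insn> &code,
                                   int numLocals, int maxStack) {
  std::vector<std::string> out;
  // One slot per local variable and one per operand-stack position.
  for (int i = 0; i < numLocals; ++i)
    out.push_back("%local" + std::to_string(i) + " = alloca i32");
  for (int i = 0; i < maxStack; ++i)
    out.push_back("%stack" + std::to_string(i) + " = alloca i32");

  int depth = 0;  // operand-stack depth, known statically at each instruction
  int tmp = 0;    // counter for temporary names
  auto slot = [](int d) { return "%stack" + std::to_string(d); };
  auto fresh = [&tmp] { return "%t" + std::to_string(tmp++); };

  for (const Insn &in : code) {
    switch (in.op) {
    case Op::Const:  // push a constant: store it into the next stack slot
      assert(depth < maxStack);
      out.push_back("store i32 " + std::to_string(in.arg) + ", " + slot(depth));
      ++depth;
      break;
    case Op::Load: {  // push a local: load it, then store to the stack slot
      assert(depth < maxStack && in.arg < numLocals);
      std::string t = fresh();
      out.push_back(t + " = load %local" + std::to_string(in.arg));
      out.push_back("store i32 " + t + ", " + slot(depth));
      ++depth;
      break;
    }
    case Op::Store: {  // pop into a local
      assert(depth > 0 && in.arg < numLocals);
      --depth;
      std::string t = fresh();
      out.push_back(t + " = load " + slot(depth));
      out.push_back("store i32 " + t + ", %local" + std::to_string(in.arg));
      break;
    }
    case Op::Add: {  // pop two operands, push their sum
      assert(depth >= 2);
      std::string rhs = fresh();
      out.push_back(rhs + " = load " + slot(depth - 1));
      std::string lhs = fresh();
      out.push_back(lhs + " = load " + slot(depth - 2));
      std::string sum = fresh();
      out.push_back(sum + " = add i32 " + lhs + ", " + rhs);
      out.push_back("store i32 " + sum + ", " + slot(depth - 2));
      --depth;
      break;
    }
    }
  }
  return out;
}
```

The translator never has to think about phi nodes or dominance: control-flow merges are handled by memory, and the promotion pass rebuilds the SSA values later.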

What is the footprint of libjit compared to the more heavy LLVM
jit? How about recompiling in libjit? Has it support for CFG,
dataflow analysis?

Oh, libjit is much smaller. I don't think it does much by way of
optimization.

libjit takes a very simple approach to recompilation. There is a
function you can call to request recompilation for a method. Then
(AIUI -- didn't implement this in the JIT yet) libjit will call your
build function again, to re-create the IR. The docs talk a bit about
being able to run more optimizations, but I think that is just the
general idea, since I don't think there are actually other optimizers
available.

Even this amount of support is worthwhile in a JVM, fwiw. You can
generate better code once constant pool entries have been resolved,
and this pretty much has to be done lazily (not what the VM spec says,
but important for actual compatibility).
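
A small sketch of why "just call the build function again" is already useful. CompiledMethod, PoolEntry, and buildGetField are hypothetical names, not libjit's API, and the generated "code" is a placeholder string. The build callback consults the constant pool each time it runs, so a rebuild after resolution picks the direct access path.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Invented for illustration; not libjit's actual interface.
class CompiledMethod {
  std::function<std::string()> build_;  // re-creates the code from scratch
  std::string code_;
public:
  explicit CompiledMethod(std::function<std::string()> build)
      : build_(std::move(build)) {
    assert(build_ && "a build callback is required");
    code_ = build_();  // initial compilation
  }

  // Recompilation simply runs the build callback again.
  void recompile() { code_ = build_(); }

  const std::string &code() const { return code_; }
};

// Stand-in for one constant pool entry that is resolved lazily.
struct PoolEntry {
  bool resolved = false;
  int fieldOffset = 0;
};

// Emit a slow resolving path first, a direct access once the entry is resolved.
inline std::string buildGetField(const PoolEntry &e) {
  if (!e.resolved) return "call resolve_and_get_field";
  return "load [obj + " + std::to_string(e.fieldOffset) + "]";
}
```

The first build has to go through the resolver; once the entry is resolved, a recompile replaces that call with a direct load at a known offset.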

Tom

Are you using a debug or a release build? A release build (built with
make ENABLE_OPTIMIZED=1) is often 10x to 20x smaller than a debug
build, and links correspondingly faster. On some machines, a release
build builds *faster* than a debug build because the debug symbols are
so huge. The only thing you lose with a release build is the ability
to step into LLVM libraries in a debugger.

Ok, I rebuilt with ENABLE_OPTIMIZED=1. This did make a huge
difference -- my rebuild went down to 8 seconds (from 16 minutes... I
timed it this time; my earlier guess was off a bit).

Unfortunately it turns out I do need the debugging capabilities.
Darn.

I'd suggest sticking with llvm-config and not using shared
libraries.

I didn't dig into the Makefiles... are the libraries and whatnot built
-fPIC? I ask because I want to dynamically load this code into
libgcj. JVMs pretty much have to be shared libraries (or have a
separate version which is a shared library), at least if you want to
support the invocation API.

It would be straightforward to add a new "easy" interface for
creating LLVM instructions. Would something like this work well for
you?

I considered doing this myself but in the end didn't have much need
for it. Anyway, don't add it on my account, I doubt I have too many
more problems like this in my code :slight_smile:

Tom

Hi Tom,

I didn't dig into the Makefiles... are the libraries and whatnot built
-fPIC?

If you do `make Verb=' then you'll see all the actual command
invocations and can grep for bits of interest.

Cheers,

Ralph.

The correct way to do that is:

make VERBOSE=1

you can also do:

make TOOL_VERBOSE=1

which implies VERBOSE=1 and also tells each tool (compiler, linker, etc)
to be verbose about the actions it is taking.

Reid.

Tom Tromey wrote:

> I would definitely like to look into it.

I'll send it in private email.

Would be nice.

> Yes very much! How did you find writing it directly in
> SSA-form?

Actually I used what Chris called "the alloca trick"... the JIT
doesn't really generate SSA form but instead uses alloca to allocate
space for the stack and locals (and temporaries where needed) and then
emits explicit loads and stores everywhere. On irc Chris showed me
how to invoke the LLVM pass to turn this back into something sane :slight_smile:

Okay, is that because memory is not in SSA form?

> What is the footprint of libjit compared to the more heavy LLVM
> jit? How about recompiling in libjit? Has it support for CFG,
> dataflow analysis?

Oh, libjit is much smaller. I don't think it does much by way of
optimization.

Yes I can imagine that.

libjit takes a very simple approach to recompilation. There is a
function you can call to request recompilation for a method. Then
(AIUI -- didn't implement this in the JIT yet) libjit will call your
build function again, to re-create the IR. The docs talk a bit about
being able to run more optimizations, but I think that is just the
general idea, since I don't think there are actually other optimizers
available.

Okay. Which is the IR then? Is it LLVM bytecode?
I have to confess that I don't really know much about the design, but
I am very interested.

Even this amount of support is worthwhile in a JVM, fwiw. You can
generate better code once constant pool entries have been resolved,
and this pretty much has to be done lazily (not what the VM spec says,
but important for actual compatibility).

Kewl. I also once looked at kprobes to trap into read-only text
segments, and be able to trap for instance hot traces (like loops) for
inlining and so on. The DynInst API is interesting in this regard. Kprobes
can be used for user-process instrumentation as well.

After I have finished my thesis, I want to add generating LLVM bytecode
from Intel assembly. This would be interesting for instrumenting native code
as well :slight_smile:

-- Jakob