JVM Backend

Hi,

I've written a backend for LLVM that allows LLVM IR to be transformed
to a Java/JVM class file (llvm-jvm.patch.gz attached).

Indirect function calls don't work yet, and there are probably some
minor bugs in it, but it works well for the test cases that I've run
through it. Also, several instructions are emulated by method calls
due to deficiencies in the JVM instruction set (e.g. the lack of unsigned
arithmetic), so performance could also be improved in this area
(although the JVM should inline the calls when it deems it necessary).
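
For illustration only, the kind of runtime helper such a call might
target could look roughly like this (hypothetical name; not necessarily
what the LLJVM runtime actually provides):

    // The JVM has no unsigned 32-bit divide, so widen both operands to
    // long, mask them to their unsigned values, and divide there.
    public static int udiv(int a, int b) {
        return (int) ((a & 0xFFFFFFFFL) / (b & 0xFFFFFFFFL));
    }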

In order to link and run the output, the LLJVM[1] runtime is required,
which is essentially a compatibility layer allowing things such as
pointers to be used (emulated) within the JVM. Jasmin[2] is also
required to assemble the output code.

In order to transform LLVM IR to a class file and run it, the
following process is required:

llc -march=jvm foo.bc -o foo.j
# link references to external functions/variables with
# a basic I/O class and the Java math library
java -jar path/to/lljvm.jar ld BasicIO java.lang.Math < foo.j > foo.linked.j
mv -f foo.linked.j foo.j
# assemble
java -jar path/to/jasmin.jar foo.j
# run
java -classpath path/to/lljvm.jar:. foo

At the moment only very basic I/O is available (attached
BasicIO.{java,class}) due to the lack of the C Standard Library.
Before I go reinventing the wheel, has anyone had any luck with
compiling a libc implementation (e.g. newlib) to LLVM IR (without any
platform-dependent inline assembly, etc)?

What is the process for requesting this patch be applied to the LLVM
source tree?

[1] Homepage: http://da.vidr.cc/projects/lljvm/
    Download: http://lljvm.googlecode.com/files/lljvm-0.1.jar
[2] http://jasmin.sf.net/

llvm-jvm.patch.gz (12.8 KB)

BasicIO.class (1.02 KB)

BasicIO.java (1.01 KB)

Hello, David

First of all, thanks for the backend submission. I'll let Chris comment
on the procedure for adding it to the tree. :)

I just took a quick look at the code; my comments are below.

Indirect function calls don't work yet, and there are probably some
minor bugs in it, but it works well for the test cases that I've run
through it.

Could you please provide some sort of "feature" test, so that we can be
sure the code won't be broken by e.g. some API change? We now have a
powerful FileCheck facility, so you could have just one .ll file and
check the JVM code emitted.

In order to link and run the output, the LLJVM[1] runtime is required,

Can this be documented somehow? E.g. in a readme file in the backend
dir, etc? It would also be nice if this library were somehow integrated
into LLVM.

The big question right now is how you're planning to deal with the
arbitrary-precision integers that might come from LLVM IR. Currently
everything seems to behave differently when given e.g. i31:
- functions like getTypeName() return junk (the case
Type::IntegerTyID just falls through to Type::FloatTyID)
- other functions just assert

Could you please provide some sort of "feature" test, so that we can be
sure the code won't be broken by e.g. some API change? We now have a
powerful FileCheck facility, so you could have just one .ll file and
check the JVM code emitted.

Additional patch attached, is this suitable?

In order to link and run the output, the LLJVM[1] runtime is required,

Can this be documented somehow? E.g. in a readme file in the backend
dir, etc?

I can do that.

The big question right now is how you're planning to deal with the
arbitrary-precision integers that might come from LLVM IR.

I should be able to implement that. Would arbitrary precision support
be required for the initial commit of the backend?

Currently everything seems to behave differently when given e.g. i31:
- functions like getTypeName() return junk (the case
Type::IntegerTyID just falls through to Type::FloatTyID)

getBitWidth raises an assertion before this can happen.

llvm-jvm-test.patch.gz (890 Bytes)

Hello, David

Additional patch attached, is this suitable?

Looks good, thanks. What about arithmetic?

The big question right now is how you're planning to deal with the
arbitrary-precision integers that might come from LLVM IR.

I should be able to implement that. Would arbitrary precision support
be required for the initial commit of the backend?

I really don't think so. But you should be aware that you can easily
obtain, say, i33 from bitfield-heavy C code, or i256 from the LLVM
optimizers.

Additional patch attached, is this suitable?

Looks good, thanks. What about arithmetic?

Revised patch attached.

llvm-jvm-test.patch.gz (1003 Bytes)

Additional patch attached, is this suitable?

Looks good, thanks. What about arithmetic?

Revised patch attached.

Hi David,

I'm not very excited about this patch. We already have a C backend and an MSIL backend. Neither of those supports the full generality of LLVM IR (for example, exceptions, 'weird' integers, etc.) and therefore they aren't reliable. I don't know of anyone using the MSIL backend, and it continues to bit rot. IMO, the MSIL backend should just be removed.

Who is the expected client of this code? Will it be maintained going forward? Is it going to cover the full generality of LLVM IR constructs? How do you plan to handle unsafe IR?

-Chris

Who is the expected client of this code?

There are several reasons why compiling to JVM bytecode can be
desirable. For example, it can be executed on platforms that do not
support native code execution for security or other reasons, e.g.
browser applets, mobile devices, and some webhosts. From the Java
perspective, it allows libraries written in languages such as C to be
used in a cross-platform manner, without having to rely on a native
library through JNI.

Will it be maintained going forward?

I can't guarantee anything, but I'll do my best to keep it maintained.

Is it going to cover the full generality of LLVM IR constructs?
How do you plan to handle unsafe IR?

Sorry, I'm still relatively new to LLVM, so I'm not sure what you're
referring to.

If this patch is not to be included in the LLVM source tree, then is
it possible to distribute backends as standalone tools?

Who is the expected client of this code?

There are several reasons why compiling to JVM bytecode can be
desirable. For example, it can be executed on platforms that do not
support native code execution for security or other reasons, e.g.
browser applets, mobile devices, and some webhosts. From the Java
perspective, it allows libraries written in languages such as C to be
used in a cross-platform manner, without having to rely on a native
library through JNI.

That's all fine, if it actually works. Supporting the full generality of LLVM IR is imperative if you want to achieve that goal, and I'm not sure how you do that.

Will it be maintained going forward?

I can't guarantee anything, but I'll do my best to keep it maintained.

Is it going to cover the full generality of LLVM IR constructs?
How do you plan to handle unsafe IR?

Sorry, I'm still relatively new to LLVM, so I'm not sure what you're
referring to.

How does your code handle non-type-safe C code? How does it translate it to type-safe java code? If you translate to a big array of memory and index into it, how is it better than the mips -> java compiler?

If this patch is not to be included in the LLVM source tree, then is
it possible to distribute backends as standalone tools?

Sure, you can use 'llc -load foo.so' to dynamically load backends.

-Chris

If you translate to a big array of memory and index into it, how is it
better than the mips -> java compiler?

Yes, it is similar to the mips to java compiler in that regard, but it
does have several advantages over it. For example, functions are
mapped to individual methods (rather than just a big chunk of
translated instructions), allowing Java to call individual functions
in the compiled language and vice versa. This also allows programs to
be split amongst multiple classes rather than statically linking
everything into the one file (which is sometimes not possible for
large projects).
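
Just as an illustrative sketch (hypothetical names - assuming a C
function int add(int, int) compiled into a class named foo, and assuming
the backend emits each function as a public static method), the Java
side could then do something like:

    public class Caller {
        public static void main(String[] args) {
            // call a function that was originally written in C
            System.out.println(foo.add(2, 3));
        }
    }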

If this patch is not to be included in the LLVM source tree, then is
it possible to distribute backends as standalone tools?

Sure, you can use 'llc -load foo.so' to dynamically load backends.

Ok, I might do that then. Thanks for your time :)

How do you handle tail calls and value types?

How do you handle tail calls and value types?

I haven't worried too much about optimisation yet, so it doesn't do
anything special for tail calls (although neither does the Java
compiler). LLVM types are translated to their equivalent Java
primitive types (or currently it raises an assertion if there is no
equivalent type).

So it will stack overflow on tail calls and break with run-time errors on
structs?

I would love to be able to evaluate LLVM IR in a safe environment (like the
JVM) for debugging purposes but this is too incomplete to be useful for me:
my IR depends heavily upon tail calls and value types. Unfortunately, the
MSIL backend and lli are also incomplete and, therefore, useless for me too.

As Chris said, the LLVM world really needs any fully working solution rather
than a selection of incomplete solutions.

So it will stack overflow on tail calls

At the moment, yes. But then again, so does java. Also, it looks like
they're working on support for tail calls in the Da Vinci Machine[1].

and break with run-time errors

When I said it raises an assertion, I meant at compile-time.

on structs?

No, structs are supported. The only unsupported types at the moment
(as far as I am aware) are things like i31 and f80.

I would love to be able to evaluate LLVM IR in a safe environment (like the
JVM) for debugging purposes but this is too incomplete to be useful for me:
my IR depends heavily upon tail calls and value types. Unfortunately, the
MSIL backend and lli are also incomplete and, therefore, useless for me too.

As Chris said, the LLVM world really needs any fully working solution rather
than a selection of incomplete solutions.

I haven't been working on this project for very long - you can't expect
it to be perfect on the first release.

[1] http://openjdk.java.net/projects/mlvm/subprojects.html#TailCalls

Hi David,

No, structs are supported. The only unsupported types at the moment
(as far as I am aware) are things like i31 and f80.

for funky sized integers, the most important operations to support are
loads and stores, shifts and logical operations (and, or, xor). These
are the ones that the optimizers like to introduce most. The logical
operations are straightforward. Loads and stores of iN are equivalent
to using memcpy of (N+7)/8 bytes from memory to wherever you are keeping
the value (for a load), or the reverse for a store, so that's pretty easy
as well [this assumes that you set things up right so that endianness is
the same in memory and in "registers"]. Shifts are the most complicated,
but still pretty simple.
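
Just to sketch the idea (hypothetical helper, assuming memory is
emulated as a little-endian byte array), a load of an iN with N <= 32
could be done along these lines:

    // Copy (N+7)/8 bytes from emulated memory into an int "register",
    // then mask off the bits above N so the register holds a clean iN.
    public static int loadSmallInt(byte[] memory, int addr, int bits) {
        int numBytes = (bits + 7) / 8;
        int value = 0;
        for (int i = 0; i < numBytes; i++)
            value |= (memory[addr + i] & 0xFF) << (8 * i);
        return bits == 32 ? value : value & ((1 << bits) - 1);
    }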

Ciao,

Duncan.

> So it will stack overflow on tail calls

At the moment, yes. But then again, so does java.

Sure, but a lot of people like me are using LLVM precisely because it offers
these wonderful features. As long as your JVM backend does not handle these
features correctly, its utility is greatly diminished.

Also, it looks like they're working on support for tail calls in the Da
Vinci Machine[1].

I believe that work was actually finished some time ago by my hero, Arnold
Schwaighofer, who was also responsible for the excellent TCO implementation
in LLVM.

> and break with run-time errors

When I said it raises an assertion, I meant at compile-time.

> on structs?

No, structs are supported. The only unsupported types at the moment
(as far as I am aware) are things like i31 and f80.

How do you support structs when the JVM is incapable of expressing value
types? Do you box every aggregate in an object? Does insertvalue construct an
entirely new object? If so, performance will degrade by orders of
magnitude. Optimizing structs for the JVM is not easy and you will never get
decent performance out of the JVM in the general case.

> As Chris said, the LLVM world really needs any fully working solution
> rather than a selection of incomplete solutions.

I haven't been working on this project for very long - you can't expect
it to be perfect on the first release.

Nobody is asking for perfection, just completeness.

Hi Duncan,

for funky sized integers, the most important operations to support are
loads and stores, shifts and logical operations (and, or, xor). These
are the ones that the optimizers like to introduce most. The logical
operations are straightforward. Loads and stores of iN are equivalent
to using memcpy of (N+7)/8 bytes from memory to wherever you are keeping
the value (for a load), or the reverse for a store, so that's pretty easy
as well [this assumes that you set things up right so that endianness is
the same in memory and in "registers"]. Shifts are the most complicated,
but still pretty simple.

Thanks for the tip :) - I'll look into adding support for these.

> So it will stack overflow on tail calls

At the moment, yes. But then again, so does java.

Sure, but a lot of people like me are using LLVM precisely because it offers
these wonderful features. As long as your JVM backend does not handle these
features correctly, its utility is greatly diminished.

The issue is that current JVMs don't provide good support for tail
calls. Self-recursive functions could probably be optimised into
loops (roughly as sketched below), but apart from that I'm not sure
what I can do. You're obviously much more familiar with TCO than I am,
so perhaps you can suggest a solution.
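
Something along these lines (a hypothetical example, not what the
backend emits today):

    // A self tail call such as f(n - 1, acc + n) becomes a rebinding of
    // the arguments followed by a jump back to the top of the method.
    static int sumTo(int n, int acc) {
        while (true) {
            if (n == 0)
                return acc;
            acc = acc + n;
            n = n - 1;
        }
    }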

No, structs are supported. The only unsupported types at the moment
(as far as I am aware) are things like i31 and f80.

How do you support structs when the JVM is incapable of expressing value
types? Do you box every aggregate in an object? Does insertvalue construct an
entirely new object? If so, performance will degrade by orders of
magnitude. Optimizing structs for the JVM is not easy and you will never get
decent performance out of the JVM in the general case.

Oh, sorry, I see what you mean. No, first-class structs aren't supported.

I haven't been working on this project for very long - you can't expect
it to be perfect on the first release.

Nobody is asking for perfection, just completeness.

I'd just like to point out that I don't have a great deal of
experience in compiler development - I just thought that this would be
an interesting project to try. I realise that it isn't complete in
its current state.

Hi David and Jon,

After reading this thread, I think there has been a slight misunderstanding.

I agree with Jon that there are some crucial problems with your patch
regarding odd-sized variables and tail calls (and possibly many other
issues that were overlooked), but I also agree with David that, as this
is an experiment, it isn't perfect or complete.

If you apply that as a patch now, everyone else will have to maintain
it when they make their unrelated changes, increasing the cost of the
project's maintenance. I welcome your code (I have been wondering about
it recently too), but I think that you should keep it as a separate
project for now. Once it's at least complete, I'm sure people will be
happy to apply it as a patch, but for now it'll be more nuisance than
help.

Just to give you some figures: I've written a very simple compiler
using LLVM with fewer than 10 classes, using mostly the IR codegen, some
function passes and the JIT. It was common for me to have to update my
own code twice in the same day for changes in the internal libraries.
I was happy to do so, but I believe that if I ever merged my code into
the mainline, every developer making such changes would have to update
my code as well before committing.

Once it's complete, check in. Once it's perfect, get a Nobel prize. ;)

My tuppence...

cheers,
--renato


If you apply that as a patch now, everyone else will have to maintain
it when they make their unrelated changes, increasing the cost of the
project's maintenance. I welcome your code (I have been wondering about
it recently too), but I think that you should keep it as a separate
project for now. Once it's at least complete, I'm sure people will be
happy to apply it as a patch, but for now it'll be more nuisance than
help.

I hope I didn't give the impression that I was pushing to get this
committed - I'm quite happy to keep it as a separate project.

I've cleaned up the code a bit and released it as part of LLJVM[1], so
any fixes to the problems people have highlighted in this thread will
be pushed to the repository listed on that page as I get to them.

[1] http://da.vidr.cc/projects/lljvm/

First-class aggregate values can be supported in an environment like this
by lowering each member of a struct into a separate variable. This is what
LLVM's lib/CodeGen backends do, for example. Perhaps some of the code for
that could be factored out into utility routines.
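
As a purely hypothetical sketch of what that lowering looks like on the
Java side: a first-class { i32, double } value becomes two separate
locals, and extractvalue/insertvalue become plain reads and writes of
them, with no object allocation.

    static double useStruct(int n) {
        // %s = insertvalue { i32, double } undef, i32 %n, 0  -> write a local
        int s_field0 = n;
        double s_field1 = 3.5;
        // %x = extractvalue { i32, double } %s, 0            -> read it back
        return s_field0 + s_field1;
    }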

Dan