Introducing myself, and Java project.

Hello, I am Ramon Garcia Fernandez. My interest in LLVM is to develop
an interface for Java virtual machine bytecodes, so that Java programs
can be run under LLVM.

You may ask why I do not simply use the Java virtual machine. Although it
could be improved, it has some misfeatures. This is what I have
learned: it makes communication with native code too expensive.
Passing an array from native code to the virtual machine, or vice versa,
requires a copy of the data. Why, you may ask? Because Java uses
garbage collectors based on copying, so an object
may be moved by the virtual machine. The implementation of
generational garbage collection in Java uses areas of memory for each
generation, so that when an object changes from the young generation
to the old one, its storage must be moved. This may give some performance
advantage by keeping young objects close together in memory, but at the cost
of making exchange of data with native code expensive. In particular,
data copying is required for reading and writing files, sending or
receiving data from the network, or drawing. Since Java is not often
used for numerical analysis or tasks that require little data exchange
with the outside world, I disagree that the implementation with a
copying collector is good for most applications.

A more obvious problem is, of course, that it is not possible to
compile Java code statically and save the result on disk.

So I am starting to write a compiler of Java bytecode to LLVM
bytecode. For now I am designing, dealing with things such as how to
assign stack positions to the operands of each instruction.
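
To make the stack-position question concrete, here is a minimal sketch of one
possible approach, not a description of any existing frontend. It assumes
straight-line code only (no branches) and a made-up four-instruction subset
written as text; the class name StackSlotSketch and the register names s0,
s1, ... are hypothetical. The idea is that the operand at stack depth d always
lives in virtual register s<d>, so simulating the stack depth is enough to
name every operand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the class name and the textual mini instruction set
// are illustrative only and are not taken from any real frontend.
final class StackSlotSketch {

    /**
     * Translates a straight-line sequence of stack instructions into
     * three-address statements. The operand at stack depth d is always
     * held in the virtual register "s<d>".
     */
    static List<String> translate(String[] code) {
        List<String> out = new ArrayList<>();
        int depth = 0; // number of operands currently on the simulated stack
        for (String insn : code) {
            String[] parts = insn.split(" ");
            switch (parts[0]) {
                case "iconst": // push a constant
                    out.add("s" + depth + " = " + parts[1]);
                    depth++;
                    break;
                case "iload": // push a local variable
                    out.add("s" + depth + " = local" + parts[1]);
                    depth++;
                    break;
                case "iadd":
                case "imul": {
                    if (depth < 2) {
                        throw new IllegalStateException("stack underflow at " + insn);
                    }
                    String op = parts[0].equals("iadd") ? "+" : "*";
                    // Pops the two topmost slots and pushes the result
                    // into the lower of the two.
                    out.add("s" + (depth - 2) + " = s" + (depth - 2)
                            + " " + op + " s" + (depth - 1));
                    depth--;
                    break;
                }
                case "istore": // pop into a local variable
                    if (depth < 1) {
                        throw new IllegalStateException("stack underflow at " + insn);
                    }
                    depth--;
                    out.add("local" + parts[1] + " = s" + depth);
                    break;
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + insn);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // local0 = local1 + local2 * 3
        String[] code = {"iload 1", "iload 2", "iconst 3", "imul", "iadd", "istore 0"};
        for (String line : translate(code)) {
            System.out.println(line);
        }
    }
}
```

Note that the slots are reassigned, so this output is not in SSA form. A real
translator would also have to merge stack states at branch targets, which this
sketch does not attempt.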

My target is to deliver something simple. Operations such as
classloader creation and dynamic class loading will not be supported.

Hoping that this is the start of a long term cooperation,

    Ramon

Have you looked at this?
http://llvm.org/viewvc/llvm-project/java/trunk/

Someone started a Java frontend. Perhaps you could finish it?

-Tanya

No, I hadn't. I am going to look at it.

I have just worked with this code. The architecture is fine, and I
think that this code should be reused. It needs updating, however,
because it does not compile with LLVM 2.1 (I prefer to use a stable
version to focus my work, and port to LLVM 2.2 later).

I have seen that one incompatibility is that this Java frontend
requires C++ with exceptions, but LLVM is compiled with
-fno-exceptions. For now, I am compiling with -fexceptions. Should
exceptions be removed from the code of the Java frontend?

Also, I am not sure whether the changes I made to get it built are
correct. I will ask more questions later.

This could be a work plan:

* Get the Java frontend built.

* Implement exception handling (exception tables, athrow, and the jsr/ret bytecodes used for finally blocks).

* Implement garbage collection.

* Support JAR files.

This should give a usable Java implementation. But there is still very
hard work to be done. The difficult part is dynamic class loading,
reflection and creation of classloaders. This would make it possible to
use LLVM for Java server applications such as Tomcat or JBoss. I am not
sure if this work is possible without funding a full-time position. Here
are some questions to think about. To what extent does LLVM support dynamic
code loading? Is it possible to load code at runtime? Could this
break assumptions made by interprocedural optimization? (A function
may be called in unexpected ways.)

Another difficult part is optimization. In order to get good
performance, references should be converted to values whenever
possible. Recent virtual machines support escape analysis, so that
local references can be converted into values, and be allocated and
released on the stack. This should be generalized to references that
are class members.
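
As a small illustration of what escape analysis looks for, consider the
following sketch. The class and method names are hypothetical. In the first
method the Point object is never stored anywhere or returned, so a compiler
may keep its fields in registers or on the stack. In the second, the
reference is stored in a static field, so the object escapes and must live
on the heap.

```java
// Hypothetical sketch: all names here are illustrative only.
final class EscapeSketch {

    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // The Point never leaves this method: it is not stored in a field,
    // not passed to other code, and not returned. It is a candidate for
    // being replaced by two plain int values.
    static int lengthSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    static Point last; // storing a reference here makes the object escape

    // Same computation, but the Point is now reachable after the method
    // returns, so it cannot be turned into stack values.
    static int lengthSquaredEscaping(int x, int y) {
        Point p = new Point(x, y);
        last = p;
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        System.out.println(lengthSquared(3, 4));
        System.out.println(lengthSquaredEscaping(3, 4));
    }
}
```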

Java code is particularly hard to optimize because by default any method
call is a virtual function call. Is inlining possible? Only if one assumes,
for all code using a class, that no other class is
going to override the called method. Programmers could declare methods
final, but this is rarely done. Assumptions may be checked for all
loaded classes, but, for classes not yet loaded (and which may be
loaded dynamically), who knows?
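
The following sketch shows the hazard. All names are hypothetical, and
Triangle merely stands in for a class that would be loaded dynamically after
the caller was compiled. sidesOfInlined is written by hand to show what
unconditional inlining of Shape.sides() would amount to.

```java
// Hypothetical sketch: Shape, Triangle and the sidesOf methods are
// illustrative names only.
final class InlineSketch {

    static class Shape {
        int sides() {
            return 0;
        }
    }

    // Stands in for a class that is loaded after sidesOf was compiled.
    static class Triangle extends Shape {
        @Override
        int sides() {
            return 3;
        }
    }

    // The call s.sides() is virtual: its target depends on the runtime
    // class of s, not on the declared type Shape.
    static int sidesOf(Shape s) {
        return s.sides();
    }

    // What the caller would become if Shape.sides() were inlined under the
    // assumption that no subclass overrides it.
    static int sidesOfInlined(Shape s) {
        return 0;
    }

    public static void main(String[] args) {
        Shape s = new Triangle();
        System.out.println(sidesOf(s));        // dispatches to Triangle.sides()
        System.out.println(sidesOfInlined(s)); // ignores the override
    }
}
```

Once Triangle exists, the two methods disagree, so the inlined version is
only valid as long as the "no override" assumption can be rechecked whenever
a new class is loaded.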

But this is a very long-term matter. For now, let us have fun
completing the doable parts.

Best regards,

Ramon

You probably want to sit down and have a long talk with Jeroen Frijters, the
principal behind the IKVM project.

Note that you will have to deal with ClassLoaders at some level, regardless
of what you'd prefer.

A more obvious problem is, of course, that it is not possible to
compile Java code statically and save the result on disk.

That is untrue--last time I checked, gcj does this out of the box. Several
other tools used to (TowerJ, I think its name was), but the demand for this
turned out to be nil and they folded. Most of Java's appeal lies in its
ability to dynamically link libraries.

And quite frankly, the overhead of passing native data across that JNI
boundary is generally pretty tiny, unless you do some truly idiotic things
in either your Java or your JNI/C++ code. I still wouldn't want to do it in
a tight loop, mind you, but it's generally not more than a handful of
assembly instructions. (This is what I've been told, anyway--I haven't pored
over the OpenJDK sources to find the actual code that does the translation.)

Having said that, I think a JVM->LLVM bytecode converter is a really cool
idea. But I think you're ultimately going to come to the same decision IKVM
did, which is to support ClassLoading as well as static loading.

Ted Neward
Java, .NET, XML Services
Consulting, Teaching, Speaking, Writing
http://www.tedneward.com

Hi Ramon,

I have just worked with this code. The architecture is fine, and I
think that this code should be reused. It needs updating, however,
because it does not compile with LLVM 2.1 (I prefer to use a stable
version to focus my work, and port to LLVM 2.2 later).

LLVM 2.2 comes out in a week, I would recommend using that over 2.1 if you must use a release. You can use the pre-releases to get started. However, if you are going to be actively developing you should use llvm svn.

-Tanya

Sorry for the confusion about the JNI overhead, perhaps I wasn't clear.

The big overhead of calling JNI happens if one passes an array because
in this case data must be copied (the JNI interface allows the
implementation to choose, but the current JDK implementation always
copies data). This means that for a Java application to read data from
a file, to fetch bytes from a network connection, or to paint a bitmap
on the screen, data must be copied between virtual machine memory
and native memory as part of the operation. The reason is that
the garbage collector copies objects, and
thus native code cannot assume that the memory position of an object
or array is fixed.

See, for instance, the specification of
Get<PrimitiveType>ArrayElements,
http://java.sun.com/j2se/1.5.0/docs/guide/jni/spec/functions.html#wp17382
The interface allows the virtual machine implementation to copy or
not, but the current implementation always copies array data.

Ramon

Thanks. I have just downloaded the LLVM 2.2 release tag. From time to
time, I will try working with SVN trunk. But please understand
that I would rather avoid fighting with compilation problems or
temporary bugs that would distract me.