Proposal for a Google Summer of Code project for the Java frontend.

As I told you, I am working with the Java frontend. Currently I am
working with the code from http://llvm.org/svn/llvm-project/java/. The
first task has been to bring it up to date, since it was written for
LLVM 1.9 and there are a few incompatibilities. This is mostly done. How
can I submit a patch?

I am interested in getting this work funded by a Google Summer of Code
grant. Is anyone interested in mentoring this project? Would the
LLVM project be interested in it?

Best regards,

Ramon

Hi Ramon,

Ramón García wrote:

I am interested in getting this work funded by a Google Summer of Code
grant. Is anyone interested in mentoring this project? Would the
LLVM project be interested in it?
  
I'm in the process of checking in a Java and a CLI front end to the LLVM repository. I'm not sure what shape llvm-java is in. Do your changes make it run some applications?

Is the grant already provided? I'd be happy to mentor your work if the project can also cover improving my existing implementation ;)

Nicolas

GSoC has not really started yet; we haven't started accepting applicants. I think it would be a great thing to focus future Java efforts on your front end, provided it's in better shape than llvm-java (which isn't hard). It would be awesome if you were willing to mentor a student. Please sign up on the GSoC page as a mentor, thanks!

-Chris

I would like to see the code of your front end.

I started working with llvm-java, since it was what was available. It
does not run yet (small compilation issues, almost fixed),
but the design seems fine. Once the setup is complete it can run small
pieces of Java code, but no real application, since it does
not yet support exception handling (jsr/ret bytecodes) or garbage
collection. Implementing these pieces would be the purpose
of this summer grant request.

A Java and CLI front end would be really interesting. What missing
features could I implement? I can start preparing a proposal.

Ramon

Ramón García wrote:

I would like to see the code of your front end.

It should be available in a week or two. I'm currently dealing with licensing issues.

I started working with llvm-java, since it was what was available. It
does not run yet (small compilation issues, almost fixed),
but the design seems fine. Once the setup is complete it can run small
pieces of Java code, but no real application, since it does
not yet support exception handling (jsr/ret bytecodes) or garbage
collection. Implementing these pieces would be the purpose
of this summer grant request.

A Java and CLI front end would be really interesting. What missing
features could I implement? I can start preparing a proposal.
  
There is no real missing feature. The VMs can execute standard applications (last time I checked, the JVM can run tomcat), and the implementations follow the VM specifications. The CLI implementation lacks generics and overflow detection, but this is not top priority in my mind.

Among many features, here are the ones that I'd _love_ to see implemented:
1) Compilation optimizations for type-safe languages (including type-based alias analysis)
2) HotSpot-like VM: switching between interpreter and compiler, applying different optimizations depending on a method's hotness, doing on-stack replacement
3) Implementing a garbage collector with LLVM facilities: currently my GC is not related to the compiler (consider it like Boehm's). It would be nice to see how the LLVM intrinsics fit with Java/CLI and JIT.
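
To make item 2 a bit more concrete, here is a minimal counter-based sketch of switching from an "interpreter" to a "compiler" once a method becomes hot. Everything in it is an assumption for illustration: the names `TieredMethod` and `COMPILE_THRESHOLD`, the threshold value, and the use of two plain Java functions to stand in for interpreted and JIT-compiled code. It does not model on-stack replacement.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: a method starts out "interpreted" and switches to a
// "compiled" version once it has been invoked often enough.
final class TieredMethod {
    // Assumed threshold; a real VM would tune this and may have several tiers.
    static final int COMPILE_THRESHOLD = 3;

    private final IntUnaryOperator interpreted; // stands in for the interpreter
    private final IntUnaryOperator compiled;    // stands in for JIT output
    private int invocations = 0;
    private boolean hot = false;

    TieredMethod(IntUnaryOperator interpreted, IntUnaryOperator compiled) {
        this.interpreted = interpreted;
        this.compiled = compiled;
    }

    int invoke(int arg) {
        // Count calls until the threshold is reached, then stay on the fast path.
        if (!hot && ++invocations >= COMPILE_THRESHOLD) {
            hot = true; // a real VM would trigger JIT compilation here
        }
        return hot ? compiled.applyAsInt(arg) : interpreted.applyAsInt(arg);
    }

    boolean isHot() {
        return hot;
    }
}
```

With the assumed threshold of 3, the first two calls go through the interpreted function and the third call is the first one served by the compiled function.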

Nicolas

Nicolas Geoffray wrote:

There is no real missing feature. The VMs can execute standard applications (last time I checked, the JVM can run tomcat), and the implementations follow the VM specifications. The CLI implementation lacks generics and overflow detection, but this is not top priority in my mind.

Among many features, here are the ones that I'd _love_ to see implemented:
1) Compilation optimizations for type-safe languages (including type-based alias analysis)
2) HotSpot-like VM: switching between interpreter and compiler, applying different optimizations depending on a method's hotness, doing on-stack replacement
3) Implementing a garbage collector with LLVM facilities: currently my GC is not related to the compiler (consider it like Boehm's). It would be nice to see how the LLVM intrinsics fit with Java/CLI and JIT.

I also forgot:
4) Generating shared libraries in order to not recompile Java or CLI code. In LLVM bitcode (simpler at first) and then in the ELF/MachO file format.

Based on my experience, this last one (the generation of shared
libraries) is the most important, and the one that
would make a real difference from a performance point of view.

I would like to prepare a proposal as soon as possible. Could I have a
look at your code privately, even if there are licensing issues
pending? I understand that these issues are just temporary, and will in
no way block the eventual publication of the code.

Best regards,
Ramon

Ramón García wrote:

Based on my experience, this last one (the generation of shared
libraries) is the most important, and the one that
would make a real difference from a performance point of view.

That's more or less true: generating shared libraries will improve startup time, not steady-state time. It will decrease steady-state performance (both for Java and CLI) because the VMs ensure a class will be fully initialized before its use. Therefore, while the JIT will have runtime knowledge of whether a class is fully initialized, a static compiler will have to be conservative and insert initialization checks on most uses of a class.

What you can do to tackle this issue is to generate different native code statically and let the VM choose which native code it has to execute (depending on which classes were already initialized).
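
A rough model of that idea, written in plain Java rather than as real VM code: one version of a static field read keeps the initialization check, another omits it, and the VM binds whichever is valid when it links the code. The names `ClassState`, `CompiledCode`, `readGuarded`, `readUnguarded` and `select` are all made up for this sketch, and it ignores the threading and recursion rules that real class initialization must follow.

```java
import java.util.function.ToIntFunction;

// Hypothetical model of a class whose static initializer (<clinit>) must run
// before first use. None of these names come from an existing VM.
final class ClassState {
    private boolean initialized = false;
    int initRuns = 0; // how many times the initializer body ran
    int counter;      // stands in for a static field

    // The check a static compiler has to emit in front of most uses.
    void ensureInitialized() {
        if (!initialized) {
            initialized = true;
            initRuns++;
            counter = 42; // body of the static initializer
        }
    }

    boolean isInitialized() {
        return initialized;
    }
}

final class CompiledCode {
    // Conservative version: cannot assume anything, so it always checks.
    static int readGuarded(ClassState c) {
        c.ensureInitialized();
        return c.counter;
    }

    // Version that is only valid once the class is known to be initialized.
    static int readUnguarded(ClassState c) {
        return c.counter;
    }

    // The VM chooses once, at link time, which precompiled version to bind.
    static ToIntFunction<ClassState> select(ClassState c) {
        return c.isInitialized() ? CompiledCode::readUnguarded
                                 : CompiledCode::readGuarded;
    }
}
```

In this model the guarded version is selected while the class is still uninitialized, and any code linked afterwards gets the version without the check.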

I would like to prepare a proposal as soon as possible. Could I have a
look at your code privately, even if there are licensing issues
pending? I understand that these issues are just temporary, and will in
no way block the eventual publication of the code.

Do you really need the code for the proposal? Legally, I don't have the right (yet) to send it to you. Let's see how things go this week and hope that at the end of the week I'll be able to check in the code.

Nicolas

I would prefer to see actual code in order to make a safe schedule. With the
code I can see what changes must be made. I can also describe these changes
in detail, which would give the LLVM project confidence that the proposal is
viable. By contrast, without code, neither I nor the LLVM project can be sure
that the project will be completed on schedule. Since this is a difficult
project (we are talking about compilers, which are complicated technology),
good planning is especially important.

Ramón García wrote:

I would prefer to see actual code in order to make a safe schedule. With the
code I can see what changes must be made. I can also describe these changes
in detail, which would give the LLVM project confidence that the proposal is
viable. By contrast, without code, neither I nor the LLVM project can be sure
that the project will be completed on schedule. Since this is a difficult
project (we are talking about compilers, which are complicated technology),
good planning is especially important.

OK, let's hope it'll be available soon enough then :)

That's more or less true: generating shared libraries will improve
startup time, not steady-state time. It will decrease steady-state
performance (both for Java and CLI) because the VMs ensure a class will
be fully initialized before its use. Therefore, while the JIT will have
runtime knowledge of whether a class is fully initialized, a static
compiler will have to be conservative and insert initialization checks on
most uses of a class.
    
There was a misunderstanding. The issue with dynamic compiling is
not only startup time, but also memory consumption,

Can you tell me why dynamic compilation consumes memory? Except for the fact that you need to embed a JIT in your VM, dynamic compilation should not consume that much memory. Once a method is compiled you can throw away the intermediate representation. Or are you talking about constrained devices where you do not want to embed a JIT?

and this is a *huge*
issue with Java applications. In addition, I think that in most cases
a class can be initialized at compile time.
  
No. Initialization must happen during execution. By initialization, I mean calling the clinit function (e.g. static{...} in Java source), not resolving the class (which can be done at compile time).
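
A small Java illustration of that point: the static initializer below is only run when the class is first actively used, which is a run-time event. The class names (`InitLog`, `Config`, `InitDemo`) are invented for this example.

```java
// Records the order of events so the laziness of <clinit> is observable.
final class InitLog {
    static final StringBuilder EVENTS = new StringBuilder();
}

final class Config {
    static int value;

    static {
        // This block is the body of Config.<clinit>.
        InitLog.EVENTS.append("clinit;");
        value = 7;
    }
}

final class InitDemo {
    static String run() {
        InitLog.EVENTS.append("start;");
        int v = Config.value; // first active use of Config triggers <clinit>
        InitLog.EVENTS.append("read=" + v + ";");
        return InitLog.EVENTS.toString();
    }
}
```

On the first call in a fresh JVM, `InitDemo.run()` returns `start;clinit;read=7;`: the initializer runs after `start`, at the point of the field read, not when the code is compiled or loaded.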

Nicolas

The memory consumed by compiled code is huge in server applications, and
it is not shared between different virtual machine instances. By contrast,
shared libraries are memory mapped, so they are shared between the different
instances of the executables using them. It is possible to throw away code
after using it, but then the compilation cost would have to be paid again.

And dynamic compilation prevents expensive optimization techniques.

Ramon

OK, I didn't realize you were thinking beyond a single JVM. Sure, if the code can be shared between different instances of JVMs, that would be great! I actually had this in mind when I wrote the proposition :)

Ramón García wrote:

The memory consumed by compiled code is huge in server applications.

Do you have statistics about the relative footprints of code and various
kinds of objects in such a server application?
I know they have huge footprints (a general problem with Java
applications, I think), but I never got a useful breakdown along the
lines of what's responsible for that.

And
it is not shared between different virtual machine instances. By contrast,
shared libraries are memory mapped, so they are shared between the different
instances of the executables using them. It is possible to throw away code
after using it, but then the compilation cost would have to be paid again.

It would suffice to store a hash of the compiled code. That way, when
the same class comes along, you can reuse existing compiled code.
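
One way to read this suggestion is a cache of compiled code keyed by a hash of the class bytes, sketched below. That reading is an assumption, as are the names `CompiledCodeCache`, `digest` and `getOrCompile`, the choice of SHA-256, and the in-memory map (sharing between VM instances would need a persistent or shared store instead). The "compiler" is just a function passed in by the caller.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache: compiled code is looked up by a hash of the class bytes.
final class CompiledCodeCache {
    private final Map<String, byte[]> cache = new HashMap<>();
    int compilations = 0; // counts how often the compiler actually ran

    // Hex-encoded SHA-256 of the class file contents.
    static String digest(byte[] classBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(classBytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on the Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Returns cached code for identical class bytes, compiling only on a miss.
    byte[] getOrCompile(byte[] classBytes, Function<byte[], byte[]> compiler) {
        String key = digest(classBytes);
        byte[] code = cache.get(key);
        if (code == null) {
            compilations++;
            code = compiler.apply(classBytes);
            cache.put(key, code);
        }
        return code;
    }
}
```

Two requests with byte-identical class files hit the same entry, so the compiler function runs only for the first one.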

You'd want to keep the intermediate code if you try global optimizations
like monomorphisation (i.e. finding out those parameters and variables
that are never assigned to polymorphically - loading another class can
invalidate assumptions, so you may have to redo at least parts of the
optimization).
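
The invalidation problem can be sketched as a small dependency table: compiled methods register the assumption they rely on, and loading a class that breaks the assumption marks them for recompilation. This is only an illustration; `AssumptionRegistry` and its methods are hypothetical names, class and method identities are plain strings, and only the direct superclass is checked.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tracker for "this class has no subclasses" assumptions.
final class AssumptionRegistry {
    // class name -> compiled methods that assumed the class has no subclasses
    private final Map<String, List<String>> dependents = new HashMap<>();
    private final List<String> invalidated = new ArrayList<>();

    // Called by the optimizer when it specializes a method on this assumption.
    void assumeNoSubclasses(String className, String compiledMethod) {
        dependents.computeIfAbsent(className, k -> new ArrayList<>())
                  .add(compiledMethod);
    }

    // Called when a class is loaded; a new subclass breaks the assumption,
    // so every method that relied on it has to be re-optimized.
    void onClassLoaded(String newClass, String superClass) {
        List<String> broken = dependents.remove(superClass);
        if (broken != null) {
            invalidated.addAll(broken);
        }
    }

    List<String> invalidatedMethods() {
        return invalidated;
    }
}
```

Redoing the optimization for the invalidated methods is where the retained intermediate code would be needed.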

And dynamic compilation prevents expensive optimization techniques.

Well, then a hash should really be sufficient :)

Regards,
Jo