Newbie questions

Hi,

I'm just learning about LLVM (really interesting) and have some newbie
questions. Feel free to ignore or disparage them if they're inappropriate :-)

My area of interest is using LLVM in a Java JVM setting. These are
just some random questions relating to that...

1. What is the status of the LLVM+Java effort? Is it GCJ-specific?
    Is there a web page? I found one link via google but it was broken.

2. I'm curious why the LLVM language includes no instructions for memory
    barriers or any kind of compare-and-swap (bus locking operation). Were
    these considered? Were they deemed too platform-specific? What about
    some definition of the atomicity of instructions (e.g., is a write of
    a 32 bit value to memory guaranteed to be atomic)? More generally does
    the LLVM language define a specific (at least partial) memory model?
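For reference, the compare-and-swap primitive asked about here is the same operation that surfaces at the Java level in java.util.concurrent.atomic; a minimal Java-level illustration (class and method names are illustrative, and this says nothing about what an LLVM instruction would look like):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Java-level illustration of compare-and-swap (illustrative class):
// atomically install a new value only if the current value still
// equals the expected one, retrying on contention.
public class CasDemo {
    public static int incrementViaCas(AtomicInteger counter) {
        int old;
        do {
            old = counter.get();                        // read current value
        } while (!counter.compareAndSet(old, old + 1)); // CAS; retry if raced
        return old + 1;
    }
}
```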

3. Would it make sense to extend the LLVM language so that modules,
    variables, functions, and instructions could be annotated with
    arbitrary application-specific annotations? These would be basically
    ignored by LLVM but otherwise "ride along" with their associated items.
    Of course, the impact on annotations of code transformations would have
    to be defined (e.g., if a function is inlined, any associated annotations
    are discarded).

    The thought here is that more optimization may be possible when
    information from the higher-level language is available. E.g. the
    application could participate in transformations, using the annotations
    to answer questions asked by the LLVM transformation via callbacks.

    To give an example (perhaps this is not a real one because possibly it
    can already be captured by LLVM alone) is the use of a Java final instance
    field in a constructor. In Java we're guaranteed that the final field is
    assigned to only once, and any read of that field must follow the initial
    assignment, so even though the field is not read-only during the entire
    constructor, it can be treated as such by any optimizing transformation.
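The final-field pattern described above, as a minimal Java sketch (class and field names are illustrative):

```java
// Illustrative sketch of the pattern described above: 'limit' is
// assigned exactly once, in the constructor, so any read that follows
// the assignment can be treated as a read of a constant.
public class Buffer {
    private final int limit;

    public Buffer(int requested) {
        int clamped = Math.max(0, requested); // work before the assignment
        this.limit = clamped;                 // the one and only write
        // From here on, reads of 'limit' observe a fixed value, so an
        // optimizer may treat the field as read-only.
    }

    public int limit() {
        return limit; // always sees the constructor's value
    }
}
```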

4. Has anyone written Java JNI classes+native code that "wrap" the LLVM API,
    so that the LLVM libraries can be utilized from Java code?

Thanks,
-Archie

Hi,

I'm just learning about LLVM (really interesting) and have some newbie
questions. Feel free to ignore or disparage them if they're inappropriate :-)

No worries.

My area of interest is using LLVM in a Java JVM setting. These are
just some random questions relating to that...

1. What is the status of the LLVM+Java effort?

Incomplete but significant progress has been made. Misha Brukman can
tell you more.

Is it GCJ-specific?

No, it implements its own Java compiler and bytecode translator.

    Is there a web page?

Not that I'm aware of. But you can obtain the source code from the
llvm-java repository via CVS. Just replace "llvm" with "llvm-java" in
the usual CVS instructions.

I found one link via google but it was broken.

Okay, sorry, I don't know about any web site.

2. I'm curious why the LLVM language includes no instructions for memory
    barriers or any kind of compare-and-swap (bus locking operation). Were
    these considered?

Yes.

Were they deemed too platform-specific?

No.

What about
    some definition of the atomicity of instructions (e.g., is a write of
    a 32 bit value to memory guaranteed to be atomic)? More generally does
    the LLVM language define a specific (at least partial) memory model?

Currently the language doesn't support atomic instructions. However,
some work has been done at UIUC to implement sufficient fundamental
instructions that could permit an entire threading and synchronization
package to be constructed. This work is not complete, and is not in the
LLVM repository yet. I'm not sure of the status at UIUC of this effort,
as it hasn't been discussed in a while. It is definitely something that
will be needed going forward.

3. Would it make sense to extend the LLVM language so that modules,
    variables, functions, and instructions could be annotated with
    arbitrary application-specific annotations?

Some of these things already inherit from the "Annotable" class which
permits Annotations on the object. However, their use is discouraged and
we will, eventually, remove them from the LLVM IR. The problem is that
Annotations create difficulties for the various passes that need to
understand them. We have decided, from a design perspective, that (a) if
it's important enough to be generally applicable, it should be part of
the LLVM IR, not tucked away in an Annotation, and (b) for things
specific to a language or system, Annotations are insufficient
and a higher-level construction (possibly making reference to
LLVM IR objects) would be needed anyway.

These would be basically
    ignored by LLVM but otherwise "ride along" with their associated items.
    Of course, the impact on annotations of code transformations would have
    to be defined (e.g., if a function is inlined, any associated annotations
    are discarded).

Yes, we've been through this discussion many times before and the
solution was to not support Annotations at all as discussed above. There
are numerous issues with saving the annotations in the bytecode, how
they affect the passes, what happens to them after the code is modified
by a pass (as you noted), etc.

    The thought here is that more optimization may be possible when
    information from the higher-level language is available. E.g. the
    application could participate in transformations, using the annotations
    to answer questions asked by the LLVM transformation via callbacks.

It's my opinion that those things should be handled by the higher-level
language's own passes on its AST where full semantic knowledge of the
language is available. Remember that LLVM provides an "Intermediate
Representation", not a high-level AST. The desire to support Annotations
is an attempt to force the IR into a higher level of abstraction than it
was designed for.

The use of callbacks is problematic as it would require LLVM to manage
numerous dynamic libraries that correspond to those callbacks, provide a
scheme for understanding which callbacks to call in various
circumstances, etc. Consider a bytecode file that was generated as
linking bytecode files from four or five different languages and then
being delivered to another environment for further optimization and
execution. Are all those languages' dynamic libraries available so the
callbacks can be called?

    To give an example (perhaps this is not a real one because possibly it
    can already be captured by LLVM alone) is the use of a Java final instance
    field in a constructor. In Java we're guaranteed that the final field is
    assigned to only once, and any read of that field must follow the initial
    assignment, so even though the field is not read-only during the entire
    constructor, it can be treated as such by any optimizing transformation.

LLVM would already recognize such a case and permit the appropriate
optimizations.

4. Has anyone written Java JNI classes+native code that "wrap" the LLVM API,
    so that the LLVM libraries can be utilized from Java code?

No, there's no Java interface at this time. Patches accepted :-)

There is, however, a burgeoning PyPy interface that is being
developed.

Thanks,
-Archie
__________________________________________________________________________
Archie Cobbs * CTO, Awarix * http://www.awarix.com

You're welcome. Hope it was useful. I'm sure others will respond as
well, so stay tuned.

Reid Spencer.

Reid Spencer wrote:

1. What is the status of the LLVM+Java effort?

Incomplete but significant progress has been made. Misha Brukman can
tell you more.

Is it GCJ-specific?

No, it implements its own Java compiler and bytecode translator.

Has it been hooked up to a JVM? If so, how and which ones?

Thanks for your other answers re annotations and memory model.

-Archie

My area of interest is using LLVM in a Java JVM setting. These are
just some random questions relating to that...

1. What is the status of the LLVM+Java effort?

Incomplete but significant progress has been made. Misha Brukman can
tell you more.

There are actually two different LLVM + Java projects. The 'llvm-java' project, developed primarily by Alkis Evlogimenos <alkis@evlogimenos.com>, is in LLVM CVS. It has basic stuff working, but is far from complete. It is also stalled: no one is working on it any longer.

The second project is an LLVM JIT backend to GCJX. GCJX (developed by Tom Tromey) is far more complete, but the LLVM backend is quite new.

2. I'm curious why the LLVM language includes no instructions for memory
    barriers or any kind of compare-and-swap (bus locking operation). Were
    these considered?

Yes.

Someone did actually develop intrinsics for these, but they were never contributed back to LLVM.

What about
    some definition of the atomicity of instructions (e.g., is a write of
    a 32 bit value to memory guaranteed to be atomic)? More generally does
    the LLVM language define a specific (at least partial) memory model?

Currently the language doesn't support atomic instructions. However,

...

LLVM repository yet. I'm not sure of the status at UIUC of this effort,
as it hasn't been discussed in a while. It is definitely something that
will be needed going forward.

My understanding is that it is stalled. If someone wanted to do something regarding this, it would be quite welcome. In particular, for atomic operations, starting with implementation of the new GCC atomic builtins would make a lot of sense.

    To give an example (perhaps this is not a real one because possibly it
    can already be captured by LLVM alone) is the use of a Java final instance
    field in a constructor. In Java we're guaranteed that the final field is
    assigned to only once, and any read of that field must follow the initial
    assignment, so even though the field is not read-only during the entire
    constructor, it can be treated as such by any optimizing transformation.

LLVM would already recognize such a case and permit the appropriate
optimizations.

This specific optimization can also be easily handled in an LLVM JIT environment.

-Chris

llvm-java has been hooked up to a class library (classpath), and implements all of the VM (AFAIK). That said, you'd probably be better off working on GCJX right now, unless you'd like to do a lot of fundamental development on the llvm-java front-end.

-Chris

Reid Spencer wrote:
>> 1. What is the status of the LLVM+Java effort?
>
> Incomplete but significant progress has been made. Misha Brukman can
> tell you more.
>> Is it GCJ-specific?
>
> No, it implements its own Java compiler and bytecode translator.

Has it been hooked up to a JVM? If so, how and which ones?

I think the point of llvm-java was to avoid a JVM. That is, it converts
either Java source or Java bytecode into equivalent LLVM bytecode. I
think the big things lacking so far are the Java library and support for
things that LLVM doesn't natively support (threading, synchronization
come to mind). If you need more detail, Alkis (author of llvm-java) is
going to have to respond. Otherwise, you'll need to take a look at the
code.

Thanks for your other answers re annotations and memory model.

You're welcome.

Has it been hooked up to a JVM? If so, how and which ones?

I think the point of llvm-java was to avoid a JVM. That is, it converts

llvm-java is the JVM.

either Java source or Java bytecode into equivalent LLVM bytecode. I

llvm-java only supports input from Java bytecode.

think the big things lacking so far are the Java library and support for

llvm-java uses classpath for its library.

things that LLVM doesn't natively support (threading, synchronization
come to mind). If you need more detail, Alkis (author of llvm-java) is
going to have to respond. Otherwise, you'll need to take a look at the
code.

It's actually missing quite a bit. It is missing too much to support programs that use System.out, for example. Alkis is definitely the person to talk to if you're interested in it.

-Chris

Chris Lattner wrote:

I think the point of llvm-java was to avoid a JVM. That is, it converts

llvm-java is the JVM.

either Java source or Java bytecode into equivalent LLVM bytecode. I

llvm-java only supports input from Java bytecode.

think the big things lacking so far are the Java library and support for

llvm-java uses classpath for its library.

things that LLVM doesn't natively support (threading, synchronization
come to mind). If you need more detail, Alkis (author of llvm-java) is
going to have to respond. Otherwise, you'll need to take a look at the
code.

It's actually missing quite a bit. It is missing too much to support programs that use System.out, for example. Alkis is definitely the person to talk to if you're interested in it.

Thanks.. I'm actually more interested in what would be involved to
hook up LLVM to an existing JVM. In particular JCVM (http://jcvm.sf.net).
JCVM analyzes bytecode using Soot, emits C code, compiles that with GCC,
and then loads executable code from the resulting ELF files.. given this
design, using LLVM/modules instead of Soot/GCC/ELF would not be very much
different, but would allow more cool things to happen.

The main barrier to this idea for me (besides the usual limited time
for fun projects) is understanding how it could work. In particular, how
would one bridge the C vs. C++ gap. JCVM is written in C, and I have lots
of C and Java experience, but zero with C++. Dumb question: can a C program
link with and invoke C++ libraries? Or perhaps a little C++ starter program
is all that is needed, then the existing code can be used via extern "C"?
Alternately, if there were Java JNI wrappers, I could invoke those... Etc.

-Archie

It's actually missing quite a bit. It is missing too much to support programs that use System.out, for example. Alkis is definitely the person to talk to if you're interested in it.

Thanks.. I'm actually more interested in what would be involved to
hook up LLVM to an existing JVM. In particular JCVM (http://jcvm.sf.net).
JCVM analyzes bytecode using Soot, emits C code, compiles that with GCC,
and then loads executable code from the resulting ELF files.. given this
design, using LLVM/modules instead of Soot/GCC/ELF would not be very much
different, but would allow more cool things to happen.

Okay.

The main barrier to this idea for me (besides the usual limited time
for fun projects) is understanding how it could work. In particular, how
would one bridge the C vs. C++ gap. JCVM is written in C, and I have lots
of C and Java experience, but zero with C++. Dumb question: can a C program
link with and invoke C++ libraries? Or perhaps a little C++ starter program
is all that is needed, then the existing code can be used via extern "C"?
Alternately, if there were Java JNI wrappers, I could invoke those... Etc.

C programs certainly can use C++ libraries, as you say, with extern "C".

I don't know how well JCVM would work with llvm-java, I guess you'd have to try it and see.

-Chris

Chris Lattner wrote:

I think the point of llvm-java was to avoid a JVM. That is, it converts

llvm-java is the JVM.

either Java source or Java bytecode into equivalent LLVM bytecode. I

llvm-java only supports input from Java bytecode.

think the big things lacking so far are the Java library and support for

llvm-java uses classpath for its library.

things that LLVM doesn't natively support (threading, synchronization
come to mind). If you need more detail, Alkis (author of llvm-java) is
going to have to respond. Otherwise, you'll need to take a look at the
code.

It's actually missing quite a bit. It is missing too much to support programs that use System.out, for example. Alkis is definitely the person to talk to if you're interested in it.

Thanks.. I'm actually more interested in what would be involved to
hook up LLVM to an existing JVM. In particular JCVM (http://jcvm.sf.net).
JCVM analyzes bytecode using Soot, emits C code, compiles that with GCC,
and then loads executable code from the resulting ELF files.. given this
design, using LLVM/modules instead of Soot/GCC/ELF would not be very much
different, but would allow more cool things to happen.

If you're only interested in using LLVM for "cool things" (such as optimization), you could use it directly on the C code you emit.

Either way, one issue that you will have to deal with is preserving the behavior of Java exceptions (assuming you care about that). LLVM does not preserve the order of potentially excepting instructions (e.g., a divide or a load). This would have to be handled explicitly, whether you use llvm-java or simply used LLVM on the C code from Soot. I don't know if/how libgcj handles this but Tom may be able to say more about that.

The main barrier to this idea for me (besides the usual limited time
for fun projects) is understanding how it could work. In particular, how
would one bridge the C vs. C++ gap. JCVM is written in C, and I have lots
of C and Java experience, but zero with C++. Dumb question: can a C program
link with and invoke C++ libraries? Or perhaps a little C++ starter program
is all that is needed, then the existing code can be used via extern "C"?
Alternately, if there were Java JNI wrappers, I could invoke those... Etc.

-Archie

--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.cs.uiuc.edu/

Vikram Adve wrote:

If you're only interested in using LLVM for "cool things" (such as optimization), you could use it directly on the C code you emit.

Yes... though the translation to C loses some efficiency due to
"impedance mismatch". More ideal would be to go from bytecode -> LLVM
directly (I understand this part has already been done more or less).

Either way, one issue that you will have to deal with is preserving the behavior of Java exceptions (assuming you care about that). LLVM does not preserve the order of potentially excepting instructions (e.g., a divide or a load). This would have to be handled explicitly, whether you use llvm-java or simply used LLVM on the C code from Soot. I don't know if/how libgcj handles this but Tom may be able to say more about that.

Right.. I think we'd have to revert to signal-less exception checking
for null pointer and divide-by-zero for the time being.

But this brings up a good point.. should LLVM have an "instruction
barrier" instruction? I.e., an instruction which, within the context
of one basic block, would prevent any instructions before the barrier
from being reordered after any instructions after the barrier?

Then a JVM could use signals and still guarantee Java's "exactness"
of exceptions by bracketing each potentially-signal-generating instruction
with instruction barriers.

Someone must have already invented this and given it a better name.

Related idea.. what if all instructions (not just "invoke") could be
allowed to have an optional "except label ..."?

-Archie

This is the direction that we plan to go, when someone is interested enough to implement it. There are some rough high-level notes about this idea here:
http://nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt

-Chris

Chris Lattner wrote:

Related idea.. what if all instructions (not just "invoke") could be
allowed to have an optional "except label ..."?

This is the direction that we plan to go, when someone is interested enough to implement it. There are some rough high-level notes about this idea here:
http://nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt

Those ideas make sense.. one question though:

Note that this bit is sufficient to represent many possible scenarios. In
particular, a Java compiler would mark just about every load, store and other
exception-inducing operation as trapping. If a load is marked potentially
trapping, the optimizer is required to preserve it (even if its value is not
used) unless it can prove that it dynamically cannot trap. In practice, this
means that standard LLVM analyses would be used to prove that exceptions
cannot happen, then clear the bit. As the bits are cleared, exception handlers
can be deleted and dead loads (for example) can also be removed.

The idea of the optimizer computing that a trap can't happen is obviously
desirable, but how does the front end tell the optimizer how to figure
that out? I.e., consider this Java:

   void foo(SomeClass x) {
     x.field1 = 123;
     x.field2 = 456; // no NullPointerException possible here
   }

Clearly an exception can happen with the first statement -- iff x is null.
But no exception is possible on the second statement. But how does the
optimizer "know" this without being Java specific? It seems like LLVM
will have to have some built-in notion of a "null pointer" generated
exception. Similarly for divide by zero, e.g.:

   void bar(int x) {
     if (x != 0)
       this.y = 100/x; // no ArithmeticException possible here
   }

How will the optimizer "know" the exception can't happen?

Chris Lattner wrote:
>> Related idea.. what if all instructions (not just "invoke") could be
>> allowed to have an optional "except label ..."?
>
> This is the direction that we plan to go, when someone is interested
> enough to implement it. There are some rough high-level notes about
> this idea here:
> http://nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt

Those ideas make sense.. one question though:

> Note that this bit is sufficient to represent many possible scenarios. In
> particular, a Java compiler would mark just about every load, store and other
> exception-inducing operation as trapping. If a load is marked potentially
> trapping, the optimizer is required to preserve it (even if its value is not
> used) unless it can prove that it dynamically cannot trap. In practice, this
> means that standard LLVM analyses would be used to prove that exceptions
> cannot happen, then clear the bit. As the bits are cleared, exception handlers
> can be deleted and dead loads (for example) can also be removed.

The idea of the optimizer computing that a trap can't happen is obviously
desirable, but how does the front end tell the optimizer how to figure
that out? I.e., consider this Java:

   void foo(SomeClass x) {
     x.field1 = 123;
     x.field2 = 456; // no NullPointerException possible here
   }

Clearly an exception can happen with the first statement -- iff x is null.
But no exception is possible on the second statement. But how does the
optimizer "know" this without being Java specific? It seems like LLVM
will have to have some built-in notion of a "null pointer" generated
exception. Similarly for divide by zero, e.g.:

I think this is feasible to optimize in llvm. This would require
writing an optimization pass (that can be used by any language
implementation). Since x.field1 and x.field2 will involve
getelementptr instructions, we can have some logic in an optimization
pass to prove what you are saying: if x is null then only the first
memop through a getelementptr on x will trap.
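Restating the property above in Java terms (class name is illustrative): the first store either throws or proves x non-null, so the second access cannot trap and its check is redundant.

```java
// Illustrative sketch: the first store either throws
// NullPointerException or proves x non-null, so the second store is
// guaranteed not to throw and its null check can be eliminated.
public class Point {
    int field1;
    int field2;

    static void init(Point x) {
        x.field1 = 123; // may throw if x == null
        x.field2 = 456; // cannot throw: x proven non-null above
    }
}
```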

   void bar(int x) {
     if (x != 0)
       this.y = 100/x; // no ArithmeticException possible here
   }

How will the optimizer "know" the exception can't happen?

This should be pretty straightforward to implement as well (by
writing the proper optimization pass).

------

Another random question: can a global variable be considered variable
in one function but constant in another?

No.

Motivation: Java's "first active use" requirement for class initialization.
When invoking a static method, it's possible that a class may need to
be initialized. However, when invoking an instance method, that's not
possible.

Perhaps there should be a way in LLVM to specify predicates (or at least
properties of global variables and parameters) that are known to be true
at the start of each function... ?

I think this will end up being the same as the null pointer trapping
instruction optimization. The implementation will very likely involve
some pointer to the description of the class. To make this fast this
pointer will be null if the class is not loaded and you trap when you
try to use it and perform initialization. So in the end the same
optimization pass that was used for successive field accesses can be
used for class initialization as well.

Alkis Evlogimenos wrote:

Motivation: Java's "first active use" requirement for class initialization.
When invoking a static method, it's possible that a class may need to
be initialized. However, when invoking an instance method, that's not
possible.

Perhaps there should be a way in LLVM to specify predicates (or at least
properties of global variables and parameters) that are known to be true
at the start of each function... ?

I think this will end up being the same as the null pointer trapping
instruction optimization. The implementation will very likely involve
some pointer to the description of the class. To make this fast this
pointer will be null if the class is not loaded and you trap when you
try to use it and perform initialization. So in the end the same
optimization pass that was used for successive field accesses can be
used for class initialization as well.

If that were the implementation then yes that could work. But using
a null pointer like this probably wouldn't be the case. In Java you have
to load a class before you initialize it, so the pointer to the type
structure will already be non-null.

In JCVM for example, there is a bit in type->flags that determines
whether the class is initialized or not. This bit has to be checked
before every static method invocation or static field access. You could
reserve an entire byte instead of a bit, but I don't know if that would
make it any easier to do this optimization.
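A minimal Java sketch of the check discussed here (names are illustrative; in JCVM the flag lives in type->flags): a flag is tested before every static access, and successive checks in the same function become redundant.

```java
// Illustrative sketch of the "first active use" check: every static
// access is preceded by an initialization test; after the first test
// in a function, later tests are provably redundant and an optimizer
// could remove them.
public class ClassInitCheck {
    private static boolean initialized = false;
    private static int staticField;

    private static void ensureInitialized() {
        if (!initialized) {       // the per-access check
            staticField = 42;     // stand-in for running the initializer
            initialized = true;
        }
    }

    public static int readTwice() {
        ensureInitialized();      // first active use: check required
        int a = staticField;
        ensureInitialized();      // redundant: class already initialized
        int b = staticField;
        return a + b;
    }
}
```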

The idea of the optimizer computing that a trap can't happen is obviously
desirable, but how does the front end tell the optimizer how to figure

It depends on the front-end. If you're coming from C, there is no good way. C can't express these properties well. However, in this specific case:

that out? I.e., consider this Java:

void foo(SomeClass x) {
   x.field1 = 123;
   x.field2 = 456; // no NullPointerException possible here
}

Clearly an exception can happen with the first statement -- iff x is null.
But no exception is possible on the second statement. But how does the
optimizer "know" this without being Java specific?

This isn't specific to Java. LLVM pointers can't wrap around the end of the address space, so if the first access succeeds, the second must also. LLVM won't guarantee exceptions for bogus pointers, just null pointers, so it could do this without a problem.

Another random question: can a global variable be considered variable
in one function but constant in another?

No.

Motivation: Java's "first active use" requirement for class initialization.
When invoking a static method, it's possible that a class may need to
be initialized. However, when invoking an instance method, that's not
possible.

You need to modify the isConstant flag on the global after the initializer has been run. This requires JIT compilation. Note that the LLVM optimizer should already do a reasonable job of optimizing some common cases of this, as a similar thing happens when initializing C++ static variables.

Perhaps there should be a way in LLVM to specify predicates (or at least
properties of global variables and parameters) that are known to be true
at the start of each function... ?

We don't have something like this currently. It sounds tricky to get right.

Trying to summarize this thread a bit, here is a list of some of the
issues brought up relating to the goal of "best case" Java support...

1. Definition and clarification of the memory model.
2. Need some instructions for atomic operations.
3. Explicit support for exceptions from instructions other than invoke.
4. Ensuring there are mechanisms for passing through all appropriate
   optimization-useful information from the front end to LLVM in a
   non-Java-specific way (e.g., see "active use" check above).

Yup.

-Chris

This is up to the unwinding library implementation for the target in question. LLVM will support it if your unwinder library supports it.

-Chris

Alkis Evlogimenos wrote:
>> Motivation: Java's "first active use" requirement for class initialization.
>> When invoking a static method, it's possible that a class may need to
>> be initialized. However, when invoking an instance method, that's not
>> possible.
>>
>> Perhaps there should be a way in LLVM to specify predicates (or at least
>> properties of global variables and parameters) that are known to be true
>> at the start of each function... ?
>
> I think this will end up being the same as the null pointer trapping
> instruction optimization. The implementation will very likely involve
> some pointer to the description of the class. To make this fast this
> pointer will be null if the class is not loaded and you trap when you
> try to use it and perform initialization. So in the end the same
> optimization pass that was used for successive field accesses can be
> used for class initialization as well.

If that were the implementation then yes that could work. But using
a null pointer like this probably wouldn't be the case. In Java you have
to load a class before you initialize it, so the pointer to the type
structure will already be non-null.

That is why I said if you want it to be fast :-). My point was that if
you want this to be fast you need to find a way to make it trap when a
class is not initialized. If you employ the method you mention below
for JCVM then you need to perform optimizations to simplify the
conditionals.

Alkis Evlogimenos wrote:

Alkis Evlogimenos wrote:

Motivation: Java's "first active use" requirement for class initialization.
When invoking a static method, it's possible that a class may need to
be initialized. However, when invoking an instance method, that's not
possible.

Perhaps there should be a way in LLVM to specify predicates (or at least
properties of global variables and parameters) that are known to be true
at the start of each function... ?

I think this will end up being the same as the null pointer trapping
instruction optimization. The implementation will very likely involve
some pointer to the description of the class. To make this fast this
pointer will be null if the class is not loaded and you trap when you
try to use it and perform initialization. So in the end the same
optimization pass that was used for successive field accesses can be
used for class initialization as well.

If that were the implementation then yes that could work. But using
a null pointer like this probably wouldn't be the case. In Java you have
to load a class before you initialize it, so the pointer to the type
structure will already be non-null.

That is why I said if you want it to be fast :-). My point was that if
you want this to be fast you need to find a way to make it trap when a
class is not initialized. If you employ the method you mention below
for JCVM then you need to perform optimizations to simplify the
conditionals.

I get it. My point however is larger than just this one example.
You can't say "just use a null pointer" for every possible optimization
based on front end information. Maybe that happens to work for
active class checks, but it's not a general answer.

Requoting myself:

> I.e., my question is the more general one:
> how do optimizations that are specific to the front-end language get
> done? How does the front-end "secret knowledge" get passed through
> somehow so it can be used for optimization purposes?

-Archie

Archie,

The quick answer is that it doesn't. The front end is responsible for
having its own AST (higher level representation) and running its own
optimizations on that. From there you generate the LLVM intermediate
representation (IR) and run on that whatever LLVM optimization passes
are appropriate for your language and the level of optimization you want
to get to. The "secret knowledge" is retained by the language's front
end. However, your front end is in control of two things: what LLVM IR
gets generated, and what passes get run on it. You can create your own
LLVM passes to "clean up" things that you generate (assuming there's a
pattern).

We have tossed around a few ideas about how to retain front-end
information in the bytecode. The current Annotation/Annotable construct
in 1.7 is scheduled for removal in 1.8. There are numerous problems with
it. One option is to just leave it up to the front end. Another option
is to allow a "blob" to be attached to the end of a bytecode file.

On another front, you might be interested in http://hlvm.org/ where a
few interested LLVM developers are thinking about just these kinds of
things and ways to bring high level support to the excellent low level
framework that LLVM provides. Note: this effort has just begun, so don't
expect to find much there for another few weeks.

Reid.