LLVM IR is a compiler IR

Hi Folks –

Let me go ahead and pose a question similar to the one Joachim poses below. I too am trying to evaluate whether LLVM will be of use to me in building a compiler and garbage collection mechanism for a byte-code VM that I have built. Although there are multiple kinds of code that can be created with this system, all of them eventually translate to the execution of one or more of these byte codes, with the exception of code written directly in Intel assembly language (which, as I understand it, can be emitted as-is and left untouched by the LLVM compiler if desired). There are about 32 core opcodes that constitute the basic instruction set, and I can envision mapping each of these to some sequence of LLVM IR. There are also many more "extended opcodes" that are executed by the same core instruction-execution loop but that are coded using the built-in Intel assembler and added dynamically by the system. I could envision going to the trouble of mapping each of these to a sequence of LLVM IR instructions as well, and then emitting a series of LLVM IR sequences based purely on the sequence of VM opcodes encountered in a scan of code compiled for the VM.
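For concreteness, here is a rough sketch of the kind of opcode-to-IR mapping I have in mind, using LLVM's C++ IRBuilder. The opcode names are invented for the example, and modeling my data stack as SSA values is just one plausible encoding, not how my VM actually stores things:

    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include <vector>

    using namespace llvm;

    enum Op { PUSH, ADD, RET };  // invented opcodes for the sketch

    // Translate one straight-line bytecode fragment into LLVM IR,
    // modeling the VM data stack as a stack of SSA values so the
    // optimizer sees plain register dataflow instead of memory traffic.
    Function *translate(Module &m, const std::vector<int> &code) {
        LLVMContext &ctx = m.getContext();
        auto *fnTy = FunctionType::get(Type::getInt32Ty(ctx), false);
        auto *fn = Function::Create(fnTy, Function::ExternalLinkage,
                                    "vm_fragment", &m);
        IRBuilder<> b(BasicBlock::Create(ctx, "entry", fn));

        std::vector<Value *> stack;
        for (size_t pc = 0; pc < code.size(); ++pc) {
            switch (code[pc]) {
            case PUSH:
                stack.push_back(b.getInt32(code[++pc]));  // immediate operand
                break;
            case ADD: {
                Value *rhs = stack.back(); stack.pop_back();
                Value *lhs = stack.back(); stack.pop_back();
                stack.push_back(b.CreateAdd(lhs, rhs));
                break;
            }
            case RET:
                b.CreateRet(stack.back());
                return fn;
            }
        }
        b.CreateRet(b.getInt32(0));  // fall-through: return 0
        return fn;
    }

Fed a PUSH 2, PUSH 3, ADD, RET sequence, the standard optimization passes fold this to a single constant return, which is exactly the kind of cross-opcode optimization I am hoping for.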

I'm hoping that such output could then be run through all the LLVM optimizations and result in better Intel assembly code than what I have hand-coded myself, both in my implementations of the individual core or extended opcodes and, especially, in the Intel code sequences that result from using these opcodes together. So my first question is simply to ask for validation of this thinking: does such a strategy seem feasible?

The second question pertains to this discussion thread. At the end of the day, all I am trying to do is compile for the x86 platform (ideally targeting Windows, Linux, and the Mac), so I don't need to target multiple processors. If the answer to the first question is that the strategy is sound and sensible, then LLVM should add significant value for me, and most of the discussion in this thread about platform-specific issues shouldn't apply, since I have only one processor type to target. Am I thinking about this correctly?

Any insights from some of you old hands would be greatly appreciated.

Thanks.

Mike

Michael Clagett <mclagett@hotmail.com> writes:

There are about 32 core opcodes that constitute the basic instruction set, and I can envision mapping each of these to some sequence of LLVM IR. There are also many more "extended opcodes" that are executed by the same core instruction-execution loop but that are coded using the built-in Intel assembler and added dynamically by the system. I could envision going to the trouble of mapping each of these to a sequence of LLVM IR instructions as well, and then emitting a series of LLVM IR sequences based purely on the sequence of VM opcodes encountered in a scan of code compiled for the VM.

I'm hoping that such output could then be run through all the LLVM optimizations and result in better Intel assembly code than what I have hand-coded myself, both in my implementations of the individual core or extended opcodes and, especially, in the Intel code sequences that result from using these opcodes together. So my first question is simply to ask for validation of this thinking: does such a strategy seem feasible?

Let me make sure I'm understanding you correctly. You want to map each of your opcodes into an LLVM sequence and then use the LLVM optimizations and JIT to generate efficient native code implementations? Then you would invoke those implementations during interpretation?

Or is it that you want to take a bytecode program, map it to LLVM IR,
run it through optimizations and codegen to produce a native executable?

Either one of these will work and LLVM seems like a good match as long
as you don't expect the optimizations to understand the higher-level
semantics of your opcodes (without some work by you, at least).

I don't quite grasp any benefit to the first use, as I would just generate the optimal native code sequence for each opcode once and be done with it. No LLVM needed at all. So I suspect this is not what you want to do.

                             -Dave

Sorry for the noise, but this is the message I meant to send to the list; I mistakenly sent my previous reply to David directly.

Michael Clagett <mclagett@hotmail.com> writes:

It is actually the first of your alternatives above that I was hoping to achieve, and the reason I was thinking this would be valuable is twofold. First, I don't have that much faith in the quality or optimality of my own byte-code implementations. The core 32 opcodes tend to be high-level implementations of low-level operations on the elements of the virtual machine.

OK, so these are mildly complex operations. It makes sense to start with some kind of machine-generated asm implementation to get optimized performance. Don't write the opcode implementations in asm directly; write them in a high-level language and compile them to native code. See below.

The tops of the data and return stacks are mapped to registers EAX and EDI, respectively, and the VM's address register is mapped to ESI.

Does this have to be the case? See below.

It was my general feeling that a good SSA-based compilation mechanism like LLVM's could do a better job of maximizing the use of the Intel architecture's limited register resources than I could.

As an alternative to using the JIT, would it be possible to implement each opcode in its own interpreter function and compile them statically? Of course there would be call overhead when interpreting each opcode. Once you've got that, you could apply various techniques, such as threading the interpreter (not multiprocessing, but threaded code in the sense of http://en.wikipedia.org/wiki/Threaded_code), to eliminate the overhead.
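To make the threading idea concrete, here is a minimal sketch of a direct-threaded interpreter. It relies on the GCC/Clang labels-as-values extension, and the three opcodes are of course stand-ins for your real instruction set:

    #include <cstddef>
    #include <cstdio>

    // Direct-threaded dispatch: the bytecode is pre-decoded into an
    // array of handler addresses, so each handler jumps straight to
    // the next one -- no central switch, no per-opcode call overhead.
    int run() {
        static void *handlers[] = { &&op_push1, &&op_add, &&op_halt };
        enum { PUSH1, ADD, HALT };

        // Tiny program: push 1, push 1, add, halt.
        static const int program[] = { PUSH1, PUSH1, ADD, HALT };
        void *threaded[4];
        for (std::size_t i = 0; i < 4; ++i)
            threaded[i] = handlers[program[i]];

        int stack[64], *sp = stack;
        void **ip = threaded;
        goto **ip++;            // dispatch to the first handler

    op_push1:
        *sp++ = 1;
        goto **ip++;
    op_add:
        --sp;
        sp[-1] += sp[0];
        goto **ip++;
    op_halt:
        return sp[-1];
    }

    int main() { std::printf("%d\n", run()); }  // prints 2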

I don't think there's any particular reason to rely on the JIT unless you want to take it a bit further and optimize a specific sequence of opcodes seen when interpreting a specific program. But then we're getting into the various Futamura transformations. :)

Moreover, as long as code remains at the VM instruction level, these
resources are even more constrained than usual. EDX needs to be
preserved to hold the VM instruction pointer. EAX, ESI and EDI need
to be preserved for the purposes outlined above.

Why do those registers need to be preserved? Imagine the interpreter
were written completely in a high level language. The compiler doesn't
care which register holds a stack pointer, data pointer, etc. as long as
the virtual machine's state is consistent.
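In other words, something like the following (a sketch only; the field names are made up) leaves the register allocator free to make those decisions per hot region:

    #include <cstdint>

    // All VM state in one struct instead of being pinned by hand to
    // EAX, ESI, EDI and EDX. When handlers are compiled (or inlined
    // together and JIT-compiled), the register allocator decides what
    // lives in which register across each region, and it can revisit
    // that choice per code path in a way a fixed assignment cannot.
    struct VMState {
        int32_t *dataSp;    // data stack pointer
        int32_t *returnSp;  // return stack pointer
        int32_t  tos;       // cached top of data stack
        uint8_t *ip;        // VM instruction pointer
        int32_t *addr;      // the VM's address register
    };

    // Example handler: the compiler sees ordinary loads, stores and
    // SSA values, so consecutive handlers optimize as a unit.
    inline void op_add(VMState &vm) {
        vm.tos += *--vm.dataSp;  // pop second operand into cached top
    }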

Similar considerations apply to reducing from ten instructions down to one or two those operations that at the VM level require the stack, but that at the Intel assembler level would more naturally be handled in registers.

Ah, OK, this is interesting. You want to change the execution model on-the-fly. A grad school colleague of mine did something very similar to this, translating a stack machine into a register machine. Of course, he's a hardware nut, so he designed hardware to do it. :) Unfortunately, I don't think he ever published anything on it.

Doing the threading thing mentioned above or the JIT/Dynamo thing
mentioned below can both accomplish this, I think, and without any
register constraints if I'm understanding you correctly.

Finally, I just have the general feeling that more significant
compiler optimizations can be effected across sequences of what are my
vm opcode implementations. This is a general feeling, but I'm hoping
fairly well-founded.

Yes, that's true. See the Futamura reference above. Given a VM and an
input program, you can in fact generate an optimized executable. This
is the logical extension of what you're getting at.
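A toy version of the idea: if the bytecode program is a compile-time constant, an ordinary optimizer already performs a crude first Futamura projection by folding the interpreter away. In the sketch below (invented opcodes again), any decent C++ compiler at -O2 reduces run_fixed to a function that returns the constant 6:

    enum Op { PUSH, ADD, MUL, HALT };  // invented opcodes

    // A minimal stack interpreter. Because the program below is a
    // compile-time constant, constant propagation plus unrolling lets
    // the optimizer evaluate the whole interpretation at build time --
    // the essence of specializing an interpreter to one input program.
    constexpr int interpret(const int *code) {
        int stack[16] = {};
        int sp = 0;
        for (int pc = 0;;) {
            switch (code[pc++]) {
            case PUSH: stack[sp++] = code[pc++]; break;
            case ADD:  --sp; stack[sp - 1] += stack[sp]; break;
            case MUL:  --sp; stack[sp - 1] *= stack[sp]; break;
            case HALT: return stack[sp - 1];
            }
        }
    }

    int run_fixed() {
        // (1 + 2) * 2 as bytecode, known at compile time.
        constexpr int program[] = { PUSH, 1, PUSH, 2, ADD,
                                    PUSH, 2, MUL, HALT };
        return interpret(program);  // folds to "return 6" at -O2
    }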

For this kind of thing a JIT makes sense. You might have a look at what
the HP people did with Dynamo. They got a lot of performance out of
translating PA-RISC to PA-RISC by doing exactly what you describe.

Hope that explains my thinking better. Does that at all change your view of the benefits I might achieve from LLVM?

It doesn't change it: I still think LLVM will work well for this. JIT speed could be an issue, but that cost will be amortized if the opcode sequence is executed enough times.

One way to speed up the JIT is to pre-generate a set of instruction templates for each opcode that get filled in with specific information available at runtime. See the papers on DyC for some examples; I believe the Dynamo folks also took this route. This would be quite extensive work in LLVM but would be very valuable, I think. Papers on partial evaluation may also be useful to explore.
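For flavor only, here is a minimal illustration of the template idea on x86, assuming a POSIX system. This is not literally what DyC or Dynamo did, just the basic mechanism of patching a runtime value into pre-generated machine code:

    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <sys/mman.h>

    // Pre-generated template for "return <imm32>":
    //   B8 xx xx xx xx   mov eax, imm32   (the xx bytes are the hole)
    //   C3               ret
    static const uint8_t kTemplate[] = { 0xB8, 0, 0, 0, 0, 0xC3 };

    using Fn = int (*)();

    int main() {
        // Writable page first, executable only after patching (W^X).
        void *buf = mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        std::memcpy(buf, kTemplate, sizeof(kTemplate));

        int32_t imm = 42;  // runtime information filled into the hole
        std::memcpy(static_cast<uint8_t *>(buf) + 1, &imm, sizeof(imm));

        mprotect(buf, 4096, PROT_READ | PROT_EXEC);
        Fn f = reinterpret_cast<Fn>(buf);
        std::printf("%d\n", f());  // prints 42
        munmap(buf, 4096);
    }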

HTH.

                             -Dave