Suggestion for VM porting to LLVM

Hi guys,

First of all, let me say "hello" to everyone. This is my first message on this list, and I'm pretty happy that I'll finally be able to start working with LLVM.

I have a question for you all, if you can help me: a few years ago I implemented a simple VM for a language we use at my company. The VM JIT-compiles a simple bytecode and executes it. It works quite well, better than some well-known VMs. My idea now is to try migrating the VM to LLVM to see whether I can gain something in terms of speed and extensibility.
I'm totally new to LLVM, but as far as I understand there might be two ways to add LLVM to my project:

- generate IR from my compiler;
- build a VM that converts my bytecode to IR at runtime and executes it.

I strongly prefer the second option, because it would be great if I could continue to use the old programs without having to recompile them.

Which approach do you suggest? Are there any alternatives?

Gabriele

My take: do the second one first, gain some experience, and have some fun. You can use it to double-check performance and suitability. Mid-term, though, I'd say do both. That lets you compare the two solutions against each other on compile time, compile-time memory pressure, run time, and run-time memory usage. You're then in a better position to decide which path suits which of your usage styles. Long term, you can see whether it makes sense to drop one of the solutions.

I've done something similar with HLVM, albeit for a new VM that uses a new
representation. I found the combination of OCaml and LLVM to work extremely
well. OCaml makes it very easy to manipulate programs and its LLVM bindings
make it very easy to JIT compile and execute native code. My entire VM
(including GC) is only 1kLOC.

Hi Jon,

I've read your articles about HLVM, and they were one of the reasons that convinced me to try out LLVM. Actually, my VM is implemented in C++, but since it is not extremely complex, I might port it to OCaml, which sounds more compact for this kind of program. Did you find any significant performance loss when using OCaml instead of C++? Runtime performance is quite important in my situation.

Gabriele

Hi Mike,

Thanks for the suggestion. Do you know of any articles that explain how to use LLVM to build a VM that works like mine?
I've read a few things (mostly source code), but a good article or doc would be perfect.

Gabriele

There is a series of articles in the OCaml Journal describing the
construction of a VM using LLVM:

http://ocamlnews.blogspot.com/2009/03/building-virtual-machine-with-llvm-part.html

The VM uses expression trees rather than bytecode. However, there is another
article in the same journal describing the construction of a bytecode
compiler using LLVM:

http://ocamlnews.blogspot.com/2008/09/writing-bytecode-compiler-using-llvm.html

I have not benchmarked it because I have no C++ alternative but OCaml is
generally several times faster than C++ at symbolic processing (like
compilers).

The main disadvantage of using OCaml is that the bindings are incomplete.
However, they are almost complete and you can easily augment them with
anything that you need. See the "llvm.cpp" and "llvm_stubs.c" files in HLVM,
for example. I also had to work around some bugs in the LLVM bindings when
building HLVM but I'll happily talk you through it.

I'm sure you'll have something suitably spectacular in a short time. :-)

Hello, Gabriele

Have you read the 'Kaleidoscope' tutorial?

Hi,

Isn't it intended to explain how to build a compiler for a custom language that targets LLVM IR? Is it also useful for understanding how to build a VM that executes custom bytecode (after first converting it to LLVM IR)?

Thanks,
Gabriele

Gabriele,

The way I see it, it's pretty much the same thing… Converting a custom bytecode to LLVM IR is the same as converting a custom language to LLVM IR.

The syntax of the ‘custom language’ just happens to be binary bytecode.
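
To make that concrete, here is a minimal sketch in Python. It assumes a hypothetical stack bytecode with five opcodes (PUSH, LOAD_ARG, ADD, MUL, RET), which is an illustration only and not the bytecode of the VM discussed in this thread. The translator keeps a compile-time stack of value names and emits textual LLVM IR, one instruction per arithmetic opcode. It handles straight-line code only; branches would need basic blocks and phi nodes (or stack slots), which are left out here.

```python
# Hypothetical toy opcodes, invented for this sketch.
PUSH, LOAD_ARG, ADD, MUL, RET = range(5)


def bytecode_to_ir(name, num_args, code):
    """Translate straight-line stack bytecode into the text of an LLVM IR function."""
    params = ", ".join(f"i64 %arg{i}" for i in range(num_args))
    lines = [f"define i64 @{name}({params}) {{", "entry:"]
    stack = []    # compile-time stack holding IR value names or integer constants
    counter = 0   # used to name fresh SSA temporaries
    for op, operand in code:
        if op == PUSH:
            stack.append(str(operand))
        elif op == LOAD_ARG:
            stack.append(f"%arg{operand}")
        elif op in (ADD, MUL):
            rhs = stack.pop()
            lhs = stack.pop()
            tmp = f"%t{counter}"
            counter += 1
            mnemonic = "add" if op == ADD else "mul"
            lines.append(f"  {tmp} = {mnemonic} i64 {lhs}, {rhs}")
            stack.append(tmp)
        elif op == RET:
            lines.append(f"  ret i64 {stack.pop()}")
        else:
            raise ValueError(f"unknown opcode {op}")
    lines.append("}")
    return "\n".join(lines)


# Bytecode for f(x, y) = (x + 2) * y
program = [
    (LOAD_ARG, 0), (PUSH, 2), (ADD, None),
    (LOAD_ARG, 1), (MUL, None), (RET, None),
]
print(bytecode_to_ir("f", 2, program))
```

The operand stack exists only at translation time: each bytecode push or pop becomes bookkeeping in the translator, and the emitted IR refers to values directly. A real port would build the IR through the LLVM API instead of strings, but the shape of the loop is the same as in a front end that walks a syntax tree.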

Well, you are right :-)

In fact, I've started porting the VM in my spare time and it is working fine. I'm still having some trouble understanding the garbage collector, but I'll dig deeper into it as soon as the other features are complete.

Gabriele

Good luck.

Be sure to document anything you think might be useful on the wiki!

http://wiki.llvm.org