Using LLVM to generate x86 dynamically in memory

Hi. I’m very new to LLVM, and have read some of the documentation online. Before I invest too much time, however, in learning about LLVM, I’d like to know if it can be used for my purpose. We currently have a critical runtime component that evaluates expressions via interpretation. The bytecode that we generate goes through various optimization phases similar to those of optimizing language compilers (although definitely not as complete) before being evaluated at runtime. For quite some time now we have been thinking about getting away from interpretation altogether and generating native code directly.

What I’d like to know is:

  1. Is there an LLVM backend library that can take either a) a C source program in memory or b) an LLVM program in memory, as input and generate x86 instructions in memory?

  2. How “light-weight” would a mechanism like this be? I suppose that depends on what optimizations we decide to apply during this code-gen phase.

What we plan to do is invoke this in-memory compiler to generate the x86 instructions, and then ship them at runtime to our new expression evaluation engine that will simply set a function pointer to it and execute.

I heard of a compiler called TinyCC that basically has a library (libtcc) that can be invoked at runtime to generate x86 instructions directly from a C program stored in memory. That’s what I would like, but with the aggressive opts provided in the LLVM infrastructure. Thanks so much for your time…

  • Shasank

1) Is there an LLVM backend library that can take either a) a C source program in memory or b) an LLVM program in memory, as input and generate x86 instructions in memory?

LLVM does support a Just-In-Time (JIT) compiler interface, which is what it sounds like you're looking for. I would suggest looking more deeply into the llvm documentation at llvm.org, paying particular attention to JIT references and the lli utility.

2) How "light-weight" would a mechanism like this be? I suppose that depends on what optimizations we decide to apply during this code-gen phase.

It does indeed vary quite a bit. I suspect your best bet will be to create some test programs, or just use some of the examples from the llvm source tree, and look at the footprint and compare to what your needs are.

Hi Jim. Thanks for your speedy response. I’m not entirely sure if a JIT is what I’m looking for. I’m basically looking for a dll with an interface that takes a C program as input and compiles and optimizes it to native x86 instructions in an in-memory buffer. I don’t want the dll to execute it, and I don’t particularly want to translate our expressions into LLVM bitcode (although I can if the rest of the pieces are there). Also, I briefly read up on lli. It looks like a separate process would have to be spawned to invoke the JIT to execute programs in LLVM bytecode. This would definitely incur an overhead penalty that we wouldn’t want to pay. Thanks in advance for your response.

  • Shasank

BTW - I was doing some Google searching just now on the topic again, and I was surprised to see how fast Google had already found the log of this previous conversation recorded at uiuc. ;)

This can be done. Last year I took clang (v. 2.5) and hacked the compiler driver into a library which would JIT C source to memory.
I then created a Zend extension which utilized the library. No external frameworks were used in these hacks, and they were fairly simple
to implement once one knew how to use the Clang/LLVM libraries.

I cannot speak to your dll requirement, as the work was done for Linux. In addition I never measured the overhead in terms of a final
accumulated library size (size of all necessary libraries). No other measurements were taken either.

Garrison

Thanks Garrison. I just read up a little on Clang - the website seems to indicate that the source is simple and easy to modify for this purpose. Have any of you used TinyCC (http://bellard.org/tcc/) before? It seems to do exactly what I would like, but 1) I haven’t heard of its use in industry as much as I have of LLVM and 2) I don’t know how well supported it is. Thanks again.

  • Shasank

Compiling code to native code in an in-memory buffer is really all a JIT is.

If you're attached to writing the definitions of each opcode in C,
here's an old idea in JIT compilation. For every opcode in your
bytecode, write a corresponding C function that takes relevant
parameters and implements the opcode. Mark each action as
__attribute__((always_inline)). Compile this C file with clang to a
.bc. Load that module from disk at runtime into the JIT. For each
bytecode string you want to execute, translate it from bytecode to
LLVM IR (with IRBuilder) that simply calls the opcode action
functions. Then run an inlining optimization pass to inline all the
actions, and ask the JIT for a pointer to the function. Then you can
call it like a C function pointer.
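
As a rough illustration of the opcode-action idea, here is a minimal C sketch. The stack layout and every name in it (EvalStack, op_push, op_pop, op_add, op_mul, eval_add_then_mul) are made up for this example; they do not come from LLVM or from the original poster's bytecode. The last function is written by hand only to show what the generated function would be equivalent to for the bytecode sequence PUSH a, PUSH b, ADD, PUSH c, MUL. In a real setup that function would be emitted as LLVM IR at runtime rather than written in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical evaluation stack used by the opcode actions. */
#define STACK_MAX 16

typedef struct {
    int64_t slots[STACK_MAX];
    int top; /* index of the next free slot */
} EvalStack;

/* One small C function per opcode, each marked always_inline so that
 * an inlining pass can fold the calls into the generated function. */
static inline __attribute__((always_inline))
void op_push(EvalStack *s, int64_t v) {
    assert(s->top < STACK_MAX);
    s->slots[s->top++] = v;
}

static inline __attribute__((always_inline))
int64_t op_pop(EvalStack *s) {
    assert(s->top > 0);
    return s->slots[--s->top];
}

static inline __attribute__((always_inline))
void op_add(EvalStack *s) {
    int64_t b = op_pop(s);
    int64_t a = op_pop(s);
    op_push(s, a + b);
}

static inline __attribute__((always_inline))
void op_mul(EvalStack *s) {
    int64_t b = op_pop(s);
    int64_t a = op_pop(s);
    op_push(s, a * b);
}

/* Hand-written stand-in for the function a translator would generate
 * for: PUSH a; PUSH b; ADD; PUSH c; MUL. It only calls opcode actions,
 * one call per bytecode instruction. */
int64_t eval_add_then_mul(int64_t a, int64_t b, int64_t c) {
    EvalStack s = { .top = 0 };
    op_push(&s, a);
    op_push(&s, b);
    op_add(&s);
    op_push(&s, c);
    op_mul(&s);
    return op_pop(&s);
}
```

Calling eval_add_then_mul(2, 3, 4) evaluates (2 + 3) * 4. Because every action is forced inline, the optimizer sees one straight-line function and can, in principle, remove most of the stack traffic, which is the point of running the inlining pass before asking the JIT for the function pointer.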

lli is just a driver. Its source code is an example of how you would
use LLVM to embed a JIT into your program.

Reid

Thanks Reid. What you wrote makes a lot of sense. The more I learn about the LLVM, the more I’ll be able to determine how easy it would be to translate our IR into it. If it proves too difficult, I’ll do what you suggested. Thanks again everyone for your responses…

  • Shasank

Hello,

Reid and others already addressed most everything, so I won't rehash it, except to add that translating to LLVM IR is a requirement for using any of the llvm optimization passes. You don't have to output it to a bitcode file, of course, but putting it into the IR form is necessary.

-jim

In particular, I recommend looking at the Kaleidoscope mini language:
http://llvm.org/docs/tutorial/ as it does pretty much exactly what you
are looking for, and the tutorial goes through all the steps needed to
produce the working program.