Compile-time evaluation of functions

One of the things that my compiler frontend has is an interpreter, which allows fairly elaborate constant expressions to be folded into simple constants. This is particularly useful in the case of annotations that take arguments - rather than executing a static constructor before main(), I allow the constructor to run at compile time (if it can), generating a constant struct with all of it’s members pre-filled in.

Currently this is implemented as a tree-walk on my high-level CFG. However, it would simplify things if I could instead interpret the generated LLVM IR. For one thing, this would allow the compiler to import class definitions directly from bitcode files. (I already serialize the high-level type information as metadata nodes, but I’d like to avoid serializing the high-level function bodies as well.)

Rather than writing my own interpreter for LLVM IR, I’d like to investigate the possibility of using the interpreter facilities of ExecutionEngine. I definitely don’t want to use the JIT, as these methods are probably only going to be executed once.

There are a number of issues to be resolved before this can work. The biggest is that the compile-time environment and data types are totally different from their runtime counterparts. In the compiler, a struct instance is represented as an array of constant expression nodes, rather than as a flat region of memory. I’d need to redefine the GEP, load, and store operations of the interpreter at a very minimum.

I’m also assuming that the interpreter would require all input values to be converted into IR Value* types - I wouldn’t expect the interpreter to be able to operate on my frontend’s Expr* nodes. But that means that I would need a method for converting Value* constants back into Expr* nodes after the execution was complete. Since I have complete static type information for the output value, this isn’t as onerous as it sounds.

I’d also need some way to prevent the interpreter from going into an infinite loop, or infinite recursion - probably a hard-coded limit on the number of instructions executed and call-frame depth.

Also, looking in the SVN repo, it appears that some of the files in include/llvm/ExecutionEngine have not been touched in a long time, and that scares me a bit - what’s the current state of the LLVM interpreter?

Anyway, my main question to the list is - does this sound like a viable plan, or am I heading down the path of madness and frustration?

The current interpreter works on llvm IR, but in general invokes the JIT instead of just interpreting. I don't know that anyone has tested the interpreter path in some time, but it should largely just work. The JIT itself is undergoing a bit of rewriting for MC, but most of the idea is to leave the existing API intact if at all possible.

It sounds like you're not completely lowering to llvm IR though? That you want to change that is a bit of a source of worry, what do you have in mind? I read your email, but a more concrete example would help.



Sure thing. Here’s an imaginary example - say we want to implement something like LLVM’s CommandLine library, only using reflection. So we’ll define a class containing some variables, and annotate those variables which correspond to command-line parameters:

class CommandArgs {
@ProgramArgument(“input”, “i”, “the input file”)
var inputFile:String;

@ProgramArgument(“output”, “o”, “the output file”)
var outputFile:String;

When the program starts up, we’ll create an instance of CommandArgs and pass it to the argument parser. The parser looks up the reflection information for the object, and searches for members that have a ProgramArgument annotation. It then populates those member fields by mapping the input command arguments onto the annotated members.

Now, ProgramArgument is just a regular class that has a constructor that takes several arguments - in this case, the long name, short name, and help text. In fact it looks like this:

class ProgramArgument {
private {
let longName:String;
let shortName:String;
let helpText:String;

def construct(longName:String, shortName:String="", helpText:String="") {
self.longName = longName;
self.shortName = shortName;
self.helpText = helpText;

OK so when we compile the class CommandArgs, we want to save out the annotations on the class, along with the class metadata. The class metadata is made up of LLVM Constants, so the annotations need to be Constants as well.

So basically what I want to do here is to evaluate the ProgramArgument constructor in the compiler, and then convert the output into LLVM constants. The way I currently do this is that I have an interpreter inside my compiler that operates on my high-level CFG graph - given an expression tree, it can produce the result of evaluating that expression. In the case of the ProgramArgument annotation, what the evaluator gives back is a data structure that is essentially an array of 4 constants, the first being a pointer to the ProgramArgument type and the remaining 3 being the contents of the data members of class ProgramArgument. This data structure can easily be transformed into LLVM constants, and those constants inserted into the module before it is written out as a bitcode file.

However, there’s a drawback to this method which is that the module now contains two versions of every function - one version in the form of LLVM IR instructions, and another version in the form of a serialized high-level CFG. It would be better if I could just store the IR and use that.

The problem with using the IR, however, is that the compiled IR code for ProgramArgument.construct() expects to see the ‘self’ variable pointing to a flat buffer of memory, and not an array of llvm Value* pointers. Storing the value of the Nth member field is not done by writing to the Nth element of an array, instead it is done by computing an address that is an offset from the start of the object.

Moreover, I can’t serialize this buffer and add it as a constant static global, instead it has to somehow be converted back into LLVM Constants before I can do that. Bare integers need to converted back to llvm::ConstantInt, bare floats to llvm::ConstantFP, and so on. Only then can it be serialized as bitcode.

So it sounds like if I want to be able to run LLVM IR in the compiler, I’m going to have to write my own interpreter.


It sounds like what you want is symbolic execution of the LLVM IR - such as that provided by Calysto static checker ( I’m not sure if that particular implementation would work for you, but finding some prebuilt system will probably be significantly easier than rolling your own - there are a lot of edge cases.