Mull JIT Design

Hi Alex,

I’m replying to this on a new thread so as not to take the “LLVMContext: Threads and Ownership” discussion too far off topic.

If you did want to fit your use case into ORC’s model, I think the solution would be to clone the module before adding it to the compile layer and (if desired) save a copy of the compiled object and pass a non-owning memory buffer to the linking layer.

That said, if you are not using lazy compilation, remote compilation, or concurrent compilation, then using ORC would not buy you much.

We also parallelize lots of things, but this could also work using ORC, given that we only use the ObjectLinkingLayer.

In case it is of interest for your tool, here’s a short overview of ORC’s new concurrency support. You can now set up a compiler dispatch function for the JIT that will be called whenever something needs to be compiled, allowing the compilation task to be run on a new thread. Compilation is still triggered on symbol lookup as before, and the compile task is still opaque to the JIT, so clients can supply their own.

To deal with the challenges that arise from this (e.g. what if two threads look up the same symbol at the same time? Or two threads look up mutually recursive symbols at the same time?), the new symbol table design guarantees the following invariants for basic lookups:

(1) Each symbol’s compiler will be executed at most once, regardless of the number of concurrent lookups made on it.
(2) Either the lookup will return an error, or, if it succeeds, all pointers returned will be valid (for reading/writing/calling, depending on the nature of the symbol), regardless of how the compilation work was dispatched.

This means that you can have lookup calls coming in on multiple threads for interdependent symbols, with compilers dispatched to multiple threads to maximize performance, and everything will Just Work.

If that sounds useful, there will be more documentation coming out in the next few weeks, and I will be giving a talk on the new design at the developers’ meeting.

Cheers,
Lang.

If you did want to fit your use case into ORC's model, I think the solution would be to clone the module before adding it to the compile layer and (if desired) save a copy of the compiled object and pass a non-owning memory buffer to the linking layer.

Yes, this is understood. The other reason we decided to go with a custom solution is a desire to support different LLVM versions without much maintenance burden.

That said, if you are not using lazy compilation, remote compilation, or concurrent compilation then using ORC would not buy you much.

We do use concurrent compilation: we create one SimpleCompiler and one TargetMachine per thread, distribute compilation of N modules across all available threads, then gather all the object files and feed them into a JIT engine for execution.
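A sketch of that distribution scheme, under loudly stated assumptions: Module, ObjectFile, and CompilerContext below are placeholder types standing in for llvm::Module, the compiled object buffers, and the per-thread SimpleCompiler + TargetMachine pair — they are not LLVM's API.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Placeholder types; in the real tool these would be llvm::Module and
// the object buffers produced by SimpleCompiler.
struct Module { std::string Name; };
struct ObjectFile { std::string Name; };

struct CompilerContext {                    // one per thread, never shared
  ObjectFile compile(const Module &M) { return {M.Name + ".o"}; }
};

// Distribute compilation of N modules across NumThreads workers, each
// with its own compiler context, then gather the resulting objects.
std::vector<ObjectFile> compileAll(const std::vector<Module> &Mods,
                                   unsigned NumThreads) {
  std::vector<ObjectFile> Objs(Mods.size());
  std::vector<std::thread> Workers;
  for (unsigned T = 0; T < NumThreads; ++T)
    Workers.emplace_back([&Mods, &Objs, NumThreads, T] {
      CompilerContext Ctx;                  // thread-local compiler state
      for (size_t I = T; I < Mods.size(); I += NumThreads)
        Objs[I] = Ctx.compile(Mods[I]);     // each index written by one thread
    });
  for (auto &W : Workers)
    W.join();
  return Objs;                              // gathered, then fed to the JIT
}
```

The per-thread context is the important part: neither SimpleCompiler nor TargetMachine is shared across threads, so no locking is needed around the compile step itself.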

I also wanted to use lazy compilation to inject mutants, but we decided to go another way there as well. The current lazy JIT implementation is hard (at least for me) to reuse or adapt to other needs.

This is how it worked before: let's say we have 3 modules A, B, C. We create several mutants out of C: C1, C2, C3, and so on, where each mutant has a mutation applied. Then, for each execution of mutants, the JIT engine is fed {A, B, C1}, {A, B, C2}, {A, B, C3}, and so on. It works very well, but adds too much overhead, because the JIT needs to resolve the same symbols several times.

This is what we decided on in the end: instead of cloning the original module, we create a copy of the function and apply the mutation to the copy. The original function is then replaced with an indirect call, and the indirection is controlled from outside the JIT engine via pointer manipulation. Here is an example:

Before:

define i32 @foo(i32 %a, i32 %b) {
  ; original instructions
}

After:

@foo_ptr = global i32 (i32, i32)* null

define i32 @foo(i32 %a, i32 %b) {
  %fn = load i32 (i32, i32)*, i32 (i32, i32)** @foo_ptr
  %res = tail call i32 %fn(i32 %a, i32 %b)
  ret i32 %res
}

define i32 @foo_original(i32 %a, i32 %b) {
  ; original instructions
}

define i32 @foo_mutant_1(i32 %a, i32 %b) {
  ; mutated instructions
}

define i32 @foo_mutant_2(i32 %a, i32 %b) {
  ; mutated instructions
}

Once the object files are loaded and symbols resolved, we patch foo_ptr to point to the original function (foo_original), then iterate over all mutants, changing foo_ptr accordingly.
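The patching step can be sketched in host-side C++. In the real tool foo_ptr lives in JIT'd memory and its address comes from symbol resolution (RuntimeDyld); here a plain host variable plays that role, and patchSlot/runAll are illustrative names, not Mull's or LLVM's API:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using FooFn = int (*)(int, int);

// Point the indirection slot (the word foo() loads through) at a
// chosen implementation. SlotAddr would come from symbol resolution.
inline void patchSlot(uint64_t SlotAddr, FooFn Impl) {
  *reinterpret_cast<FooFn *>(SlotAddr) = Impl;
}

// Run every candidate implementation through the same slot, the way
// the driver iterates over foo_original and the mutants.
inline std::vector<int> runAll(uint64_t SlotAddr,
                               const std::vector<FooFn> &Impls,
                               int A, int B) {
  std::vector<int> Results;
  for (FooFn Impl : Impls) {
    patchSlot(SlotAddr, Impl);
    FooFn Foo = *reinterpret_cast<FooFn *>(SlotAddr); // what foo() loads
    Results.push_back(Foo(A, B));
  }
  return Results;
}
```

The point of the design is visible here: switching between the original and a mutant is a single pointer write, with no recompilation and no re-resolution of symbols.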

This approach also works quite well and saves lots of time: ~15 minutes instead of ~2 hours for mutation analysis of LLVM's own test suite.

I still think we can achieve the same with ORC, but its constant evolution makes that challenging: adapting our solution to each new API is time-consuming.

If that sounds useful, there will be more documentation coming out in the next few weeks, and I will be giving a talk on the new design at the developers' meeting.

I think it does sound useful, but the documentation will be essential here. I tried to construct a simple JIT stack using the new APIs, but could not because of their complexity.
I can see that there is a great idea behind all the abstractions, but I could not grasp it in an hour.

Also, I think it's worth mentioning that our simple JIT stack does not work with LLVM modules, which completely eliminates the ownership issues:
the user of the JIT takes care of compilation, and also of the lifetime of both the modules and the object files.
Though this probably won't work for the use cases you want to cover.

At the moment I would rather focus on the underlying implementation (RuntimeDyld and friends), because there are a few bugs and missing parts I'd like to address.