Hi Revital,
What do you mean by “code cache”? Orc (and MCJIT) does have the concept of an ObjectCache, which is a long-lived, potentially persistent, compiled version of some IR. It’s not a key component of the JIT though: Most clients run without a cache attached and just JIT their code from scratch in each session.
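If it's the ObjectCache you're after, a minimal in-memory version looks roughly like the sketch below. The class name and the map-based storage are mine; only the two overridden methods come from the llvm::ObjectCache interface, and a persistent cache would write the buffers to disk rather than keeping them in a map.

#include "llvm/ExecutionEngine/ObjectCache.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/MemoryBuffer.h"
#include <map>
#include <memory>
#include <string>

// Toy in-memory cache keyed on module identifier.
class SimpleObjectCache : public llvm::ObjectCache {
public:
  void notifyObjectCompiled(const llvm::Module *M,
                            llvm::MemoryBufferRef Obj) override {
    Cache[M->getModuleIdentifier()] = llvm::MemoryBuffer::getMemBufferCopy(
        Obj.getBuffer(), Obj.getBufferIdentifier());
  }

  std::unique_ptr<llvm::MemoryBuffer> getObject(const llvm::Module *M) override {
    auto I = Cache.find(M->getModuleIdentifier());
    if (I == Cache.end())
      return nullptr; // Not cached: the JIT compiles from scratch as usual.
    return llvm::MemoryBuffer::getMemBufferCopy(I->second->getBuffer());
  }

private:
  std::map<std::string, std::unique_ptr<llvm::MemoryBuffer>> Cache;
};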
Recompilation is orthogonal to caching. There is no in-tree support for recompilation yet. There are several ways it could be supported, depending on what security/performance trade-offs you're willing to make and how deep into the LLVM code you want to get. As things stand at the moment, all function calls in the lazy JIT are indirected via function pointers. We want to add support for patchable call-sites, but this hasn't been implemented yet. The indirect calls make recompilation reasonably easy: you could add a transform layer on top of the CompileCallbackLayer that would modify each function like this (before and after the transform):
Before:

void foo$impl() {
  // foo body
}

After:

void foo$impl() {
  if (trigger_condition) {
    auto fooOpt = jit_recompile_hot(&foo);
    fooOpt();
  }
  // foo body
}
You would implement the jit_recompile_hot function yourself in your JIT and make it available to JIT'd code via the SymbolResolver. When the trigger condition is met you'll get a call to recompile foo, at which point you: (1) add the IR for foo to a second IRCompileLayer that has been configured with a higher optimization level, (2) look up the address of the optimized version of foo, and (3) update the function pointer for foo to point at the optimized version. The process for patchable call-sites should be fairly similar once they're available, except that you'll trigger a call-site update rather than rewriting a function pointer.
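To make the control flow concrete, here's a rough, LLVM-free sketch of that scheme. Everything in it (the foo_ptr slot, the jit_recompile_hot signature, the stand-in "optimized" body) is illustrative only; the real version would go through the layers described above.

#include <atomic>
#include <cstdio>

using FnPtr = void (*)();

// Stand-ins for the code the JIT would produce at the two optimization levels.
static void foo_unoptimized() { std::puts("foo (unoptimized body)"); }
static void foo_optimized()   { std::puts("foo (optimized body)"); }

// The indirection slot: every call to foo goes through this pointer, so
// updating it (step 3) retargets all future calls.
static std::atomic<FnPtr> foo_ptr{foo_unoptimized};

// Exposed to JIT'd code via the SymbolResolver. In a real JIT this would add
// foo's IR to a second IRCompileLayer built with a higher optimization level
// and look up the resulting address (steps 1 and 2); here we just pretend
// that produced foo_optimized.
extern "C" FnPtr jit_recompile_hot(void * /*FnAddr*/) {
  FnPtr Optimized = foo_optimized;
  foo_ptr.store(Optimized);
  return Optimized;
}

int main() {
  foo_ptr.load()();                        // first call: unoptimized body
  FnPtr Hot = jit_recompile_hot(nullptr);  // trigger_condition fired
  Hot();                                   // the recompile call returns the new body
  foo_ptr.load()();                        // subsequent calls: optimized body
}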
This neglects all sorts of fun details (threading, garbage collection of old function implementations), but hopefully it gives you a place to start.
Regarding laziness, as Hal mentioned you’ll have to provide some target support for PowerPC to support lazy compilation. For a rough guide you can check out the X86_64 support code in llvm/include/llvm/ExecutionEngine/Orc/OrcTargetSupport.h and llvm/lib/ExecutionEngine/Orc/OrcTargetSupport.cpp.
There are two methods that you'll need to implement: insertCompileCallbackTrampoline and insertResolverBlock. These work together to enable lazy compilation. Both of these methods inject blobs of target-specific code into the JIT process. To do this (at least for now) I make use of a handy feature of LLVM IR: you can write raw assembly code directly into a bitcode module (“module-level asm”). If you look at the X86 implementation of each of these methods you'll see they're written in terms of string-streams building up a string of assembly, which is then handed off to the JIT to compile like any other code (there's a sketch of this below, after the example trampoline output).
The first blob that you need to be able to output is the resolver block. The purpose of the resolver block is to save program state and call back in to the JIT to trigger lazy compilation of a function. When the JIT is done compiling the function it returns the address of the compiled function to the resolver block, and the resolver block returns to the compiled function (rather than its original return address).
Because all functions share the same resolver block, the JIT needs some way to distinguish them, which is where the trampolines come in. The JIT emits one trampoline per function, and each trampoline just calls the resolver block. The return address of the call in each trampoline provides the unique address that the JIT associates with the to-be-compiled function. The CompileCallbackManager manages this association between trampolines and functions for you; you just need to provide the resolver/trampoline primitives.
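Conceptually (and glossing over the real CompileCallbackManager data structures), the association is just a map from trampoline return address to "compile this function and give me its address". The names below are placeholders:

#include <cstdint>
#include <functional>
#include <map>

using TargetAddr = uint64_t;                        // address in the JIT'd code
using CompileAction = std::function<TargetAddr()>;  // compile a function, return its address

// What the callback manager tracks: which to-be-compiled function each
// trampoline stands for, keyed by the trampoline call's return address.
static std::map<TargetAddr, CompileAction> TrampolineMap;

// Invoked (via the resolver block) with the return address of the trampoline
// that fired. Runs the compile action and hands back the address that the
// resolver block will branch to instead of returning into the trampoline.
TargetAddr handleTrampolineHit(TargetAddr TrampolineRetAddr) {
  return TrampolineMap.at(TrampolineRetAddr)();
}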
In case it helps, here's what the output of all this looks like on X86. Trampolines are trivial - they're emitted in blocks and preceded by a pointer to the resolver block:
module asm "Lorc_resolve_block_addr:"
module asm " .quad 140439143575560"
module asm "orc_jcc_0:"
module asm " callq *Lorc_resolve_block_addr(%rip)"
module asm "orc_jcc_1:"
module asm " callq *Lorc_resolve_block_addr(%rip)"
module asm "orc_jcc_2:"
module asm " callq *Lorc_resolve_block_addr(%rip)"
…
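For reference, the string-stream side of trampoline emission boils down to something like the following sketch. The function name, label scheme and parameters here are placeholders; the real code lives in OrcTargetSupport.cpp.

#include <cstdint>
#include <sstream>
#include <string>

// Build the module-level asm for one block of trampolines in the style shown
// above. ResolverBlockAddr and NumTrampolines would be supplied by the JIT.
std::string makeTrampolineBlockAsm(uint64_t ResolverBlockAddr,
                                   unsigned NumTrampolines,
                                   unsigned FirstIndex) {
  std::ostringstream AsmStream;
  // Pointer to the resolver block, emitted just before the trampolines.
  AsmStream << "Lorc_resolve_block_addr:\n"
            << "  .quad " << ResolverBlockAddr << "\n";
  // Each trampoline is a single indirect call through that pointer; the
  // call's return address is what uniquely identifies the trampoline.
  for (unsigned i = 0; i != NumTrampolines; ++i)
    AsmStream << "orc_jcc_" << (FirstIndex + i) << ":\n"
              << "  callq *Lorc_resolve_block_addr(%rip)\n";
  // The resulting string is attached to a module as module-level asm and
  // handed to the JIT to compile like any other code.
  return AsmStream.str();
}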
The resolver block is more complicated and I won’t provide the full code for it here. You can find it by running:
lli -jit-kind=orc-lazy -orc-lazy-debug=mods-to-stderr <hello_world.ll>
and looking at the initial output. In pseudo-asm though, it looks like this:
module asm "jit_callback_manager_addr:"
module asm " .quad 0x46fc190" // ← address of callback manager object
module asm "orc_resolver_block:"
module asm " // save register state"
module asm " // load jit_callback_manager_addr into %rdi"
module asm " // load the return address (from the trampoline call) into %rsi"
module asm " // %rax = call jit(%rdi, %rsi)"
module asm " // save %rax over the return address"
module asm " // restore register state"
module asm " // retq"
So, that’s a whirlwind intro to implementing lazy JITing support for a new architecture in Orc. I’ll try to answer any questions you have on the topic, though I’m not familiar with PowerPC at all. If you’re comfortable with PowerPC assembly I think it should be possible to implement once you grok the concepts.
Hope this helps!
Cheers,
Lang.