a life-cycle question for MCJIT


We use MCJIT to generate machine code in our LLVM based JIT compiler.
The code generation process has roughly 5 steps:

0. Generate and optimize LLVM IR.
1. Call generateCodeForModule on the output of (0) to translate LLVM
    IR to machine code.
2. Figure out the final locations for the code and data generated by
    MCJIT using an allocator specific to our runtime. Make
    mapSectionAddress calls to convey this information to MCJIT.
3. Call finalizeObject() to apply relocations.
4. Copy over the relocated code to buffers allocated by our custom
    allocator.

The problem:

After running step (1) we may, in rare cases, decide that the
generated code is not usable by our runtime [*], and we have to
"abort" the compile. However, step (1) populates
Dyld->ExternalSymbolRelocations (and possibly other similar data
structures) with the set of pending relocations, and the
ExecutionEngine interface provides no way of cleaning this up without
running (3). Since we use a single long-lived instance of MCJIT per
compiler thread, when we abort the compile after running (1) and
before running (3), these stale relocations get applied during future
compiles and cause problems.

To get around this issue, is it reasonable to add a hook to the
ExecutionEngine interface that resets the state of an MCJIT (and
containing RuntimeDyld) instance from after (1) to before it?
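
To make the failure mode and the proposed hook concrete, here is a toy
model in plain C++. It does not use LLVM: ToyLinker, loadObject,
discardPending and compileAfterAbort are hypothetical names invented
for illustration, and the real RuntimeDyld bookkeeping is more involved
than a single vector. It only sketches how relocations recorded in
step (1) can leak into a later compile unless something clears them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a long-lived linker instance. This is NOT RuntimeDyld;
// every name here is hypothetical.
class ToyLinker {
public:
  // Models step (1): loading an object records relocations to apply later.
  void loadObject(const std::string &Object,
                  const std::vector<std::string> &ExternalSymbols) {
    for (const auto &Symbol : ExternalSymbols)
      Pending.push_back({Object, Symbol});
  }

  // Models step (3): "apply" every pending relocation, then forget them.
  // Returns the name of the object each applied relocation belonged to.
  std::vector<std::string> finalize() {
    std::vector<std::string> Patched;
    for (const auto &R : Pending)
      Patched.push_back(R.Object);
    Pending.clear();
    return Patched;
  }

  // Hypothetical reset hook: drop everything recorded since the last
  // finalize(), as if the aborted object had never been loaded.
  void discardPending() { Pending.clear(); }

  std::size_t pendingCount() const { return Pending.size(); }

private:
  struct Reloc {
    std::string Object;
    std::string Symbol;
  };
  std::vector<Reloc> Pending;
};

// Load an object, abort it, then compile a second object normally.
// Returns the objects whose relocations were applied by the second compile.
std::vector<std::string> compileAfterAbort(bool ResetOnAbort) {
  ToyLinker Linker;
  Linker.loadObject("aborted", {"malloc"}); // step (1), then we bail out
  if (ResetOnAbort)
    Linker.discardPending();
  Linker.loadObject("good", {"memcpy"});    // next compile, step (1)
  return Linker.finalize();                 // next compile, step (3)
}
```

Without the reset, finalizing the second compile also touches the
aborted object's relocation; with it, only the second object's
relocation is applied.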

A second potential solution is to "pretend" to run through steps (2)
and (3) to have MCJIT and RuntimeDyld clear their internal states; but
I'd prefer not going this route if it can be avoided.

[*]: why this happens is not important to this discussion, but it is
sufficient to note that a) we cannot reliably predict this before
running step (1) and b) there are no simple tweaks that will prevent
this from happening.

-- Sanjoy

This use case is probably also relevant to the PNaCl people - if the output of the JIT violates the SFI requirements of their sandboxing platform (possibly due to a compiler bug) then they need to be able to abort. I think that they’re currently not lazily JITing, but given how important startup times are to them this is probably not ideal.


I can think of a few ways to tackle this.

(1) As you suggested, add a “remove loaded objects” method to MCJIT. I think this would be straightforward, but haven’t thought about it carefully yet.

(2) Replicate some of MCJIT’s functionality so that you can validate the generated code before it ever reaches step (1). E.g., you could call libCodeGen to produce an object file yourself, then scan it using libObject to perform your validation, then add the validated object to MCJIT using addObjectFile. The practicality of this option will depend on how much of MCJIT’s logic you need to reimplement to do your validation. If you need to validate memory layout it’ll be prohibitive, but if you just want to scan for prohibited instructions or calls it may be doable.

(3) Use multiple MCJIT instances chained together by a shared symbol resolver. This way you could just delete your newly added MCJIT instance if the IR doesn’t validate.

(4) Use Orc. A basic Orc stack looks a lot like MCJIT from outside, but under the hood it manages multiple RuntimeDyld instances, and allows you to remove them whenever you like. This is basically option (3) made easy.
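
The idea behind options (3) and (4) can be sketched with a toy model in plain C++. This is not the Orc or MCJIT API: ToyJITStack, addModule, removeModule and lookup are hypothetical names, and each "module" here is reduced to a single symbol. The point is only that when every module lives in its own removable unit behind one shared lookup, aborting a compile amounts to dropping that unit.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Toy stand-in for a stack of independent JIT instances that share one
// symbol resolver. All names are hypothetical; this is not LLVM code.
class ToyJITStack {
public:
  using Handle = std::size_t;

  // Each added module gets its own private state, identified by a handle.
  Handle addModule(const std::string &Symbol, std::uint64_t Address) {
    Handle H = NextHandle++;
    Modules[H][Symbol] = Address;
    return H;
  }

  // Aborting a compile just drops that module's state; nothing else
  // in the stack is affected.
  void removeModule(Handle H) { Modules.erase(H); }

  // Shared resolver: search every live module. Returns 0 if not found.
  std::uint64_t lookup(const std::string &Symbol) const {
    for (const auto &Entry : Modules) {
      auto It = Entry.second.find(Symbol);
      if (It != Entry.second.end())
        return It->second;
    }
    return 0;
  }

private:
  Handle NextHandle = 0;
  std::map<Handle, std::map<std::string, std::uint64_t>> Modules;
};

// Add a good module, add and then abort a bad one, and check that only
// the good module's symbol is still visible.
bool abortLeavesNoTrace() {
  ToyJITStack Stack;
  Stack.addModule("foo", 0x1000);
  ToyJITStack::Handle Bad = Stack.addModule("bar", 0x2000);
  Stack.removeModule(Bad); // validation failed: discard this module
  return Stack.lookup("foo") == 0x1000 && Stack.lookup("bar") == 0;
}
```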

  • Lang.