Memory usage with MCJIT

Hi all,

I’m building a runtime that can JIT and execute code. I’ve followed the Kaleidoscope tutorial and have a couple of questions. Basically, I have a pre-compiler that compiles the code to cached objects. These cached objects are then persisted and used to reduce JIT compile time.
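For reference, the standard hook for this kind of object caching under MCJIT is the llvm::ObjectCache interface. A minimal sketch, with the class name and storage helpers (`persist`, `load`) as placeholders:

```cpp
#include "llvm/ExecutionEngine/ObjectCache.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/MemoryBuffer.h"

// Hypothetical cache; the storage backend is a placeholder.
class PrecompiledObjectCache : public llvm::ObjectCache {
public:
  void notifyObjectCompiled(const llvm::Module *M,
                            llvm::MemoryBufferRef Obj) override {
    // Persist the object, keyed by the module identifier.
    persist(M->getModuleIdentifier(), Obj.getBuffer());
  }

  std::unique_ptr<llvm::MemoryBuffer>
  getObject(const llvm::Module *M) override {
    // Return the cached object, or nullptr to recompile from IR.
    return load(M->getModuleIdentifier());
  }

private:
  void persist(llvm::StringRef Key, llvm::StringRef Bytes);       // placeholder
  std::unique_ptr<llvm::MemoryBuffer> load(llvm::StringRef Key);  // placeholder
};

// Installed once on the engine:
//   EE->setObjectCache(&Cache);
```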

  1. I took the approach of one execution engine with multiple modules (I’m not removing modules once they have been added). During profiling, I noticed that memory usage is high with a lot of code. How can I reduce the memory usage? Is one execution engine per module preferred? I would imagine that this would take up more memory.

  2. When I add a module and JIT it, can I invoke removeModule and still be able to execute the compiled function?

  3. Is it better to have one runtime memory manager associated with multiple execution engines (one engine per module)? Using this approach I could throw away the execution engines once they have JITted the code. I would probably need to take ownership of the JITted code somehow. Are there any examples available of how this could be achieved?

cheers,

+Lang for JIT things.

Hi Koffie,

I’d highly recommend switching from MCJIT to ORC. It is much more flexible when it comes to memory management.

  1. I took the approach of one execution engine with multiple modules (I’m not removing modules once they have been added). During profiling, I noticed that memory usage is high with a lot of code. How can I reduce the memory usage? Is one execution engine per module preferred? I would imagine that this would take up more memory.

Whether or not you need multiple ExecutionEngines depends on your use case, but for non-trivial use cases, yes: one ExecutionEngine per Module seems to be required.

  2. When I add a module and JIT it, can I invoke removeModule and still be able to execute the compiled function?

You can remove the module using ExecutionEngine::removeModule(M), but be aware that this clears the global mappings associated with the Module, so any interfaces that look symbols up via the IR will no longer function.
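A minimal sketch of that caveat, assuming an MCJIT-based ExecutionEngine `EE`, a raw Module pointer `M`, and a function name "calcFn" (all placeholders):

```cpp
// Resolve the address first; getFunctionAddress triggers compilation.
uint64_t Addr = EE->getFunctionAddress("calcFn");
auto *CalcFn = reinterpret_cast<double (*)(double)>(Addr);

EE->removeModule(M); // clears M's global mappings

// The code itself is still owned by the memory manager, so this call
// is fine, but IR-based lookups of M's symbols will now fail.
double Result = CalcFn(42.0);
```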

  3. Is it better to have one runtime memory manager associated with multiple execution engines (one engine per module)? Using this approach I could throw away the execution engines once they have JITted the code. I would probably need to take ownership of the JITted code somehow. Are there any examples available of how this could be achieved?

The MemoryManager owns the JIT’d code; as long as it is alive, the code will hang around. You will need to keep the symbol mappings yourself if you need them.
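One way to keep those mappings, sketched under the assumption of an MCJIT engine `EE` and a module `M` that is about to go away (error handling omitted):

```cpp
// Snapshot name -> address for every defined function before the
// ExecutionEngine (or the IR) is discarded.
std::map<std::string, uint64_t> SymbolTable;
for (llvm::Function &F : *M)
  if (!F.isDeclaration())
    SymbolTable[F.getName().str()] =
        EE->getFunctionAddress(F.getName().str());
```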

If you switch to ORC this tends to be easier: you have one stack and can add as many modules to it as you like. The type of the Module pointer you add determines the ownership semantics: pass a raw pointer and the JIT does not take ownership; pass a std::unique_ptr and the JIT owns the IR; pass a std::shared_ptr and ownership is shared. The JIT releases the IR as soon as it is able to.
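As a rough sketch of that one-stack setup (layer and helper names follow the BuildingAJIT tutorial of that era and vary between LLVM releases, so treat this as illustrative only; `TargetMachine` and `Resolver` are built elsewhere):

```cpp
// ObjectLinkingLayer + IRCompileLayer form the basic ORC stack.
llvm::orc::ObjectLinkingLayer<> ObjectLayer;
llvm::orc::IRCompileLayer<decltype(ObjectLayer)> CompileLayer(
    ObjectLayer, llvm::orc::SimpleCompiler(*TargetMachine));

// Passing std::unique_ptr<Module> hands ownership to the JIT, which can
// then drop the IR (and, unlike MCJIT, the relocatable object) as soon
// as the code has been emitted.
auto Handle = CompileLayer.addModuleSet(
    singletonSet(std::move(M)), // tutorial helper: a vector of one module
    llvm::make_unique<llvm::SectionMemoryManager>(),
    std::move(Resolver));       // symbol resolver for cross-module calls
```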

ORC also frees the relocatable objects as soon as possible (which MCJIT does not do), so it is likely to have better memory performance overall.

  • Lang.

Hi Lang,

I’m nearing a release deadline, so switching at this time is probably not a good idea, unless ORC and MCJIT are almost API compatible. Are there any docs available on making the transition?

With regard to the multiple ExecutionEngines: I have a runtime that can accept code that is known in advance (about 90% of the code). I’m able to cache the objects to reduce JIT time. For now I’m using one execution engine with multiple modules.

What benefits would I gain with an execution engine per module? I think this would be harder and more memory-expensive, since code from different modules can invoke each other.

So MCJIT is known for holding on to memory, I assume (the relocatable objects)? I will try to figure out which part of MCJIT is using the memory. Any pointers to keep in the back of my head while doing this?

Hi Koffie,

I’m nearing a release deadline, so switching at this time is probably not a good idea, unless ORC and MCJIT are almost API compatible. Are there any docs available on making the transition?

ORC’s functionality is a superset of MCJIT’s and it’s generally an easy transition; however, they’re not API compatible: ORC doesn’t derive from ExecutionEngine. If/when you want to try porting your code, I would recommend checking out the new tutorial series: http://llvm.org/docs/tutorial/BuildingAJIT1.html . There are also C bindings for the new API (http://llvm.org/docs/doxygen/html/OrcBindings_8h.html), but it sounds like you’re happy with the C++ API.

With regard to the multiple ExecutionEngines: I have a runtime that can accept code that is known in advance (about 90% of the code). I’m able to cache the objects to reduce JIT time. For now I’m using one execution engine with multiple modules.
What benefits would I gain with an execution engine per module? I think this would be harder and more memory-expensive, since code from different modules can invoke each other.

Some use cases require multiple engines, but if yours doesn’t, I see no advantage in switching unless you use a different memory manager for each engine: with a separate memory manager per engine, you can discard individual engines after you’re done with them, freeing the underlying memory.
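A sketch of that per-engine memory manager arrangement (module creation and error handling omitted; `makeEngine` is a hypothetical helper):

```cpp
// One engine + one SectionMemoryManager per module, so each engine can
// be torn down independently once its code is no longer needed.
std::unique_ptr<llvm::ExecutionEngine>
makeEngine(std::unique_ptr<llvm::Module> M) {
  llvm::EngineBuilder EB(std::move(M));
  EB.setMCJITMemoryManager(llvm::make_unique<llvm::SectionMemoryManager>());
  return std::unique_ptr<llvm::ExecutionEngine>(EB.create());
}

// Later, when this module's code is no longer needed:
//   Engines[i].reset(); // frees the engine, its memory manager, and the code
```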

So MCJIT is known for holding on to memory, I assume (the relocatable objects)? I will try to figure out which part of MCJIT is using the memory. Any pointers to keep in the back of my head while doing this?

MCJIT keeps the relocatable objects alive, yes. It will also keep the IR alive unless you explicitly remove it using removeModule.

You might be able to figure out a way to deallocate the objects by adding new API, but I’d recommend putting that effort into porting to ORC instead: you’ll get the memory benefits natively, plus the new features that are in the works.

  • Lang.

Hi Lang,

Thanks for the link. I will study it.

So, just for my understanding: when you say ORC is a superset of MCJIT, does that mean ORC also supports object caching?

My application is a calculation-model environment. There’s a library of calculations that are user-defined and pre-compiled. A typical calculation can consist of a couple of million statements, so they are quite “big”. Users can input/override values and trigger calculations (think of it as a really, really big Excel sheet).

My Ideal situation would be something like:

  1. Seed the model with cached objects.
  2. For every set of statements associated with a module, add the module.
  3. JIT and remove the module, saving the function pointer externally for invocation.
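Under MCJIT, steps 1–3 might be sketched like this (the container, `entryPointName` helper, and `FunctionTable` are hypothetical, and step 3 carries the removeModule caveat mentioned earlier in the thread):

```cpp
// EE already has the object cache installed, which step 1 seeds.
for (std::unique_ptr<llvm::Module> &Mod : PrecompiledModules) {
  llvm::Module *Raw = Mod.get();
  std::string Entry = entryPointName(Raw);       // hypothetical helper

  EE->addModule(std::move(Mod));                 // step 2
  uint64_t Addr = EE->getFunctionAddress(Entry); // step 3: JITs the code
  FunctionTable[Entry] = Addr;                   // saved for later invocation

  EE->removeModule(Raw);                         // clears the IR mappings
}
```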

But about 10% of the code is variable code that can invoke previously defined functions in the other modules. So I could not remove those modules after JITting, right? The IR probably needs to be kept alive, right?
When I tried this with MCJIT, the address seems to become invalid after removeModule; does ORC work differently in this case?

During runtime, JIT time is quite important, so no optimization is applied, to speed up compilation. Are there other “tricks” that can be applied to speed up JIT compilation?
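For what it’s worth, the “no optimization” part can be expressed at engine construction time; a sketch assuming MCJIT:

```cpp
llvm::EngineBuilder EB(std::move(M));
EB.setOptLevel(llvm::CodeGenOpt::None); // fastest codegen, least optimized code
std::unique_ptr<llvm::ExecutionEngine> EE(EB.create());
```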