Caching ExecutionEngine / MCJIT

Hello everyone,

I need some advice about (re)using ExecutionEngine with MCJIT as a driver. I’m developing a service that receives a piece of high-level code, compiles it into an LLVM IR function named “main”, and uses MCJIT to execute that function.

It can happen that the same piece of code is sent to the service many times. I would like to cache the results (keep the generated machine code alive) and perform only the execution step. But I’m having problems with that.

My first attempt was to cache the ExecutionEngine instance and the function address (from the getFunctionAddress() method). Executing the cached function a second time crashes the process.

I noticed that calling getFunctionAddress() each time helps a bit: there is no crash, but the results produced by the executed function are incorrect, and there are probably some memory access violations going on.

Using the same function name for every piece of code is probably a bad idea with MCJIT, so I changed the names to random values. However, that did not fix either of the previous problems.

I am thinking about using a single ExecutionEngine instance or sharing the memory manager. Can anyone offer advice on that?

Happy New Year,
Paweł Bylica

Hi Pawel,

I don't know much about MCJIT, but I did come across this blog post
http://blog.llvm.org/2013/08/object-caching-with-kaleidoscope.html
about caching the generated objects for subsequent execution, which
apparently is a big win for MCJIT over the old JIT execution engine in
LLVM.

There's example code in the LLVM source distro
(examples/Kaleidoscope/MCJIT/cached).

HTH,
Charlie.

Ah, sorry for the noise; I'm sure someone else here will have some advice!

Kind regards,
Charlie.

My suggestion would be to use a single long-lived instance of the ExecutionEngine (EE) and the memory manager (MM). Use some hashing mechanism to map your high-level requests to a unique key. If you’ve already generated code for that key, just reuse it. Otherwise, create a new module, add it to the EE, and compile it. This will cause you to “leak” memory for code that isn’t being reused. I don’t know of a good solution to that within the framework of MCJIT, but you can get something reasonable by simply recreating your EE and MM instances every N (10000?) compiles. Until you have something like that working, I wouldn’t worry about trying to improve the memory caching strategy.

P.S. You can also look at using the on-disk cache capabilities. I have never used them and have no idea how useful they are.
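The reuse-or-compile policy described above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: `JitCodeCache` and its two callbacks are hypothetical. `compile` is assumed to wrap the real work (build a new Module, `addModule()` it to the one long-lived EE, then `getFunctionAddress()`), and `reset` is assumed to destroy and recreate the EE and MM. The cache key is the full source text; `std::unordered_map` hashes it internally, and comparing whole strings means a hash collision can never hand back the wrong function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache implementing "one long-lived EE/MM, reuse on hit,
// rebuild every N compiles". The callbacks are placeholders, not LLVM APIs.
class JitCodeCache {
public:
  // Assumed to: create a Module for `source`, addModule() it to the shared
  // ExecutionEngine, and return getFunctionAddress() of its entry point.
  using CompileFn = std::function<std::uint64_t(const std::string &source)>;
  // Assumed to: destroy and recreate the ExecutionEngine and memory manager.
  using ResetFn = std::function<void()>;

  JitCodeCache(CompileFn compile, ResetFn reset, std::size_t maxCompiles)
      : compile_(std::move(compile)), reset_(std::move(reset)),
        maxCompiles_(maxCompiles) {
    assert(compile_ && reset_ && "both callbacks must be provided");
  }

  // Returns the entry-point address for `source`, compiling only on a miss.
  std::uint64_t getOrCompile(const std::string &source) {
    auto it = cache_.find(source);
    if (it != cache_.end())
      return it->second; // Hit: reuse the machine code already in memory.

    if (maxCompiles_ > 0 && compilesSinceReset_ >= maxCompiles_) {
      // Bound the "leak": recreate EE and MM. Every cached address pointed
      // into the old memory manager, so all of them must be dropped too.
      reset_();
      cache_.clear();
      compilesSinceReset_ = 0;
    }

    std::uint64_t addr = compile_(source);
    ++compilesSinceReset_;
    cache_.emplace(source, addr);
    return addr;
  }

  std::size_t size() const { return cache_.size(); }

private:
  CompileFn compile_;
  ResetFn reset_;
  std::size_t maxCompiles_;
  std::size_t compilesSinceReset_ = 0;
  std::unordered_map<std::string, std::uint64_t> cache_; // source -> address
};
```

With a threshold of, say, 10000 (the guess from the message), the same request sent many times compiles once, and memory held by unused code is reclaimed at most every 10000 misses. Note that any address handed out before a reset must not be called afterwards.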

My response would be similar to some of the previous replies. In my experience with MCJIT, the best way is to have one instance of the ExecutionEngine. Every time you create a new LLVM IR Module, add it to the MCJIT ExecutionEngine using addModule. It would be a good idea to create unique function names every time you create a new LLVM IR Module (I don't know what happens if there is a clash).
With respect to hashing, I would implement the mapping from your high-level code to the generated function outside of the MCJIT ExecutionEngine. Once your hash lookup gives you the function name, you can get the function address using getFunctionAddress.
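A minimal C++ sketch of the naming side of this advice, kept entirely outside the ExecutionEngine. `FunctionNameRegistry` and the `main_<N>` naming scheme are hypothetical. On a new entry, the caller would still need to build a Module that defines the returned name and `addModule()` it; in both cases the returned name is what would then be passed to `getFunctionAddress`. A counter, rather than the hash value itself, is used to make names unique, so two sources with the same hash cannot clash.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical registry: maps each distinct high-level source to a unique
// LLVM function name. Only the final address lookup involves MCJIT.
class FunctionNameRegistry {
public:
  // Returns {name, isNew}. If isNew is true, the caller still has to emit a
  // Module defining `name` and add it to the ExecutionEngine.
  std::pair<std::string, bool> nameFor(const std::string &source) {
    auto it = names_.find(source); // std::hash<std::string> under the hood
    if (it != names_.end())
      return {it->second, false};

    // Monotonic counter guarantees a fresh symbol name per distinct source.
    std::string name = "main_" + std::to_string(nextId_++);
    bool inserted = names_.emplace(source, name).second;
    assert(inserted && "source was already registered");
    (void)inserted; // silence unused-variable warning when NDEBUG is set
    return {name, true};
  }

  std::size_t size() const { return names_.size(); }

private:
  std::unordered_map<std::string, std::string> names_; // source -> name
  std::size_t nextId_ = 0;
};
```

Keying on the whole source string keeps the lookup exact. If sources are very large, a strong digest could be used as the key instead, at the cost of accepting a (tiny) collision risk.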

Thank you for all the responses. I did as suggested and everything now works as I wanted.

There is, however, one small performance issue I’ve spotted. To check whether a function has already been JITed and is in memory, I need to call getFunctionAddress, and it applies section relocations every time (via finalizeLoadedModules()). I would like to use getSymbolAddress from MCJIT, but it is not accessible through the ExecutionEngine interface.