I’m trying to reduce JIT compile time with LLVM (3.9.1).
So far I’ve implemented the object cache, switched to FastISel, and disabled optimizations.
JIT time is still too slow for my purposes (I have a lot of code to JIT).
http://llvm.org/docs/ProgrammersManual.html#threads-and-the-jit states that we can invoke ExecutionEngine::getPointerToFunction() concurrently. This function was replaced by ExecutionEngine::getFunctionAddress(). Is this function also thread-safe?
I want to speed up codegen by calling getFunctionAddress() in parallel. Currently, due to the large amount of code that has to be JITted, getFunctionAddress() takes around 40% of my load time.
What is meant by “The user must still ensure that only one thread accesses IR in a given LLVMContext while another thread might be modifying it”?
If I pre-generate all IR and then, before execution, make the parallel getFunctionAddress() calls, I should be fine, right? The IR won’t be modified anymore at that point.
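To make the intended pattern concrete, here is a minimal sketch of the fan-out I have in mind. It uses only the standard library: `Resolver` and `resolveInParallel` are hypothetical names, and the resolver is a stand-in for something like a call to getFunctionAddress() on an engine whose IR is already fully generated. Whether the real call can safely run concurrently is exactly the open question, so this only shows the threading shape, not an answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical resolver type. In real code this would wrap something like
// engine->getFunctionAddress(name); here it is only a stand-in.
using Resolver = std::function<std::uint64_t(const std::string&)>;

// Resolve every name using numThreads worker threads. Thread t handles the
// indices t, t + numThreads, t + 2 * numThreads, ... so no two threads ever
// write to the same slot of the result vector.
std::vector<std::uint64_t> resolveInParallel(const std::vector<std::string>& names,
                                             const Resolver& resolve,
                                             unsigned numThreads) {
    assert(numThreads > 0);
    std::vector<std::uint64_t> addrs(names.size(), 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < names.size(); i += numThreads)
                addrs[i] = resolve(names[i]);
        });
    }
    // Wait for all lookups to finish before handing back the addresses.
    for (auto& w : workers) w.join();
    return addrs;
}
```

The resolver passed in must itself be safe to call from several threads at once; the sketch does not add any locking around it.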
I can’t help with the rest, but I just wanted to mention that totally disabling the IR optimizations is not necessarily a good idea, depending on what the input IR looks like: some of the optimizations are “lightweight” and simplify the IR or delete dead code, which can actually make compile time faster.
Actually, the object cache holds the optimized IR (it is precompiled with optimizations and persisted). Around 10% of the code is generated on the fly and goes without optimization. I did test with optimization, and it was slower. For that 10% I’m okay with optimization being turned off.
I also generate only the function prototypes for cached modules, to reduce IR generation.
So I’ve written code to call getFunctionAddress() in parallel. I verified that the threading code itself is sound by substituting getFunctionAddress() with a bunch of bogus computations.
With getFunctionAddress(), however, the calls seem to be serialized. Is there a giant lock somewhere per ExecutionEngine?
I have one execution engine that holds all the modules. Going through the llvm-dev list archives, it seems that I need one execution engine per module. Is this still the case? (The postings were quite old.) Is there a difference between MCJIT and ORC in this respect?
I was hoping that not modifying the IR during getFunctionAddress() would be enough.
OK, so I’ve reworked it to have one JIT/context per expression. It uses a lot of memory.
Is there a way to clone or retrieve the JITted memory after calling getFunctionAddress()? If I could get the size, I could memcpy from that address and delete the JIT. This should save a lot of memory.
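For reference, one layout I am considering as a middle ground between a single shared engine and one JIT/context per expression is one JIT per worker thread. The sketch below uses only the standard library. `WorkerJit` is a hypothetical stand-in for "one LLVMContext plus one ExecutionEngine", and its `compile` method hands out fake addresses rather than calling LLVM, so this shows only the ownership and partitioning, not real codegen. It does not address the memcpy question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for one context plus one execution engine.
// A real version would own the LLVM objects; this one only records which
// expressions it was asked to compile and assigns them fake addresses.
struct WorkerJit {
    std::map<std::string, std::uint64_t> compiled;
    std::uint64_t nextAddr = 0x1000;

    std::uint64_t compile(const std::string& expr) {
        std::uint64_t addr = nextAddr;
        nextAddr += 0x10;
        compiled[expr] = addr;
        return addr;
    }
};

// Each worker thread owns exactly one WorkerJit and never touches another
// worker's instance, so no engine or IR state is shared between threads.
// Expressions are dealt out round-robin: worker w gets w, w + n, w + 2n, ...
std::vector<WorkerJit> compileWithWorkerJits(const std::vector<std::string>& exprs,
                                             unsigned numWorkers) {
    assert(numWorkers > 0);
    std::vector<WorkerJit> jits(numWorkers);
    std::vector<std::thread> threads;
    for (unsigned w = 0; w < numWorkers; ++w) {
        threads.emplace_back([&exprs, &jits, w, numWorkers] {
            for (std::size_t i = w; i < exprs.size(); i += numWorkers)
                jits[w].compile(exprs[i]);
        });
    }
    for (auto& t : threads) t.join();
    return jits;
}
```

With this layout the number of engines is bounded by the number of threads instead of the number of expressions. The returned instances have to stay alive for as long as any of their code is in use.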