ORC JIT and multithreading

How would I go about enabling the ORC JIT to compile code in multiple threads?

Some background:
I’ve switched Clasp (an implementation of Common Lisp that uses LLVM as the backend, github.com/drmeister/clasp) over to using ORC as its JIT. I did this by following the Kaleidoscope tutorial. I have a ClaspJIT_O class that copies the KaleidoscopeJIT class.
https://github.com/drmeister/clasp/blob/analyze/include/clasp/llvmo/llvmoExpose.h#L4370

Clasp is multithreaded and needs to compile code in multiple threads because it uses the JIT to generate dispatch functions for multiple dispatch/generic functions. To make this possible, every thread gets its own LLVMContext, and every type and llvm::Module that is linked into JITted code in a thread is initialized lazily and thread-locally. Despite this, I experience frequent, random crashes when I try to use the ORC JIT from multiple threads.

Here’s what I’ve tried:

(1) This works: wrap a lock/mutex around access to one ClaspJIT_O object. The calls to ClaspJIT_O::addModule and to jitFinalizeReplFunction are both protected by the lock:
https://github.com/drmeister/clasp/blob/dev/src/llvmo/llvmoExpose.cc#L3999

The Common Lisp code that does the lock and calls these functions:
https://github.com/drmeister/clasp/blob/analyze/src/lisp/kernel/cmp/jit-setup.lsp#L598

This throttles the system: only one thread at a time can add modules to the JIT and look up symbols in it. It’s not bad, and I can live with it.

(2) This fails: keep a thread-local ClaspJIT_O object that is lazily initialized as soon as any compilation happens in a thread.

(3) This fails: keep a thread-local ClaspJIT_O object that is initialized as in (2) AND wrap a lock/mutex around ClaspJIT_O::addModule and the call to jitFinalizeReplFunction. What I was trying to test here was whether ORC uses some global resource that the different threads were trampling despite each having its own thread-local ClaspJIT_O object.
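
For readers trying to picture the per-thread setup in (2) and (3), here is a minimal sketch of a lazily initialized, thread-local compiler state. It is only an illustration: ToyContext, ToyJIT, ThreadCompilerState, threadState, and countDistinctStates are hypothetical stand-ins, not LLVM or Clasp classes, and the sketch neither reproduces nor explains the crashes described above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical stand-ins (NOT LLVM classes): a context and a JIT bound to it.
struct ToyContext {};

class ToyJIT {
public:
  explicit ToyJIT(ToyContext &ctx) : ctx_(&ctx) {}
  const ToyContext *context() const { return ctx_; }

private:
  ToyContext *ctx_;
};

// Hands out a unique id for every state that is ever constructed.
inline int nextStateId() {
  static std::atomic<int> counter{0};
  return counter++;
}

// Everything one thread needs in order to compile: its own context and JIT.
// `context` is declared before `jit`, so it is constructed first.
struct ThreadCompilerState {
  ThreadCompilerState() : id(nextStateId()), jit(context) {}
  int id;
  ToyContext context;
  ToyJIT jit;
};

// Returns the calling thread's state, creating it on first use (lazy init).
inline ThreadCompilerState &threadState() {
  thread_local std::unique_ptr<ThreadCompilerState> state;
  if (!state)
    state = std::make_unique<ThreadCompilerState>();
  return *state;
}

// Starts `numThreads` threads and counts how many distinct states they saw.
inline std::size_t countDistinctStates(int numThreads) {
  std::mutex seenMutex;
  std::set<int> seenIds;
  std::vector<std::thread> workers;
  for (int t = 0; t < numThreads; ++t) {
    workers.emplace_back([&seenMutex, &seenIds] {
      ThreadCompilerState &first = threadState();
      ThreadCompilerState &again = threadState();
      assert(&first == &again);                        // same object per thread
      assert(first.jit.context() == &first.context);   // JIT uses own context
      (void)again;
      std::lock_guard<std::mutex> guard(seenMutex);
      seenIds.insert(first.id);
    });
  }
  for (auto &w : workers)
    w.join();
  return seenIds.size();
}
```

In this toy version, countDistinctStates(4) sees four different states, one per thread. Approach (3) would additionally take a shared lock around the add-module and finalize calls on each thread's own JIT.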

I can provide many more details on request.

Christian Schafmeister

Professor, Chemistry
Temple University

Hi Christian,

ORC doesn’t have any locks internally at the moment. Approach (1) is the recommended solution. I’m working on a refactor that should be out in a few weeks that will improve threading support, and I expect to put more effort into multi-threaded performance in the next few months.
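
As a rough illustration of approach (1), the sketch below puts one process-wide JIT behind a single mutex so that adding a module and looking up its symbol happen under the same lock. ToyJIT, LockedJIT, addAndLookup, and compileFromThreads are hypothetical names used only for this example; ToyJIT is a stand-in with no internal locking, not an ORC class.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a JIT with no internal locking (NOT an LLVM class).
class ToyJIT {
public:
  void addModule(const std::string &name, int value) { symbols_[name] = value; }
  int lookup(const std::string &name) const { return symbols_.at(name); }
  std::size_t size() const { return symbols_.size(); }

private:
  std::unordered_map<std::string, int> symbols_;
};

// One shared JIT plus one mutex; every access goes through the lock.
class LockedJIT {
public:
  // Add a module and resolve its symbol under a single lock acquisition,
  // analogous to guarding addModule and the finalize/lookup call together.
  int addAndLookup(const std::string &name, int value) {
    std::lock_guard<std::mutex> guard(mutex_);
    jit_.addModule(name, value);
    return jit_.lookup(name);
  }

  std::size_t size() {
    std::lock_guard<std::mutex> guard(mutex_);
    return jit_.size();
  }

private:
  std::mutex mutex_;
  ToyJIT jit_;
};

// Spawns `numThreads` threads that each "compile" `perThread` modules
// through the shared, locked JIT, then returns the total symbol count.
inline std::size_t compileFromThreads(LockedJIT &jit, int numThreads,
                                      int perThread) {
  std::vector<std::thread> workers;
  for (int t = 0; t < numThreads; ++t) {
    workers.emplace_back([&jit, t, perThread] {
      for (int i = 0; i < perThread; ++i) {
        std::string name = "fn_" + std::to_string(t) + "_" + std::to_string(i);
        int got = jit.addAndLookup(name, t * 1000 + i);
        assert(got == t * 1000 + i);
        (void)got;
      }
    });
  }
  for (auto &w : workers)
    w.join();
  return jit.size();
}
```

The trade-off is the one already described in (1): compilation requests from different threads are serialized while the lock is held.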

Out of interest, are you saying that approach (1) was a regression compared to MCJIT’s behavior?

Cheers,
Lang.

Hi Lang,

I’m excited to hear about an ORC refactor with improved threading support - I really like ORC.

I added multithreading to Clasp after I made the switch to ORC - so I don’t have any experience using MCJIT in a multithreaded environment.

Thank you for responding.

Best,

.Chris.

> Hi Christian,
>
> ORC doesn’t have any locks internally at the moment. Approach (1) is the recommended solution. I’m working on a refactor that should be out in a few weeks that will improve threading support, and I expect to put more effort into multi-threaded performance in the next few months.

What’s the shared state that necessitates (1)? Shouldn’t a separate LLVMContext plus a separate ORC JIT stack per thread (I think that’s what’s being suggested in (2), for example) suffice?
If a whole process can only have one ORC JIT stack, that seems really problematic. I’d be surprised if that were the case, so I guess I’m missing something.