[ORC] Removing / replacing JITDylibs

Hi,

I’m working on a runtime autotuner that utilizes ORCv2 JIT (I’m closely tracking tip-of-tree), so linking new object files and patching in the new function(s) will happen frequently.

My runtime system needs to be able to do one of the following: (1) replace the contents of a JITDylib with a new object file [to provide semi-space GC-style reclaiming], or (2) remove a JITDylib outright.

Right now, I have one ExecutionSession instance for my linker and am creating a new JITDylib for each object file that I’d like to link in. There doesn’t appear to be a corresponding ExecutionSession::removeJITDylib(…) method, so I’m wondering: how do I reclaim the memory for code that I’ve linked in previously but no longer need?

When using MCJIT, I would reclaim this memory by destroying the ExecutionEngine that was created for each “JITDylib”. Should I do the same with ExecutionSessions?

For reference, here’s the short bit of code I’m playing with for linking with ORC:
https://github.com/halo-project/llvm/blob/master/compiler-rt/lib/halomon/DynamicLinker.h#L55

Thanks,
Kavon

Hi Kavon,

Unfortunately we don’t have a good way to do this at the moment, short of maintaining multiple execution sessions (analogous to the way you maintained multiple ExecutionEngines).

I am working on initializer/destructor support that will allow us to perform the equivalent of dlopen/dlclose on JITDylibs. Once that support is available I think it would be a good fit for your use case. Unfortunately I think it is still a few months away.

Cheers,
Lang.

Thanks for the reply, Lang! I really appreciate the effort you’ve been putting into the JIT infrastructure and that you take the time to answer lots of questions about it on the mailing list.

> I am working on initializer/destructor support that will allow us to perform the equivalent of dlopen/dlclose on JITDylibs.

In my case, the process is already running with one version of the program code, and another thread is hot-patching functions with dynamically recompiled versions (I’ve extended XRay for this). Thus, I actually want to avoid reinitializing or creating new versions of globals for the Dylib output by ORC and instead link with the ones in-process.

My plan for achieving this is to externalize all internal globals in the original bitcode before JIT compilation and hope that the ORC dynamic linker handles the rest automatically.

Cheers,
Kavon

Hi Kavon,

> In my case, the process is already running with one version of the program code, and another thread is hot-patching functions with dynamically recompiled versions (I’ve extended XRay for this). Thus, I actually want to avoid reinitializing or creating new versions of globals for the Dylib output by ORC and instead link with the ones in-process.

Yep. That should be fine. In your use-case you will want to split your globals and functions into different modules, and create new JITDylibs with no initializers to hold any functions whose memory you would like to be able to reclaim later. Then you can dlclose those JITDylibs to reclaim the memory for the contained functions.

Actually, if my current prototype pans out then there might be an even better solution for your use case: I’m hoping to provide fine-grained removal of modules from within a JITDylib (without removing the whole JITDylib). The advantage of this is that it’s easier to reason about (functions can go in the conceptually “correct” JITDylib, even if you want to remove them later) and less expensive (It’s more expensive to maintain one JITDylib per function than to maintain one JITDylib with many functions).
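
To make the contrast concrete, here is a toy model in plain C++ of what per-module removal inside a single JITDylib could look like. It does not use any ORC API: `ToyDylib`, `CodeBlob`, `addModule`, and `removeModule` are hypothetical names invented for this sketch, and the byte vectors merely stand in for JIT'd code memory.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for one symbol's JIT'd code. Not an ORC type.
struct CodeBlob {
  std::string Symbol;
  std::vector<unsigned char> Bytes;
};

// Toy stand-in for a JITDylib whose modules can be removed one at a time.
class ToyDylib {
public:
  using ModuleKey = unsigned;

  // Adds a "module" (a group of symbols) and returns a key for later removal.
  ModuleKey addModule(std::vector<CodeBlob> Blobs) {
    ModuleKey K = NextKey++;
    Modules.emplace(K, std::move(Blobs));
    return K;
  }

  // Releases the memory of one module; all other modules are untouched.
  // Returns false if the key is unknown (e.g. already removed).
  bool removeModule(ModuleKey K) { return Modules.erase(K) == 1; }

  // True if any live module defines Symbol.
  bool defines(const std::string &Symbol) const {
    for (const auto &KV : Modules)
      for (const auto &B : KV.second)
        if (B.Symbol == Symbol)
          return true;
    return false;
  }

  // Total bytes held by all live modules.
  std::size_t bytesInUse() const {
    std::size_t Total = 0;
    for (const auto &KV : Modules)
      for (const auto &B : KV.second)
        Total += B.Bytes.size();
    return Total;
  }

private:
  ModuleKey NextKey = 0;
  std::map<ModuleKey, std::vector<CodeBlob>> Modules;
};
```

In this model, globals and every recompiled version of a function can share one container, and an old version is dropped by key while everything else stays resolvable.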

> In my case, the process is already running with one version of the program code…

> My plan for achieving this is to externalize all internal globals in the original bitcode before JIT compilation and hope that the ORC dynamic linker handles the rest automatically.

Do you mean that you want to take the IR for the original version, turn the global definitions into declarations, then try to JIT it? This will work for globals that had external linkage in the original program, but may fail for local/private variables for two reasons:

(1) ORC will usually try to find the addresses of the external globals by calling dlsym (assuming you’re using the DynamicLibrarySearchGenerator), and this will not find internal/private symbols. If XRay maintains a side table of internal symbol addresses you could work around this by extending lookup to search the side table.

(2) The optimizers and linker are free to rename / remove / dead-strip private globals depending on how they are used. For example, on my machine the following results in an empty data section (Result gets SCCP’d away):

static int Result = 42;
int getResult() {
  return Result;
}

So if you turned Result into a declaration and then tried to JIT the function at a lower optimization level you would get a missing definition error. In this case, one workaround might be to consult the side table from (1) (if you have it), and only turn the global definition into a declaration if the table indicates that there is an existing definition. If there isn’t, you could promote your definition to external linkage and have it serve as the definition going forward.
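
The per-global decision described above could be sketched roughly as follows, again in plain C++ with no ORC types. `SideTable`, `GlobalAction`, `GlobalPlan`, and `planGlobal` are hypothetical names; the side table is assumed to map the names of internal/private globals to the addresses of their existing in-process definitions, which is something your XRay extension would have to provide.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical side table: name of an internal/private global -> address of
// the definition that already exists in the running process.
using SideTable = std::unordered_map<std::string, std::uintptr_t>;

enum class GlobalAction {
  LinkToExisting,  // turn the definition into a declaration; resolve via table
  PromoteAndDefine // keep the definition and give it external linkage
};

struct GlobalPlan {
  GlobalAction Action;
  std::optional<std::uintptr_t> Address; // set only for LinkToExisting
};

// Decides what to do with one global before handing the module to the JIT.
GlobalPlan planGlobal(const SideTable &Table, const std::string &Name) {
  assert(!Name.empty() && "global must have a name to be looked up");
  auto It = Table.find(Name);
  if (It != Table.end())
    return {GlobalAction::LinkToExisting, It->second};
  // No surviving definition (e.g. it was optimized away in the original
  // binary), so the JIT'd module has to provide one.
  return {GlobalAction::PromoteAndDefine, std::nullopt};
}
```

For the Result example, a table with no entry for Result would yield PromoteAndDefine, so the JIT'd module keeps its own definition instead of failing with a missing-symbol error.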

Cheers,
Lang.

Hi,

> Unfortunately we don't have a good way to do this at the moment, short of
> maintaining multiple execution sessions (analogous to the way you
> maintained multiple ExecutionEngines).

FWIW, I/we use Orc v1, via the C stack, for compiling parts of SQL
queries. We currently rely quite heavily on being able to deallocate
"modules". I assume that would make it a problem for us to migrate to v2
right now (ignoring the fact that there's no C API yet ...).

I have not yet benchmarked it, but it looks like the overhead of
creating a separate ExecutionSession for every set of functions I'd want
to be able to deallocate individually would be fairly noticeable. And it's
not straightforward to just use ExecutionSessions in a "generational"
manner, as the lifetime of JITed expressions can vary widely (from
seconds to many days in the case of a long-running query, which might
spawn many shorter queries internally).

Is that prototype available somewhere?

I'd be interested in replacing the current C stack with LLJIT (wrapping
it in C again, potentially even with a roughly compatible interface),
but especially if one desired to take advantage of the nicer features,
say parallel compilation, it looks infeasible to just create separate
LLJIT instances over and over.

> I am working on initializer/destructor support that will allow us to
> perform the equivalent of dlopen/dlclose on JITDylibs. Once that support is
> available I think it would be a good fit for your use case. Unfortunately I
> think it is still a few months away.

> Actually, if my current prototype pans out then there might be an even
> better solution for your use case: I'm hoping to provide fine-grained
> removal of modules from within a JITDylib (without removing the whole
> JITDylib). The advantage of this is that it's easier to reason about
> (functions can go in the conceptually "correct" JITDylib, even if you want
> to remove them later) and less expensive (It's more expensive to maintain
> one JITDylib per function than to maintain one JITDylib with many
> functions).

Interesting. Are you thinking of providing both wholesale JITDylib
removal and excision of individual functions? In postgres' case the
modules we add contain a bunch of exposed functions, and then (depending
on the cost of the query) a lot of C functions and required globals
(copied in the case of static constant ones) that have been "inlined"
from the main binary, to allow for proper IPO. It seems like it might be
hard to support that case efficiently without replacing JITDylibs
wholesale?

The new APIs look much cleaner to me. Thanks for all your work on that!

Greetings,

Andres Freund