dynamic namespacing of JIT modules?

Greetings, LLVM wizards!

We have an application that uses Clang and Orc JIT to compile and
execute C++ code on the fly.

The JIT contains any number of LLVM modules, each of which defines a
function, plus a "main" module that calls those functions. Several
functions may have the same signature, so I need to find a way to
resolve them.

Originally, I just put each module's code in its own namespace when it
was compiled. But now we want to be able to compile them separately to
bitcode files and read them later. So at compilation time there is no
longer any way to assign a unique namespace to each.

I have a couple ideas for possible solutions, but I'm hoping some LLVM
expert out there has a better one...

1. Assign each module a unique namespace when its bitcode is loaded:
go through the IR and add the namespace to the names of the functions
defined (and called, in the case of internal functions). I don't know
how to do that. Perhaps somebody else does?

2. Assign each module a unique namespace, but don't change the modules
themselves: just add the namespace when a function is called from the
main module, and modify the JIT's symbol resolver to strip the
namespace and look for the function only in the relevant module.

Help...?

Hi,

> Greetings, LLVM wizards!

Not one of them...

> We have an application that uses Clang and Orc JIT to compile and
> execute C++ code on the fly.
>
> The JIT contains any number of LLVM modules, each of which defines a
> function, plus a "main" module that calls those functions. Several
> functions may have the same signature, so I need to find a way to
> resolve them.
>
> Originally, I just put each module's code in its own namespace when it
> was compiled. But now we want to be able to compile them separately to
> bitcode files and read them later. So at compilation time there is no
> longer any way to assign a unique namespace to each.

Why not? If you assign a random UUID, or a sequential number or
whatnot, that should work.
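
A minimal sketch of the random-identifier suggestion, using only the C++ standard library. The helper names `makeUniqueNamespace` and `wrapInNamespace` are hypothetical, and the identifier is 128 random bits printed as hex rather than a real RFC 4122 UUID. The generated name has to be recorded somewhere, since whoever calls into the module later needs to know it.

```cpp
#include <cstdint>
#include <iomanip>
#include <random>
#include <sstream>
#include <string>

// Hypothetical helper: builds an identifier of the form "ns_" followed by
// 32 hex digits. The "ns_" prefix keeps it a valid C++ identifier even if
// the first hex digit is 0-9.
std::string makeUniqueNamespace() {
  std::random_device rd;
  std::ostringstream os;
  os << "ns_" << std::hex << std::setfill('0');
  for (int i = 0; i < 4; ++i)
    os << std::setw(8) << static_cast<std::uint32_t>(rd());
  return os.str();
}

// Hypothetical helper: wraps a module's source text in the given namespace
// before it is handed to the compiler.
std::string wrapInNamespace(const std::string &ns, const std::string &source) {
  return "namespace " + ns + " {\n" + source + "\n}\n";
}
```

A sequential counter would work just as well as random bits if all modules are compiled by the same process.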

> 2. Assign each module a unique namespace, but don't change the modules
> themselves: just add the namespace when a function is called from the
> main module, and modify the JIT's symbol resolver to strip the
> namespace and look for the function only in the relevant module.

That's kind of what I do for a similar-ish problem in the JIT engine in
postgres (which uses orcjit). There multiple dynamically loaded
extensions can register functions whose source code is available, and
each of them can have conflicting symbols. The equivalent of your main
module generates function names that encode information about which
module to look for the actual definition of the function, and then does
the symbol resolution outside of LLVM's code. I do that both when
inlining these functions, and when generating function calls to the
external function.
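
The general idea of encoding the target module in the symbol name can be sketched as follows. This is a toy model in plain C++, not the postgres implementation and not an LLVM API: the `$` separator, the `encodeSymbol`/`decodeSymbol`/`resolve` helpers and the per-module tables are all assumptions made for illustration. Names reaching a real resolver may also carry platform or C++ mangling, which this sketch ignores.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

using Address = std::uint64_t;
using SymbolTable = std::unordered_map<std::string, Address>;

// Assumed separator; module ids must not contain it.
constexpr char kSep = '$';

// Name the "main" module would emit for a call into a given module.
std::string encodeSymbol(const std::string &moduleId, const std::string &fn) {
  return moduleId + kSep + fn;
}

struct DecodedSymbol {
  std::string moduleId;
  std::string function;
};

// Splits "module$function" at the first separator; returns false if the
// name does not follow that scheme.
bool decodeSymbol(const std::string &name, DecodedSymbol &out) {
  std::string::size_type pos = name.find(kSep);
  if (pos == std::string::npos || pos == 0 || pos + 1 == name.size())
    return false;
  out.moduleId = name.substr(0, pos);
  out.function = name.substr(pos + 1);
  return true;
}

// Looks the function up only in the module named by the encoded symbol.
bool resolve(const std::unordered_map<std::string, SymbolTable> &modules,
             const std::string &name, Address &out) {
  DecodedSymbol d;
  if (!decodeSymbol(name, d))
    return false;
  auto m = modules.find(d.moduleId);
  if (m == modules.end())
    return false;
  auto s = m->second.find(d.function);
  if (s == m->second.end())
    return false;
  out = s->second;
  return true;
}
```

With two modules that both define `compute`, `resolve(modules, "modB$compute", addr)` would pick the definition from `modB` only.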

Not sure if that helps.

Greetings,

Andres Freund

Not sure if I’ve got your question right.
Do you want to “redirect” calls from the main module to different functions
from different modules? It would help to understand the problem if you
elaborate a bit.

As for this one:

> 1. Assign each module a unique namespace when its bitcode is loaded:
> go through the IR and add the namespace to the names of the functions
> defined (and called, in the case of internal functions). I don't know
> how to do that. Perhaps somebody else does?

You can simply iterate over all the functions and change their names, e.g.:

   function.setName("foobar" + function.getName().str());

> Hi,
>
> > Greetings, LLVM wizards!
>
> Not one of them...
>
> > We have an application that uses Clang and Orc JIT to compile and
> > execute C++ code on the fly.
> >
> > The JIT contains any number of LLVM modules, each of which defines a
> > function, plus a "main" module that calls those functions. Several
> > functions may have the same signature, so I need to find a way to
> > resolve them.
> >
> > Originally, I just put each module's code in its own namespace when it
> > was compiled. But now we want to be able to compile them separately to
> > bitcode files and read them later. So at compilation time there is no
> > longer any way to assign a unique namespace to each.
>
> Why not? If you assign a random UUID, or a sequential number or
> whatnot, that should work.

Yes, that is the solution I am looking into at the moment, actually:
using a UUID to generate a namespace when the module is compiled.
However, that means saving the UUID somewhere; the bitcode is no
longer self-sufficient. I suppose I could create a special global
variable in the module containing the UUID...

> > 2. Assign each module a unique namespace, but don't change the modules
> > themselves: just add the namespace when a function is called from the
> > main module, and modify the JIT's symbol resolver to strip the
> > namespace and look for the function only in the relevant module.
>
> That's kind of what I do for a similar-ish problem in the JIT engine in
> postgres (which uses orcjit). There multiple dynamically loaded
> extensions can register functions whose source code is available, and
> each of them can have conflicting symbols. The equivalent of your main
> module generates function names that encode information about which
> module to look for the actual definition of the function, and then does
> the symbol resolution outside of LLVM's code. I do that both when
> inlining these functions, and when generating function calls to the
> external function.

I did try something like that. The problem I ran into is that the
symbol resolver receives mangled function names. It is easy enough to
demangle them there, but hard to mangle names before compiling. Once
you have decoded your function name in the symbol resolver, how do you
generate a mangled name for the actual function you want to resolve
to?

> Not sure if that helps.
>
> Greetings,
>
> Andres Freund

Thanks, Andres.

Well, I've gotten this to work by playing with the symbol resolver as
you suggest. Almost...

In the main module, I declare the functions in (fictitious)
namespaces. In the JIT, the symbol resolver recognizes those
namespaces, which tell it in which modules to look for the
corresponding unnamespaced functions. In a simple test case, that
works. But in a more complex case, execution fails when I try to run
the constructors for the main module. The error message says that the
namespaced functions from the main module were not found, so
apparently somebody somewhere is looking for those symbols and
bypassing the JIT's symbol resolver... Perhaps the linking layer?

I think I will go back to the UUID-based namespace idea, which would
be less of a headache because it doesn't involve LLVM...

Belatedly jumping in here, as there is a potential alternative answer for this in the newer iteration of the ORC APIs.

The new APIs replace symbol resolvers with first-class symbol tables, “JITDylib” instances, which provide a way to namespace code so that duplicate names do not clash. (JITDylibs are also faster to search, and internally provide synchronization support for concurrent compilation.)

Modules (and any other program representations) are always added to JITDylibs in the new API, and you control symbol resolution by describing “links-against” style relationships between JITDylibs the same way you would when building a program/library on the command line. You can also attach symbol definition generators to JITDylibs to generate new definitions programmatically if desired. I have included an example below that shows how to build a simple IR JIT that uses both techniques.

> The JIT contains any number of LLVM modules, each of which defines a
> function, plus a “main” module that calls those functions. Several
> functions may have the same signature, so I need to find a way to
> resolve them.
>
> Originally, I just put each module’s code in its own namespace when it
> was compiled. But now we want to be able to compile them separately to
> bitcode files and read them later. So at compilation time there is no
> longer any way to assign a unique namespace to each.

In this case, I believe that you could place each Module (or group of modules whose names are guaranteed not to clash) in its own JITDylib. Whatever disambiguation process you use now to find the “correct” version of a function would instead be used to find the “correct” JITDylib, which would allow you to resolve correctly without modifying the stored IR.

Cheers,
Lang.

Example code:

// Create a JITTargetMachineBuilder and DataLayout.

// We use a target machine builder rather than a single target machine as the new APIs are
// capable of compiling on multiple threads, though we do not do that in this example
auto JTMB = ExitOnErr(JITTargetMachineBuilder::detectHost());
auto DL = ExitOnErr(JTMB.getDefaultDataLayoutForTarget());

// Now we create an ExecutionSession (string pool, error reporting,
// session mutex), and object and IR compile layers.
ExecutionSession ES;
RTDyldObjectLinkingLayer ObjLayer(ES, []() { return llvm::make_unique<SectionMemoryManager>(); });
IRCompileLayer CompileLayer(ES, ObjLayer, ConcurrentIRCompiler(JTMB));

// Now we get to the interesting part: We declare two JITDylibs. One,
// ProcessSymbolsLib, will auto-generate definitions by calling dlsym
// on the current process, making this process’s symbols available to
// JIT’d code.
// The second, Main, will contain our JIT’d code. We add a “links-against”
// relationship from Main to ProcessSymbolsLib by calling addToSearchOrder.
auto &ProcessSymbolsLib = ES.createJITDylib("");
ProcessSymbolsLib.setGenerator(ExitOnErr(DynamicLibrarySearchGenerator::GetForCurrentProcess(DL)));
auto &Main = ES.createJITDylib("main");
Main.addToSearchOrder(ProcessSymbolsLib);

// Now we can add code to the Main library, and perform a look up on it.
// ExecutionSession::lookup takes as its first argument a list of JITDylibs to search
// for the requested definition.
ExitOnErr(CompileLayer.add(Main, ThreadSafeModule(std::move(Mod), std::move(Ctx))));
auto FooSym = ExitOnErr(ES.lookup({&Main}, "_foo"));
auto Foo = (FooTy)FooSym.getAddress();
Foo();
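
To illustrate how one symbol table per module avoids clashes, here is a toy model in plain C++ of the arrangement described above. It does not use the ORC API: `ToyDylib`, `lookupIn` and `resolveFrom` are hypothetical stand-ins, loosely mirroring a JITDylib, a lookup over an explicit list of JITDylibs, and resolution through a "links-against" search order.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using Address = std::uint64_t;

// Toy stand-in for a symbol table with a "links-against" list.
struct ToyDylib {
  std::string name;
  std::unordered_map<std::string, Address> symbols;
  std::vector<const ToyDylib *> searchOrder; // libraries this one links against
};

// Searches only the listed libraries, in order, for the requested symbol.
bool lookupIn(const std::vector<const ToyDylib *> &libs,
              const std::string &symbol, Address &out) {
  for (const ToyDylib *lib : libs) {
    auto it = lib->symbols.find(symbol);
    if (it != lib->symbols.end()) {
      out = it->second;
      return true;
    }
  }
  return false;
}

// Resolves a reference made by code living in `from`: its own symbols
// first, then the libraries it links against.
bool resolveFrom(const ToyDylib &from, const std::string &symbol,
                 Address &out) {
  if (lookupIn({&from}, symbol, out))
    return true;
  return lookupIn(from.searchOrder, symbol, out);
}
```

In this model, two modules that both define `compute` live in separate tables, and the caller picks the table (for example `lookupIn({&modB}, "compute", addr)`) instead of renaming anything inside the modules.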