Function binding

Hey list,

I'm looking for information on how programs that span multiple LLVM
modules work at runtime, especially wrt. symbol handling when running
in a JIT setting. To give some background, I'm developing a language
that targets LLVM as a backend, and I'd like my translation units to
map to LLVM modules as closely as possible.

What I'm looking for here is something similar to how Java or Python
handles inter-module dependencies at runtime, where they load modules (or
classes, in the Java case) as necessary, and where different modules
can cooperate during different runs of the same program depending on
the code path that is taken.

Is it possible to get a hook call when a JITed module encounters a
symbol reference it can't resolve locally? My current solution is
based upon pessimizations that force the loading of all dependent
modules, but that's wasteful in many cases when only some of those
dependencies are actually required for execution.

That said, I would also like to examine the possibility to recompile
modules in the running system on the fly from source, so that it is
possible to update modules as long as their interfaces stay
compatible. Can LLVM freeze the JIT in a safe place and unload
modules?

I'm also curious to find out how the external symbols referenced from
the C frontend are resolved (such as printf or other functions in
libc). I assume there is a dlsym() call somewhere depending on the
libs listed in the module, is this correct? Does this happen at module
load time or at some later point while executing?

Finally, is the LLVM linker really required for a system like this? I
know the JIT is happy executing my bytecode modules as long as they
are self-contained, but on-demand loading is a requirement for this
(test) project. Currently all I'm getting is a hard error from the
runtime complaining that a referenced symbol is undefined.

Any information (or pointers to information) on the above would be
very helpful to me.

Thanks,
Andreas

Hey list,

I'm looking for information on how programs that span multiple LLVM
modules work at runtime, especially wrt. symbol handling when running
in a JIT setting. To give some background, I'm developing a language
that targets LLVM as a backend, and I'd like my translation units to
map to LLVM modules as closely as possible.

Ok. Currently the LLVM JIT just knows about a single module. I think it would be very useful to extend this to support multiple modules at a time, where a function reference consults a symbol table to determine the right module to compile from.

In the context of C/C++, imagine completely skipping the link step. Instead of linking, you could just present the JIT with a list of .o files to load and execute. If it could execute from multiple modules at a time, it would "just work" as if linking had occurred.

What I'm looking for here is something similar to how Java or Python
handles inter-module dependencies at runtime, where they load modules (or
classes, in the Java case) as necessary, and where different modules
can cooperate during different runs of the same program depending on
the code path that is taken.

I think this is another very logical application of this idea.

Is it possible to get a hook call when a JITed module encounters a
symbol reference it can't resolve locally?

Yes, sort of. Look at lib/ExecutionEngine/JIT/Intercept.cpp. getPointerToNamedFunction contains logic that works like this:

1. If this is one of the very few functions the JIT knows about, handle
    it.
2. Otherwise, call 'dlsym' on the local process to resolve the address.
3. Otherwise abort.

It would be pretty straightforward to extend that code, or the callers of that code, to search multiple modules.
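The lookup order above, plus the suggested extension point, can be sketched roughly as follows. This is a minimal model and not the code in Intercept.cpp: `SymbolResolver`, `addBuiltin`, and `addFallback` are hypothetical names, and the process-level `dlsym` step is represented by a caller-supplied callback so the sketch stays portable.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical names throughout; this models the lookup order only.
using Address = void*;
// A fallback returns nullptr when it does not know the symbol.
using Lookup = std::function<Address(const std::string&)>;

class SymbolResolver {
public:
    // Step 1: the small set of functions the JIT handles itself.
    void addBuiltin(const std::string& name, Address addr) {
        builtins_[name] = addr;
    }

    // Later steps: fallbacks tried in registration order, e.g. a
    // dlsym-based process lookup, or a search through other modules.
    void addFallback(Lookup lookup) {
        fallbacks_.push_back(std::move(lookup));
    }

    Address resolve(const std::string& name) const {
        auto it = builtins_.find(name);
        if (it != builtins_.end()) return it->second;
        for (const auto& lookup : fallbacks_) {
            if (Address addr = lookup(name)) return addr;
        }
        // Last step: the JIT aborts here; the sketch throws instead so
        // that the failure can be observed by a caller.
        throw std::runtime_error("undefined symbol: " + name);
    }

private:
    std::unordered_map<std::string, Address> builtins_;
    std::vector<Lookup> fallbacks_;
};
```

Under this model, searching additional modules amounts to registering one more fallback ahead of the final failure step.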

My current solution is
based upon pessimizations that force the loading of all dependent
modules, but that's wasteful in many cases when only some of those
dependencies are actually required for execution.

Yup.

That said, I would also like to examine the possibility to recompile
modules in the running system on the fly from source, so that it is
possible to update modules as long as their interfaces stay
compatible. Can LLVM freeze the JIT in a safe place and unload
modules?

Not really. However, it can do the equivalent thing: it can replace the code of functions that have already been compiled with new code (see ExecutionEngine::recompileAndRelinkFunction). The semantics are that any future invocation of the function calls the newly compiled code, while any invocations already on the stack (currently executing) finish executing the old code. This avoids the JIT having to keep track of potentially very expensive mapping information.
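To illustrate those semantics only, here is a small sketch in which calls are dispatched through a replaceable pointer. It does not show how recompileAndRelinkFunction is implemented; `FunctionSlot`, `relink`, and the two function versions are hypothetical stand-ins.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Fn = int (*)(int);

// Hypothetical stand-in for the JIT's "current code" of one function.
struct FunctionSlot {
    Fn current;
    int call(int x) const { return current(x); }  // each new call reads the slot
    void relink(Fn replacement) { current = replacement; }
};

static std::vector<std::string> trace;
static FunctionSlot* activeSlot = nullptr;

int newVersion(int x) {
    trace.push_back("new");
    return x * 10;
}

int oldVersion(int x) {
    trace.push_back("old:start");
    // The function is replaced while this invocation is still on the stack.
    if (activeSlot) activeSlot->relink(newVersion);
    // The old body still runs to completion.
    trace.push_back("old:finish");
    return x + 1;
}

std::vector<int> demoRelink() {
    trace.clear();
    FunctionSlot slot{oldVersion};
    activeSlot = &slot;
    int first = slot.call(1);   // runs oldVersion, which relinks mid-call
    int second = slot.call(1);  // dispatches to newVersion
    activeSlot = nullptr;
    return {first, second};
}
```

The first call returns the old function's result even though the replacement happened during it; only the second call sees the new code.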

I'm also curious to find out how the external symbols referenced from
the C frontend are resolved (such as printf or other functions in
libc). I assume there is a dlsym() call somewhere depending on the
libs listed in the module, is this correct? Does this happen at module
load time or at some later point while executing?

Yup, see above. Resolution happens lazily, as the process needs the symbols. The address of 'printf' is inserted into the JIT's symbol table just like the address of any JIT'd function.
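A rough sketch of lazy resolution with a shared table, assuming a caller-supplied resolver in place of the real `dlsym` lookup. `LazySymbolTable`, `addressOf`, and `define` are hypothetical names, not LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

class LazySymbolTable {
public:
    // The resolver returns nullptr for names it cannot find.
    using Resolver = std::function<void*(const std::string&)>;

    explicit LazySymbolTable(Resolver resolver)
        : resolver_(std::move(resolver)) {}

    // Addresses of compiled functions go into the same table.
    void define(const std::string& name, void* addr) { table_[name] = addr; }

    // Resolve on first use, then serve later requests from the table.
    void* addressOf(const std::string& name) {
        auto it = table_.find(name);
        if (it != table_.end()) return it->second;
        void* addr = resolver_(name);
        if (addr) table_[name] = addr;
        return addr;
    }

    std::size_t size() const { return table_.size(); }

private:
    Resolver resolver_;
    std::unordered_map<std::string, void*> table_;
};
```

Nothing is looked up when the table is created; the resolver runs only the first time a given name is requested.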

Finally, is the LLVM linker really required for a system like this? I
know the JIT is happy executing my bytecode modules as long as they
are self-contained, but on-demand loading is a requirement for this
(test) project. Currently all I'm getting is a hard error from the
runtime complaining that a referenced symbol is undefined.

Currently, yes, it does require this. However, I think it would be great for the JIT to have a list of Modules that are currently 'open' that it can generate code for, and for this list to be dynamically mutable. Any help adding this functionality to the JIT would be greatly appreciated!
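One possible shape for such a mutable list is sketched below. It is purely illustrative: `ModuleInfo` and `OpenModules` are hypothetical and reduce a module to its name plus the set of functions it defines, not an actual llvm::Module.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical model: a module is a name plus the functions it defines.
struct ModuleInfo {
    std::string name;
    std::unordered_set<std::string> definedFunctions;
};

class OpenModules {
public:
    void open(ModuleInfo module) { modules_.push_back(std::move(module)); }

    // Returns true if a module with that name was open and was removed.
    bool close(const std::string& name) {
        auto it = std::remove_if(
            modules_.begin(), modules_.end(),
            [&](const ModuleInfo& m) { return m.name == name; });
        bool removed = it != modules_.end();
        modules_.erase(it, modules_.end());
        return removed;
    }

    // First open module that defines `function`, or nullptr if none does.
    const ModuleInfo* findDefinition(const std::string& function) const {
        for (const auto& m : modules_) {
            if (m.definedFunctions.count(function)) return &m;
        }
        return nullptr;
    }

private:
    std::vector<ModuleInfo> modules_;
};
```

A function reference would consult `findDefinition` to pick the module to compile from; what should happen to code already compiled from a module that is later closed is left open here.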

-Chris