(Very) small patch for the jit event listener

Hi Gaël,

I'm not familiar enough with the details of the old JIT engine and its event interface to comment on whether or not your changes are appropriate, but I'm not sure anyone is so the patch is probably OK as is. I don't see any obvious problems with it.

However, your description of the changes raises a bigger issue in my mind. I'm not sure if you are aware of this, but we're planning to deprecate the old JIT engine in a future release -- possibly as soon as LLVM 3.5. In order to do so we need to make sure the MCJIT engine is capable of meeting the needs of current JIT users, and I'm not sure we've got your case fully covered yet.

Can you tell me a little bit more about the details of how you are using the JIT engine? I'm putting together a document describing various models for MCJIT use and if your model isn't covered by one of the cases I've got now I'd like to add it.

Also, have you looked at the recently added Stackmap and Patchpoint intrinsics? Without knowing a lot about either your case or those intrinsics, I think that there may be a possible match there. The thing that raised a red flag for me in your message was that MCJIT doesn't maintain mappings between the generated code and the LLVM classes from which it is produced, so we'll probably need a different way to handle your safepoints.

(BTW, it's probably appropriate to move further discussion to the LLVMDev list rather than llvm-commits.)

Thanks,
Andy

Hi Andy,

We have had previous discussions about this, and I’d like to state more exactly what features would make MCJIT a replacement for the JIT.
After putting significant effort into trying to move to MCJIT, I’m currently back with the JIT. This is in a REPL environment where functions are added and removed dynamically and response time is important. The issue is that the legacy JIT provides great flexibility for this use case, which is currently missing from MCJIT because of their very different designs and goals.

With JIT, you can modify Function(s) in an already-compiled Module, unload the machine code and the JIT will automatically recompile and relink the function next time it is called. To make MCJIT work like that it would need at least:

  1. Automatic module splitting into function-modules.

  2. Module delete: from module list, from linker namespace, machine code unload, unregister EH and debuginfo.

  3. Stub functions.

  4. Relinking with stub functions so that new modules are relinked without changing already-finalized modules. This is critical to response time as you may change just one function out of 1000.

  5. Module addition should register EH and debuginfo (this is not done with current JIT but while at it…).

REPL environments using the LLVM JIT would likely encounter great difficulty moving to the current MCJIT without the above. 1) could be done by the programmer but a helper function should provide this service. 2)-4) could be done only in the MCJIT. 5) is a bonus.

Until MCJIT has this kind of flexibility, I hope the JIT would be kept alive.

Yaron

Hi Andrew, hi all,

I already saw that the old jit was (almost) deprecated. So, I'm
currently playing with the new jit and it looks very interesting.
(I'm working locally and I haven't pushed anything new on VMKit
because I'm also changing the design of vmkit a little). For the moment,
MCJIT does not work with VMKit (but I haven't yet tested the
safepoint/stackmap patch), and I don't know if it comes from what I'm
doing or if something is still missing in MCJIT.

Sorry, my mail will be long, but I first want to explain what I'm
currently doing and then open the discussion to explain what I would
like to see in MCJIT :) Of course, I can help to develop something or
give feedback: I'm working hard on vmkit currently and I have
time to spend on that (but I'm far from an expert in compilation
stuff!).

Basically, I want to compile my functions lazily (we have a Java
virtual machine built upon vmkit and compiling all of rt.jar during
the bootstrap is not very realistic :)). So, by lazy compilation, I
mean that I want to compile (and even resolve) a function only when it
is used. For example, during the compilation of
void f() { g(); }
I don't want to compile g(). I will only compile g() when it is called.
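The lazy scheme described here can be modelled without any LLVM code. What follows is only a minimal sketch in standard C++: `LazyFunction` and its `compile` callback are hypothetical names that stand in for "run the code generator for g", and nothing in it comes from VMKit or MCJIT.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical model of a lazily compiled function: the "compiler" is any
// callable that produces the real body, and it runs at most once.
class LazyFunction {
public:
  using Body = std::function<int()>;
  using Compiler = std::function<Body()>;

  explicit LazyFunction(Compiler compile) : compile_(std::move(compile)) {
    assert(compile_ && "a compiler callback is required");
  }

  // The first call triggers compilation; later calls reuse the cached body.
  int operator()() {
    if (!body_) {
      body_ = compile_();
      ++compilations_;
    }
    return body_();
  }

  bool compiled() const { return static_cast<bool>(body_); }
  int compilations() const { return compilations_; }

private:
  Compiler compile_;
  Body body_;
  int compilations_ = 0;
};

// f reaches g through the lazy wrapper, so building f never compiles g.
inline int f(LazyFunction &g) { return g() + 1; }
```

Constructing a `LazyFunction` for g costs nothing until f actually runs and reaches the call, which is the behaviour asked for above.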

For the moment, I use a home-made stub (I have attached the asm code,
it can give you some ideas if you plan to integrate a stub generator
able to perform dynamic dispatch like a virtual call in C++) because
MCJIT does not provide this facility. So, for each function, I define
a module. In each module, I have to define the runtime functions (such
as gcmalloc, throwException and this kind of function). They are
defined in a separate runtime module populated during the bootstrap.
I thus have this picture:

                        MCJIT
     +--------------------+--------------------+
     |                    |                    |
Runtime module       module for f         module for g

When I see that I need g during the compilation of f, I define an
external symbol g in f's module and I add a global mapping between g
and its stub in MCJIT. Everything works perfectly in the old JIT, so
the code is correct in this case. The problem that I face with
multiple modules in the old JIT is that the symbol g defined in f's
module and the symbol g defined in g's module are not the same, I thus
have to define multiple symbols in the old JIT for the same entity,
which is far from perfect. Anyway, it works.

With MCJIT, I can call f from my C++ code (relatively easy as f is
defined in its own module), the stub of g is called and I can
generate its code (which uses other functions). But I cannot find a
way to compile g and to update the mapping between g and the new
function pointer. When I use recompileAndRelinkFunction, I see that it
is not implemented in MCJIT, and when I use getFunctionPointer, I
obtain... a null pointer? I have not investigated further, but
probably, having two symbols g in the same MCJIT does not work. And I
don't see what I can do at this step. Maybe I have missed
something?

Otherwise, I already predict that I will also have one big problem
later: I would like to inline functions from the runtime module in f
or g, and I would like to inline the code of an already compiled
function h in g. So, I would like to inline functions that come from
different modules. It means that I would like to see MCJIT working
like an llvm::Linker, able to resolve the h symbol during the
compilation of g. And for the moment, as MCJIT cannot see that the g
defined in f's module and the g defined in g's module represent the
same function, I think that I will have a problem later.

So, for the moment, I'm a little bit stuck with MCJIT. Something that
could be really useful is an MCJIT that acts as a linker. If
MCJIT could have a map like this (in pseudo-C++)

class FnDescriptor {
  StringRef name;
  FunctionType* fnType;
  LinkageType linkage;
};

class FnState {
  llvm::Module* definedModule;    /* maybe an llvm::ObjectImage*? */
  List<RelocationTable*>* users;  /* call sites to patch */
  void* currentPointer;           /* stub first, compiled code later */
};

map<FnDescriptor, FnState>

it could be really useful. Let's imagine the same scenario with f that
calls g while g is not yet compiled. At the beginning of this
scenario, "g" "void ()" could simply be associated to a FnState with

<null, List<>, stubForG>.

After f's compilation, it could be something like that

<null, List<RelocationTable-of-module-f>, stubForG>.

And after the compilation of g, something like

<moduleOfG, List<Reloc-f, Reloc-g>, compiledCode-of-g>

with the relocation entries updated?
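The three states above can be walked through with a small stand-alone model. This is only a sketch under simplifying assumptions: `LinkTable` is a hypothetical name, a "relocation table" is reduced to a list of call-site slots holding a function pointer, the descriptor is just a name plus a type string, and g returns `int` instead of `void` so that the effect of patching is observable.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Fn = int (*)();

// Simplified stand-in for the proposed FnDescriptor: name plus type string.
using FnDescriptor = std::pair<std::string, std::string>;

// Simplified stand-in for the proposed FnState. Each "user" is the address
// of a call-site slot that must be rewritten when the function moves.
struct FnState {
  bool defined = false;      // plays the role of definedModule != null
  std::vector<Fn *> users;   // plays the role of the relocation tables
  Fn current = nullptr;      // the stub first, the compiled code later
};

class LinkTable {
public:
  // Register a function that is known only through its stub.
  void declare(const FnDescriptor &d, Fn stub) {
    assert(stub != nullptr);
    table_[d].current = stub;
  }

  // Record a call site and point it at whatever is current.
  void addUser(const FnDescriptor &d, Fn *slot) {
    FnState &s = table_.at(d);
    s.users.push_back(slot);
    *slot = s.current;
  }

  // Install the compiled code and patch every recorded call site.
  void define(const FnDescriptor &d, Fn code) {
    FnState &s = table_.at(d);
    s.defined = true;
    s.current = code;
    for (Fn *slot : s.users)
      *slot = code;
  }

  const FnState &state(const FnDescriptor &d) const { return table_.at(d); }

private:
  std::map<FnDescriptor, FnState> table_;
};

inline int stubForG() { return -1; }  // placeholder for the stub
inline int compiledG() { return 7; }  // placeholder for g's generated code
```

Calling `declare`, then `addUser` for the call site in f, then `define` reproduces the sequence `<null, List<>, stubForG>`, `<null, List<Reloc-f>, stubForG>`, `<moduleOfG, ..., compiledCode-of-g>`, with the call site in f ending up bound directly to the compiled code.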

Otherwise, for safepoints and for exception tables, it could also be
really useful to install callbacks to let VMKit manage them itself
(but maybe that is provided by the safepoint/patchpoint patch?), with
something that can make the association between an MCSymbol and its
actual address of course :)

See you!
Gaël

PS: by the way, Yaron, we currently face almost the same problems

Gael, anyone who dynamically un/loads functions will face these problems.
You have certainly gone much further than me trying to solve them!

Yaron

Hi Gaël,

Thank you for the detailed explanation. It's very helpful.

All of the things you describe could be done within MCJIT, but I'm not sure that's where they belong. We had a discussion about lazy function compilation at the LLVM Developers Meeting last week and the consensus among those present was that it would be better to leave this sort of lazy compilation to the MCJIT client rather than having MCJIT try to solve the problem because each client knows more about how it should be implemented for their particular case than MCJIT can possibly know. We could, perhaps, provide a reference implementation (outside of MCJIT) but it would likely be a very simple solution not well-suited for use in real-world programs.

The approach you describe with home-made stubs seems good. Now we just need to figure out why it isn't working!

To begin with, there is an issue with naming. In order to get the home-made stub solution to work, you'll need to distinguish between the name of the function as it appears at the call site (which will result in a call to your stub function) and the name of the function as it is defined in the implementation module. For instance, you'll probably want the module for 'f' to look something like this:

Hi Yaron,

I think a lot of what I said in my reply to Gaël also applies to your situation. In particular, I think that it’s probably best for your code to manage the function stubs and replacement. I talked last week with a developer who works on the Julia language (which shares a lot of features with your situation) and it’s my understanding that the Julia runtime handles function stubs and function replacement in a way that is similar to what I described even though they are currently using the old JIT engine (not coincidental since my discussions with him helped shape my ideas about how to do this).

Module deletion is clearly a bit of a problem currently. This is on my wish list of things for MCJIT to support. One of the biggest barriers to module deletion is that MCJIT doesn’t track links between generated objects, so if you want to delete an object that is being called from another generated object that would be a problem. Using client-managed stubs for inter-module linking obviously helps with this problem.

The other issue is that we aren’t currently telling the memory manager which module any given allocation request is associated with. There are some clues that a sufficiently motivated memory manager could possibly use to figure it out, but there’s nothing to directly support it.

I think the module deletion case is worth discussing further. I’d be happy to hear proposals for changes to support it.

Registration of EH frame information as new modules are added should work in MCJIT. If it doesn’t that’s a bug.

-Andy

Hi Andy,

Thanks for the answer. I'm currently reading the internal code of
MCJIT and it's really great work (I was only using the
ExecutionEngine interface for the moment). So, I agree, everything I
need is already in the code (see below) :)

> Hi Gaël,
>
> Thank you for the detailed explanation. It's very helpful.
>
> All of the things you describe could be done within MCJIT, but I'm not sure that's where they belong. We had a discussion about lazy function compilation at the LLVM Developers Meeting last week and the consensus among those present was that it would be better to leave this sort of lazy compilation to the MCJIT client rather than having MCJIT try to solve the problem because each client knows more about how it should be implemented for their particular case than MCJIT can possibly know. We could, perhaps, provide a reference implementation (outside of MCJIT) but it would likely be a very simple solution not well-suited for use in real-world programs.

I understand the point. Providing a small example that shows how to
use the advanced features of MCJIT would probably help. If I can
manage to make MCJIT work with VMKit, I'll be happy to send you an
example of lazy compilation that highlights some of the features of
MCJIT.

> The approach you describe with home-made stubs seems good. Now we just need to figure out why it isn't working!
>
> To begin with, there is an issue with naming. In order to get the home-made stub solution to work, you'll need to distinguish between the name of the function as it appears at the call site (which will result in a call to your stub function) and the name of the function as it is defined in the implementation module.

I'm not sure I understand: are you saying that having two different
names is required, or simply that it's easier to manage? I may
have to generate a lot of calls to g before compiling it, which means
that I will have to generate a lot of different names (not a big deal,
but it takes time during the execution)...

> For instance, you'll probably want the module for 'f' to look something like this:
>
> ----
>
> declare i32 @g_stub()
>
> define i32 @f() {
>   %r = call i32 @g_stub()
>   ret i32 %r
> }
>
> ----
>
> and the module for 'g' will look something like this:
>
> ----
>
> define i32 @g() {
>   ret i32 0
> }
>
> ----
>
> Now when you generate code for the 'f' module, MCJIT will call the memory manager asking for the address of 'g_stub' and you'll give it a pointer to your stub function. When your stub function gets called, you'll call MCJIT::getFunctionAddress('g') and MCJIT will generate code for the 'g' module and return the address of the real 'g' function.
>
> You'll need to keep the stub around as a pass-through because the address of the stub is now baked into the code that was generated for 'f',

But the location of the call-site (the pointer to g_stub in the
generated code) should be in the relocation table of the module f? You
don't think that a client could reuse this relocation table directly?
(as we can do to dynamically modify a function pointer in a shared
library)

> but that's a good thing if you might eventually want to replace the function, because it gives you a single point to redirect calls to 'g' from any module that is calling it through 'g_stub'.

For performance, it's not perfect (two calls to reach a function), but
I agree, it simplifies the management.

> If it's really important to you to be able to modify the 'f' call site to go directly to 'g' once it's generated, you may be able to do that with the patchpoint stuff, but I don't know the details.

Yes, I'm pretty sure that with patchpoint, I can handle this problem.

> Anyway, except for the stub naming, that sounds an awful lot like what you described. If using the names as I describe doesn't fix things for you, we may have a bug to fix.

Cool :) I will try this weekend and I will tell you if it works.

> The inlining problem is perhaps a bit trickier. What I would suggest in that case is that you basically need to link your modules against the library module that you want to use for inlining before you pass the modules to MCJIT. There's an Intel product that does this, so I know it can be made to work. I'm not sure there's a simple interface for it. Basically what you need to do is extract the functions from your runtime library into the module you want to optimize and then run it through whatever optimization passes you're interested in.
> I'm sure there are some helper classes that could be implemented to make this easier, but the details are ultimately implementation specific. This is a problem with the old JIT too though, right?
>
> Inlining the code of an already compiled module is an even tougher problem. That's really something that would need to be addressed at the code generation level, I think. If you had the generated code in a sufficiently general form you could probably do this with an intrinsic (assuming you definitely knew you wanted to inline it). Generally I think it's outside the design space that MCJIT intends to target.

Yes, I also think so, it's a problem of code generation. I will try to
adapt the current SimpleInline pass of llvm and use the internal state
of VMKit/J3 to retrieve the IR of an already generated function. As I
have to keep a map that associates names to llvm::Function in VMKit, I
just have to use it to find the code.

> So, to summarize, I think that the current implementation of MCJIT is sufficient to address your lazy compilation needs, though I'd be happy to continue the discussion if you think something more is needed.

Yes, after your answer and after having taken a look inside the MCJIT
code, I'm pretty sure that I can use MCJIT for lazy compilation.

> That leaves me with your "safepoint" issue, which I don't have a clear picture of.

I'm not at this point for the moment :) As soon as I'm able to make
VMKit run with MCJIT (without the garbage collector), I will try to
find exactly what is missing for the GC (and I will also explore the
safepoint/patchpoint patch to see if I can use it in VMKit for the GC,
but it's not related to this topic).

Thank you for everything, I'll tell you soon if it works,

Gaël

Hi Gaël,

I'm glad to hear that MCJIT looks promising to you.

> I understand the point. Providing a small example that shows how to use the
> advanced features of MCJIT would probably help. If I can manage to make MCJIT work with VMKit,
> I'll be happy to send you an example of lazy compilation that highlights some of the features
> of MCJIT.

I'd love to have a reference implementation of lazy compilation to be able to refer to when this topic comes up. It would also help new users get something up and running until they have time to write a more elaborate implementation tailored to their particular needs. If you'd be willing to share when you get something to this point, that would be outstanding.

> I'm not sure I understand: are you saying that having two different names is required, or simply
> that it's easier to manage? I may have to generate a lot of calls to g before compiling
> it, which means that I will have to generate a lot of different names (not a big deal, but it takes
> time during the execution)...

Sorry. What I meant was that there should be one name for any and all calls to 'g' and another name for the implementation. The idea is that all callers of 'g' will get linked to the stub and only the stub will call 'g' directly. If the calls and the implementation use the same name MCJIT will start trying to link callers directly to the implementation once it's available.

> But the location of the call-site (the pointer to g_stub in the generated code) should be in the
> relocation table of the module f? You don't think that a client could reuse this relocation table directly?

There are a couple of problems with that.

First, MCJIT gets the relocations in a slightly different form than a pure loader typically would. The code that MCJIT is dealing with generally has a relocation for each call site. Depending on the relocation model that was specified when the code was generated this may be something that would normally resolve to a jump into a procedure linkage table. However, because MCJIT (actually RuntimeDyld at this point) is performing the role of both linker and loader, it dispenses with the PLT when possible and just generates a direct call to the function. (Since it has to patch each call site anyway, it may as well patch it to the real function location.) In some circumstances the target of the relocation is a PC-relative jump and a direct call isn't possible. In those cases RuntimeDyld generates a stub, but still doesn't implement the full PLT semantics. This part could be changed if there were a compelling reason to do so.
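One reason a direct call may not be encodable is reach: on x86-64 a near `call` carries a signed 32-bit displacement measured from the address of the following instruction. The sketch below shows that range check only. It is an illustration under that assumption, not a description of the exact condition RuntimeDyld tests, and `fitsRel32` and `chooseCall` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True when target can be encoded as a signed 32-bit displacement from the
// address of the instruction that follows the call.
inline bool fitsRel32(std::uint64_t nextInsn, std::uint64_t target) {
  // Unsigned subtraction wraps around; reading the result as signed gives
  // the distance on two's-complement targets.
  const std::int64_t delta = static_cast<std::int64_t>(target - nextInsn);
  return delta >= std::numeric_limits<std::int32_t>::min() &&
         delta <= std::numeric_limits<std::int32_t>::max();
}

enum class CallKind { Direct, ViaStub };

// Patch the call site directly when the target is in range; otherwise route
// the call through a nearby stub that can hold a full 64-bit address.
inline CallKind chooseCall(std::uint64_t nextInsn, std::uint64_t target) {
  return fitsRel32(nextInsn, target) ? CallKind::Direct : CallKind::ViaStub;
}
```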

The second issue is that MCJIT (by way of RuntimeDyld again) throws away all of its relocation information once an object image has been prepared for execution. Right now, it still has the original ELF image that was generated with all its headers, and so it would be possible to reparse the relocation information and recreate the data structures that are used to apply relocations in the first instance, but I'm pretty sure that's overkill. Also, I'd like to see MCJIT discard the original image when it can.

If MCJIT implemented full PLT support and maintained a permanent mapping of symbol names to PLT/GOT entries, it might be possible to do something like you've suggested. However, I think it's probably better to defer the details to the client (by way of the stub mechanism outlined previously) in the case where function replacement is needed.

-Andy

Hi Andy,

I have been following Julia with interest, as it’s a type-optional language designed to be as nice as dynamic languages but run at the speed of compiled C++ programs. They achieve this by deducing as many types as possible at compile time and JITing code for the unknown types at runtime when they are known. That’s a smart use of a JIT. In C++ terms it’s analogous to instantiating templates at runtime.

Julia has programmer-controllable dynamic dispatching of functions according to the best “match”. I guess that is the place where the stub function management happens. All together it’s a very nice and smart design.

The nice thing about the legacy JIT is that it just works with modified code. All you need is to call freeMachineCodeForFunction and the JIT will automatically re-compile and update the stub. It’s very easy to use.

The JIT stub functions are not real functions but just jumps which the JIT keeps track of and updates as needed. This implementation does not have the problem of different names for the stub and real functions as the stub functions do not exist in the IR. From the programmer’s perspective it’s a technical detail the JIT takes care of.

Stubs in IR or MC are not trivial to implement so it would be nice to continue providing clients this functionality after the JIT is gone. It’s not required to have the functionality in the MCJIT itself. A “FunctionJIT” could provide Function-level services and stub functions to a client, using MCJIT as its engine.

Regarding module removal, MCJIT or the linker needs to keep maps at least for the object, EH data, debug info so they could be removed (EH data is currently removed when the JIT is destroyed). In addition, the issues you mentioned need to be taken care of. Symbols need to be removed from the linker. There are lots of details.

Yaron

Hi Yaron,

To be clear, I’m not suggesting that stubs be implemented in IR. I’m just suggesting that clients should use a different name at the call site for functions that they want to stub out so that when MCJIT asks the memory manager for a pointer to the function the client can easily identify it as something that should be handled with a stub (and so that MCJIT won’t try to link calls directly to the implementation when it becomes available).

I agree that a reference implementation of this would be useful, at least as a guide to demonstrate the way it is intended to work. I just expect that most clients that need this sort of functionality will want to do something other than provide simple stubs for all functions.

Regarding module removal, tracking all the information necessary to make that possible is somewhat at odds with the goal of providing a small memory footprint to make MCJIT useful on memory-constrained devices. I’m not saying that means we shouldn’t do it – just that it needs to be something that can be switched off in a way that doesn’t incur the memory overhead. I’d also like to look for a way to handle it entirely within the memory manager.
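The trade-off between removability and footprint can be sketched with a toy allocator. Everything here is hypothetical: as noted earlier in the thread, the memory manager is not currently told which module an allocation belongs to, so the `module` parameter is exactly the missing piece being assumed, and `ModuleTrackingAllocator` is an illustrative name rather than an LLVM class.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical allocator that is told which module each request belongs to.
class ModuleTrackingAllocator {
public:
  explicit ModuleTrackingAllocator(bool track) : track_(track) {}

  // Allocate size bytes on behalf of module. With tracking off, blocks go
  // into one shared pool and no per-module bookkeeping is kept.
  unsigned char *allocate(const std::string &module, std::size_t size) {
    assert(size > 0);
    std::unique_ptr<unsigned char[]> block(new unsigned char[size]);
    unsigned char *p = block.get();
    (track_ ? perModule_[module] : untracked_).push_back(std::move(block));
    return p;
  }

  // Free everything owned by module and return the number of blocks
  // released. Always 0 when tracking is switched off.
  std::size_t releaseModule(const std::string &module) {
    auto it = perModule_.find(module);
    if (it == perModule_.end())
      return 0;
    const std::size_t n = it->second.size();
    perModule_.erase(it);
    return n;
  }

  std::size_t trackedModules() const { return perModule_.size(); }

private:
  using Blocks = std::vector<std::unique_ptr<unsigned char[]>>;
  bool track_;
  std::map<std::string, Blocks> perModule_;
  Blocks untracked_;
};
```

With tracking switched off the only cost is one shared vector, which matches the requirement that the feature should not add memory overhead for clients that never delete modules.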

BTW, I heard from Keno Fischer that he has an experimental implementation of Julia using MCJIT working. Obviously there are some issues to be worked through before it’s something that can be rolled into the main code base, but he went from a JIT-based implementation to an MCJIT-based implementation that passes their basic test suite in about a week. That’s a pretty encouraging result, I think.

-Andy

Hi Andrew (hi all:)),

I perfectly understand the problem of relocation and it's really not a
problem in my case. I'm still trying to make MCJIT run but I face a
small problem. I have to insert callbacks to the runtime for functions
provided by vmkit (for example, a gcmalloc function to allocate memory
from the heap). With the old JIT, VMKit simply loads a large bc file
that contains all the needed functions at runtime and, as they are
also compiled into the VMKit binary, we manually associate the llvm
function names to their physical addresses through an addGlobalMapping
with a dlsym.

If I understand correctly, MCJIT does not like this solution because
it tries to find an ObjectCache that contains the symbol? And I don't
see how I can perform something similar with an ObjectCache? Of
course, I would like to avoid a recompilation of the runtime functions
during the bootstrap because, to simplify the process of generating
the bitcode of the runtime functions, I simply reload all the bitcode
of vmkit at bootstrap in a module. Do you think that I could directly
map a shared library or something like that to an ObjectCache object?

Thank you in advance!
Gaël

Hmm, I think that I have the solution, but I have a new problem (more
serious). The solution is stupid: I have simply loaded the
shared library in ObjectCache::getObject directly in a MemoryBuffer :)
As the linker understands a .o, it understands a .so.

Now, I'm able to compile a module (I call finalizeObject()), I'm able
to find my first generated function pointer, but I'm unable to execute
it. The code is correct (the same as with the old jit):

-> 0x107ef8000: pushq   %rax
   0x107ef8001: movabsq $0, %rax
   0x107ef800b: callq   *%rax
   0x107ef800d: popq    %rax
   0x107ef800e: ret

But, first, the memory of the code is not in exec mode, and if I
mprotect the code with the PROT_EXEC flag, the instruction "movabsq $0,
%rax" fails with an
"error: address doesn't contain a section that points to a section in
a object file"

So, I'm pretty sure that I have forgotten something, but I don't see what?

If you could help, it would be perfect :) Basically, what I have done:
* I have an execution engine and two modules associated with it. The
first one contains the bitcode of vmkit and the second one my great
function with four lines of code.
* The execution engine is associated with an ObjectCache, which returns
a MemoryBuffer that contains my shared library for the runtime module,
and for the other module, it answers null.

To compile, I call finalizeObject() after having loaded the runtime
module, and finalizeObject() after having populated the first module.
Finally, I call getPointerToFunction() to have the code and call it
manually (i.e., with a cast).

Thank you in advance!
Gaël

Hi Gaël,

I would guess that MCJIT is probably attempting to load and link the shared library you return from the ObjectCache in the way it would load and link generated code, which would be wrong for a shared library. I know it seems like it should be easier to handle a shared library than a raw relocatable object (and it probably is) but MCJIT doesn't handle that case at the moment. The ObjectCache interface expects to get a relocatable object image in exactly the form that MCJIT would see if it had gone through the usual compilation phase.

I could be wrong about this being the problem. I'm just guessing as to what MCJIT would do if you pass it a shared library, but my guess is that it wouldn't handle it correctly. I've got a patch in flight right now that adds support for being able to add object files and archive files to MCJIT without reference to the ObjectCache interface, but this doesn't support shared libraries either.

However, if you have the code you want to call in a shared library, I'm not sure you need MCJIT to do anything at all. If you load the shared library from your program (using the usual dlopen mechanism) then the default linking mechanism (which goes through sys::DynamicLibrary::SearchForAddressOfSymbol and therefore through dlsym on Linux) should find the exported functions.

-Andy

Hi Andrew,

Thank you very much for all your help! So, I have tested without my
shared library (with a relocatable object and without) and still, my
code is not executable. I was testing my code with multiple modules
and I don't know if using multiple modules is fully functional?

Anyway, I'm now allocating a mcjit for each function to be sure. But
now, I have a new problem that shows up earlier: in my module, I'm using a
stub:

define void @j3_java_lang_Object_clinit() gc "vmkit" {
entry:
  call void @j3_java_lang_Object_registerNatives_stub()
  ret void
}

declare void @j3_java_lang_Object_registerNatives_stub()

And I manually associate a function pointer with my stub through
addGlobalMapping. But during compilation, mcjit tells me that it is
unable to find this function?

LLVM ERROR: Program used external function
'_j3_java_lang_Object_registerNatives_stub' which could not be
resolved!

If I cannot manually register stubs, I don't see how I could call
them? If my program should work, it's maybe a bug in my code? Maybe
I have to implement my own Dyld to retrieve the external symbol?

Thank you again :)
Gaël

Hi Gaël,

Multiple module support should be fully functional. However, there are some oddities in how MCJIT gets memory ready to execute, particularly if you are using the deprecated getPointerToFunction or runFunction methods. If you use these methods you'll need to call finalizeObject before you execute the code. I've heard reports that there's a bug doing that after adding multiple modules, so if you're having trouble I would recommend using getFunctionAddress instead.

Regarding your linking issue, it happens that addGlobalMapping is currently broken for MCJIT. I'm not sure when I'll have time to fix it, though it would be a pretty easy patch if you'd be willing to do it yourself.

I don't especially like the addGlobalMapping interface, however, and I wouldn't be against it being removed. What I would suggest as an alternative is for you to implement a custom memory manager (you can derive from SectionMemoryManager to get the basic functionality). When MCJIT encounters an external symbol that it doesn't know how to resolve, it calls the memory manager's getSymbolAddress method. This is the right way to get your stubs hooked up.
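The shape of such a resolver can be sketched without LLVM. In the model below `BaseResolver` is a stand-in for the base memory manager and is not the LLVM class; the `getSymbolAddress` signature, the use of 0 for "not found" and the handling of a leading underscore are assumptions made for illustration (the error message earlier in the thread shows the requested name carrying a `_` prefix that the IR name does not have). Check the real signature against the LLVM headers before adapting it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Stand-in for the base memory manager; NOT the LLVM class.
class BaseResolver {
public:
  virtual ~BaseResolver() = default;
  // Returns 0 when the symbol is unknown (an assumption of this sketch).
  virtual std::uint64_t getSymbolAddress(const std::string &name) {
    (void)name;
    return 0;
  }
};

// Client-side resolver: consult the client's own stub table first and fall
// back to the default lookup for everything else.
class StubResolver : public BaseResolver {
public:
  void registerStub(const std::string &name, std::uint64_t address) {
    assert(address != 0 && "0 is reserved for 'not found'");
    stubs_[name] = address;
  }

  std::uint64_t getSymbolAddress(const std::string &name) override {
    // Try the exact name, then the name without one leading underscore.
    auto it = stubs_.find(name);
    if (it == stubs_.end() && !name.empty() && name[0] == '_')
      it = stubs_.find(name.substr(1));
    if (it != stubs_.end())
      return it->second;
    return BaseResolver::getSymbolAddress(name);
  }

private:
  std::map<std::string, std::uint64_t> stubs_;
};
```

Keeping the table in the client means the virtual machine's own symbol bookkeeping is the single source of truth and nothing has to be mirrored inside the execution engine.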

-Andy

Yipa! It works very easily with my customized SectionMemoryManager :)
My code does not work so well because I still have issues finding
symbols, but that comes from my side. For the moment, I'm able to lazily
link three Java functions, that's great!

Thank you very much for all your help. I will finish the integration
in my vmkit version and then send you a (first) small example of code,
which could probably help other developers when they begin to use
MCJIT (complementary to the Kaleidoscope tutorial).

Otherwise, I don't know if keeping addGlobalMapping is so important
with MCJIT: a virtual machine already has to manage its own symbol
table. Replicating this data inside MCJIT becomes useless with a
SectionMemoryManager. I prefer your design :)

Congratulations on your work on MCJIT, the (apparent) simplicity of
the design is really impressive,

Gaël