What is on the LLVM horizon for truly relocatable JITted code?

Hello everyone,

Is fully relocatable/position-independent JITted code on the horizon or currently possible with LLVM?

I’ve written a Common Lisp compiler (currently called Clasp: https://github.com/drmeister/clasp) in C++ that uses LLVM as the backend and interoperates with C++. It uses copying garbage collection via the Memory Pool System (MPS) garbage collector by Ravenbrook. This garbage collector is precise on the heap and conservative on the stack.

Currently I JIT code to wherever LLVM drops the code and it remains fixed in memory. This causes some problems for implementing a dynamic language like Common Lisp because CL considers data and code to be equivalent.

I’d like to move the code into the MPS managed memory and be able to apply copying garbage collection to it. Is this possible? Will it ever be possible?

Best,

Christian Schafmeister
Professor
Chemistry Department
Temple University

+Lang, collector of strange JIT requirements

Hi Christian,

At the moment we support PIC in the sense that you can, through the RTDyldMemoryManager, map text/data sections wherever you like. Once they’ve been laid down however, the current JIT infrastructure treats them as fixed. It sounds like you want to go further than that and slide text/data around during garbage collection? You might be able to do that with the current infrastructure by holding on to the relocatable objects and re-emitting text sections to new locations as needed (the data sections you would have to move manually so as not to overwrite any changes). Unfortunately this would double your memory overhead since you would have two copies of all your sections: the emitted version and the cached, relocatable version. With a bit of work you could teach the JIT linker (RuntimeDyld) to optionally hold on to the necessary relocations and so eliminate that overhead too.
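
As a rough illustration of "hold on to the relocations and re-emit", here is a toy C++ sketch. It is not RuntimeDyld's interface: the names RelocatableSection, AbsReloc, emitAt and readWord are hypothetical, and it assumes only one kind of relocation (a 64-bit absolute address of a byte inside the same section). The cached bytes are never modified, so the same section can be placed again at a new base whenever the collector wants to move it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record: the 8 bytes at `offset` must hold the absolute
// address of the byte at `targetOffset` within this same section.
struct AbsReloc {
    std::size_t offset;
    std::size_t targetOffset;
};

// Hypothetical cached, not-yet-placed section: the raw bytes plus the
// relocations needed to place them at any base address.
struct RelocatableSection {
    std::vector<std::uint8_t> bytes;
    std::vector<AbsReloc> relocs;
};

// Copy the section into `dest` and patch every recorded address as though
// the section lives at `base`. The cached copy is left untouched.
void emitAt(const RelocatableSection &sec, std::uint8_t *dest,
            std::uint64_t base) {
    std::memcpy(dest, sec.bytes.data(), sec.bytes.size());
    for (const AbsReloc &r : sec.relocs) {
        assert(r.offset + sizeof(std::uint64_t) <= sec.bytes.size());
        std::uint64_t value = base + r.targetOffset;
        std::memcpy(dest + r.offset, &value, sizeof value);
    }
}

// Read back the 64-bit word stored at `offset` (handy for checking a patch).
std::uint64_t readWord(const std::uint8_t *p, std::size_t offset) {
    std::uint64_t v;
    std::memcpy(&v, p + offset, sizeof v);
    return v;
}
```

Calling emitAt twice with different bases yields two independently patched copies, which is where the doubled memory overhead mentioned above comes from: the cached section has to stay alive for as long as the code may still move.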

So, I don’t think you can do this today, but with a bit of effort I think it could be made to work. It certainly sounds like an interesting idea.

Cheers,
Lang.

Lang,

Thank you!

That is a very interesting idea: cache a relocatable version of the code, move it wherever the GC wants it to go, and then discard it once the GC determines that it is no longer referenced. I’ll have to ponder this idea. I think I could implement that with what I have right now. I don’t expect code to move around too much - most will end up in the oldest generation and then stay put. I’ll have to talk to the folks at Ravenbrook (who wrote the MPS garbage collector) to see if I can relocate the code rather than let the MPS library move the code.

Thanks again.

Best,

.Chris.

I'm not sure what this actually buys you. There are a few reasons why you don't want to treat compiled code and data in the same way:

- You want code to be executable but not writeable

- Code doesn't typically support the infant mortality hypothesis (code from eval and so on can be special-cased: put it in a short-lived allocation, cache the source, and recompile it if it persists, which is more efficient than trying to move code around)

- Code can't be scanned for roots in the same way as data, because immediate pointer values may be materialised across multiple instructions (not important if your code only ever refers to globals via arguments).

- The set of reachable objects from a piece of compiled code never changes over its entire lifetime.

- You should never have to deal with interior pointers to code, except return addresses on the stack.

For these reasons, most modern systems that do run-time code generation have one or more special regions for code, often with copies of the pointers stored alongside. There's nothing stopping you from relocating a closure object that refers to a function with your data GC and using the deallocation event to trigger the RTDyldMemoryManager to be allowed to recycle the memory. As the existing infrastructure permits you to relocate code to its final position after initial code generation, you can pick a suitable size from your free list.
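
A minimal C++ sketch of what such a code region might look like, assuming a simple first-fit free list. CodeRegion and its methods are hypothetical names for illustration, not part of LLVM or the MPS; a real memory manager would also have to deal with page permissions and alignment, which are left out here. The idea is that allocate() is called when code is laid down and release() is called from whatever hook fires when the data GC finds the owning closure dead.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <optional>

// Hypothetical allocator for a dedicated code region. Offsets are relative
// to the start of the region; no real memory is mapped in this sketch.
class CodeRegion {
public:
    explicit CodeRegion(std::size_t size) { free_[0] = size; }

    // First fit: return the offset of a free block of `size` bytes,
    // or std::nullopt if no free block is large enough.
    std::optional<std::size_t> allocate(std::size_t size) {
        assert(size > 0);
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second < size) continue;
            std::size_t offset = it->first;
            std::size_t remaining = it->second - size;
            free_.erase(it);
            if (remaining > 0) free_[offset + size] = remaining;
            return offset;
        }
        return std::nullopt;
    }

    // Return a block to the free list, merging it with adjacent free blocks.
    // Intended to run when the GC reports the owning closure as dead.
    void release(std::size_t offset, std::size_t size) {
        auto next = free_.lower_bound(offset);
        // Merge with the free block that starts right after this one.
        if (next != free_.end() && offset + size == next->first) {
            size += next->second;
            next = free_.erase(next);
        }
        // Merge with the free block that ends right before this one.
        if (next != free_.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == offset) {
                prev->second += size;
                return;
            }
        }
        free_[offset] = size;
    }

    // Total number of free bytes across all free blocks.
    std::size_t freeBytes() const {
        std::size_t total = 0;
        for (const auto &block : free_) total += block.second;
        return total;
    }

private:
    std::map<std::size_t, std::size_t> free_;  // offset -> length
};
```

Only the small closure object lives in the moving heap here; the machine code stays put in its own region until the closure dies, so nothing ever has to find or patch pointers into the code.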

David

Will you be able to find and patch return addresses on the stack?

Or, as the stack is only conservatively scanned, does the presence of a possible
pointer into the code prevent it being moved?
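
To make the second case concrete, here is a small C++ sketch of how a conservative scan would pin a code block. The function name isPinned is hypothetical and this is not how the MPS exposes it; it only assumes the collector has a slice of stack words to examine and knows the address range of the code block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if any scanned stack word could be a pointer into
// [codeBegin, codeEnd). A conservative collector cannot tell a return
// address from an integer with the same bits, so any hit means the block
// has to stay where it is.
bool isPinned(const std::uintptr_t *words, std::size_t count,
              std::uintptr_t codeBegin, std::uintptr_t codeEnd) {
    assert(codeBegin <= codeEnd);
    for (std::size_t i = 0; i < count; ++i) {
        if (words[i] >= codeBegin && words[i] < codeEnd) return true;
    }
    return false;
}
```

Under this scheme no return address ever needs patching: code with a live frame on the stack is simply not moved, at the cost of occasionally pinning a block because of a stray integer.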