ORC JIT Weekly #1

Hi All,

In the interests of improving visibility into ORC JIT development I’m going to try writing weekly status updates for the community. I hope they will provide insight into the design and state of development of LLVM’s JIT APIs, as well as serving as a convenient space for discussions among LLVM’s large and growing community of JIT API users. The length and detail will vary (depending on how much work I’ve gotten through, and how long I can dedicate to writing the update), but if people find the updates valuable I will make an effort to write at least something. Questions and comments are most welcome (both on the format, and on the content).

Since this is the first update, I have also added some highlights from last year, and the plan for 2020.

Highlights from 2019:

(1) ORCv1 was officially deprecated in LLVM 9. I have left it in for the LLVM 10 branch, but plan to remove it from master in the coming weeks. All development effort is now focused on ORCv2. If you are an ORCv1 client, now’s the time to switch over. If you need help please ask on the llvm-dev mailing lists (make sure you CC me) or #llvm on discord. There are also some tips available in https://llvm.org/docs/ORCv2.html .

(2) LLVM has a new JIT linker, JITLink, which is intended as an eventual replacement for RuntimeDyld. The new design supports linker plugins (allowing operation on the low-level bits generated by the JIT linker) and native code models (RuntimeDyld required a custom code model on some platforms). Currently JITLink only supports Darwin x86-64 and arm64, but I hope to see support for new platforms added in the future.

(3) Google Summer of Code student Praveen Velliengiri demonstrated a basic speculative compilation system built on ORCv2. This system analyses code added to the JIT and triggers early compilation on background threads for code that is likely to be used at runtime. Using this system Praveen was able to demonstrate significant speedups on JIT execution of some SPEC benchmarks. He presented this work at the 2019 LLVM Developer’s Meeting in San Jose (see https://preejackie.github.io/GSoC-2019-LLVM).

The plan for 2020:

  • Improve JIT support for static initializers:

  • Add support for running initializers from object files, which will enable loading and caching of objects containing initializers.

  • Improve support for platform-specific initializer kinds like Objective-C +load methods.

  • Switch from a push model (“runConstructors”) to a pull model (“getConstructorsToRun”) for initializer execution. This will allow JIT’d code to “dlopen” other JIT’d code and run the initializers on the expected thread, which is important for JIT’ing code that uses threads and locks in initializers.

  • Improve adherence to static/dynamic linker rules: Weak symbol resolution across JITDylib boundaries is still not handled correctly.

  • Remove ORCv1.

  • Bug fixes and documentation improvements.
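The push-to-pull switch described above can be illustrated with a small stand-alone sketch. This is toy code: the names (ToyDylib, getConstructorsToRun, runPendingInits) are invented for illustration and are not the actual ORC API.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy stand-in for a JITDylib that records initializers rather than
// running them itself.
struct ToyDylib {
  std::vector<std::function<void()>> PendingInits;

  void addInit(std::function<void()> F) {
    PendingInits.push_back(std::move(F));
  }

  // Pull model: hand the not-yet-run initializers back to the caller,
  // which runs them on whatever thread (under whatever locks) it chooses.
  std::vector<std::function<void()>> getConstructorsToRun() {
    std::vector<std::function<void()>> Out;
    Out.swap(PendingInits);
    return Out;
  }
};

// A "dlopen"-equivalent for JIT'd code pulls and runs initializers on the
// calling thread, instead of the JIT pushing them from inside.
int runPendingInits(ToyDylib &D) {
  int Count = 0;
  for (auto &Init : D.getConstructorsToRun()) {
    Init();
    ++Count;
  }
  return Count;
}
```

The key property is that the JIT never decides when or where initializers run; the thread that "opens" the dylib does.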

Status report for this week:

– I’ve been working on a generic implementation of the new initialization APIs which I hope to be able to land next week. This will replace the runConstructors/runDestructors API in LLJIT (providing equivalent functionality: initializers will be discovered by scanning llvm.global_ctors/llvm.global_dtors), and will enable the development of platform specific initializer-support implementations.

– There’s a long-ish chat with @drmeister on the discord #llvm channel about RuntimeDyld vs JITLink, and large vs small code model.

– I have added a new LLJIT example program that demonstrates how to use lazy-reexports in ORCv2 to add lazy compilation: llvm/examples/LLJITExamples/LLJITWithLazyReexports.

– COFF support in the JIT still lags ELF and MachO (See e.g. http://llvm.org/PR40074). If there are any COFF experts out there who are interested in helping out with JIT bugs please let me know!

Ok – that’s enough from me for now. If you’re a JIT user, developer, or just casual JIT-development observer (dblaikie), and you have questions, comments, or just feel like introducing yourself: jump on in. :)

Lang.

Excellent initiative. Thanks, Lang! We are grateful for any high-level JIT-related info you see fit to provide, as it avoids us having to bother people on the mailing list or try to figure out how the source code works…

Geoff

Hi Lang,

Great idea on the status updates, I’ll be sure to check them out. Also thank you (and other contributors) for your work on ORC so far, it has been a fun API to tinker with and a great entry point into working with LLVM.

I’m for the most part just a lowly user of ORC but I try to contribute here and there when I can. Mostly by nagging about COFF support through bug reports ;-).

My use case for ORC is an expression-evaluator library I’m working on called JitCat (www.jitcat.org). It has some built-in reflection features to easily expose C++ functions/variables/classes for use in expressions. LLVM/ORC is used for code generation. Future plans are to extend JitCat into a fully featured scripting language. My own background is in game development, which is also what I use JitCat for myself.

Regards,

Machiel van Hooren
(jcmac on Discord)

Thank you for creating weekly updates; they will be quite useful, since previously ORC development seemed rather opaque.

One thing that will be useful (and was done to some extent with ORCv1) is to expose ORCv2’s API via C-compatible bindings such that code from languages other than C++ (Rust for me) can effectively use it, including things such as the equivalent of -march=native and introspection such that the supported SIMD widths can be detected. I’m planning on using ORC to compile shaders for Kazan, the GPU driver that I’m writing for libre-riscv’s hybrid cpu/gpu.

Jacob Lifshay

Hi, Lang
As someone just starting to use LLVM’s JIT to improve OLAP execution engine performance, I’m very glad to hear that. I haven’t been able to find documentation that helps me get started with the new ORC JIT API quickly. I can only find some examples of how to use it, but I don’t understand the internals at a low level, and it’s unclear how to design a clean JIT toolset. I hope more tutorials are added to make ORC JIT easier to adopt.

Big thanks.

Jacob Lifshay via llvm-dev <llvm-dev@lists.llvm.org> wrote on Friday, January 17, 2020, at 11:38 PM:

Hi All,

Ok – sounds like there’s enough interest to keep going with these status updates. You can expect update #2 in a couple of days. :)

Nice idea! - might even be worth spinning up a separate channel on the Discord for the JIT?

Seems reasonable to me. There has been a lot of JIT discussion in #llvm – it might be nice to move it to a jit channel to maximize the signal-to-noise ratio.

My use case for ORC is an expression-evaluator library I’m working on called JitCat (www.jitcat.org)…

Sounds very cool!

I’m for the most part just a lowly user of ORC but I try to contribute here and there when I can. Mostly by nagging about COFF support through bug reports ;-).

Anything you can contribute (bug reports included) is very welcome. Debugging and development for Windows is extra welcome, since I don’t have a Windows box to develop or test with.

One thing that will be useful (and was done to some extent with ORCv1) is to expose ORCv2’s API via C-compatible bindings such that code from languages other than C++ (Rust for me) can effectively use it, including things such as the equivalent of -march=native and introspection such that the supported SIMD widths can be detected. I’m planning on using ORC to compile shaders for Kazan, the GPU driver that I’m writing for libre-riscv’s hybrid cpu/gpu.

That’s a really good point. And timely: We need an ORCv2 C API before we can kill off ORCv1. We should use http://llvm.org/PR31103 to track this (hopefully we can finally close it). If you’re interested in this work please CC yourself on that bug.

There are two approaches we can take to C bindings for ORCv2. The first one I’ll call “wrap LLJIT”, and it’s pretty much what it sounds like: We provide an API for initializing an LLJITBuilder, and accessing methods in the resulting LLJIT object. This would provide a similar level of functionality to the ExecutionEngine bindings, and also enable basic lazy compilation. The second approach would be to wrap the lower level APIs (ExecutionSession, MaterializationUnit, etc.) to allow clients to build their own JIT instances in C. These approaches aren’t mutually exclusive, and the best way forward is probably to start with the first approach, then add elements from the second over time.
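For readers unfamiliar with the pattern, "wrap LLJIT"-style C bindings usually follow the standard opaque-handle idiom: an incomplete struct type on the C side, with free functions forwarding to the C++ object. A minimal self-contained sketch of that idiom (the Toy* names are invented; this is not the eventual LLVM C API):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// The C++ object being wrapped (a stand-in for LLJIT).
class ToyJIT {
  std::unordered_map<std::string, uint64_t> Symbols;

public:
  void define(const std::string &Name, uint64_t Addr) { Symbols[Name] = Addr; }
  uint64_t lookup(const std::string &Name) const {
    auto I = Symbols.find(Name);
    return I == Symbols.end() ? 0 : I->second;
  }
};

// C-compatible surface: an opaque handle plus free functions.
extern "C" {
typedef struct ToyJITOpaque *ToyJITRef;

ToyJITRef ToyJITCreate(void) {
  return reinterpret_cast<ToyJITRef>(new ToyJIT());
}
void ToyJITDefine(ToyJITRef J, const char *Name, uint64_t Addr) {
  reinterpret_cast<ToyJIT *>(J)->define(Name, Addr);
}
uint64_t ToyJITLookup(ToyJITRef J, const char *Name) {
  return reinterpret_cast<ToyJIT *>(J)->lookup(Name);
}
void ToyJITDispose(ToyJITRef J) { delete reinterpret_cast<ToyJIT *>(J); }
} // extern "C"
```

A real binding would also need to wrap LLJITBuilder configuration and error reporting; the point here is only the handle-plus-free-functions shape, which is what keeps the surface C-compatible and relatively stable for non-C++ clients.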

Any volunteers to work on this? I need to finish the new initializer work before I can tackle this, so I might be a while yet.

As someone just starting to use LLVM’s JIT to improve OLAP execution engine performance, I’m very glad to hear that. I haven’t been able to find documentation that helps me get started with the new ORC JIT API quickly. I can only find some examples of how to use it, but I don’t understand the internals at a low level, and it’s unclear how to design a clean JIT toolset. I hope more tutorials are added to make ORC JIT easier to adopt.

Ok. Which tutorials have you been following? If possible, could you write some notes on where you got stuck, or where the design was difficult to follow? That will help us determine where the documentation and tutorials could most benefit from improvement.

– Lang.

Hi All,

Ok – sounds like there’s enough interest to keep going with these status updates. You can expect update #2 in a couple of days. :)

Yay!

One thing that will be useful (and was done to some extent with ORCv1) is to expose ORCv2’s API via C-compatible bindings …

That’s a really good point. And timely: We need an ORCv2 C API before we can kill off ORCv1. We should use http://llvm.org/PR31103 to track this (hopefully we can finally close it). If you’re interested in this work please CC yourself on that bug.

Added myself.

There are two approaches we can take to C bindings for ORCv2. The first one I’ll call “wrap LLJIT”, and it’s pretty much what it sounds like: We provide an API for initializing an LLJITBuilder, and accessing methods in the resulting LLJIT object. This would provide a similar level of functionality to the ExecutionEngine bindings, and also enable basic lazy compilation. The second approach would be to wrap the lower level APIs (ExecutionSession, MaterializationUnit, etc.) to allow clients to build their own JIT instances in C. These approaches aren’t mutually exclusive, and the best way forward is probably to start with the first approach, then add elements from the second over time.

Any volunteers to work on this? I need to finish the new initializer work before I can tackle this, so I might be a while yet.

Sorry, I’m currently working on the frontend for our shader compiler, I won’t get to the parts that need LLVM for at least a month or so. However, once I’m there, I can help out some with creating the C bindings.

Jacob

I appreciate the weekly updates very much - thanks Lang!

Everything I know about the ORCv2 JIT I learned from Lang’s 2018 LLVM-Dev talk.
There is sample code scattered throughout the talk that I’ve watched over and over again to piece together our JIT.


Hi,

Didn’t see this one coming. This is great and very helpful to keep track of latest development in ORC!!

Thank You.

Hi,

In the interests of improving visibility into ORC JIT development I'm
going to try writing weekly status updates for the community. I hope
they will provide insight into the design and state of development of
LLVM's JIT APIs, as well as serving as a convenient space for
discussions among LLVM's large and growing community of JIT API users.

That's a great idea.

Since this is the first update, I have also added some highlights from last year, and the plan for 2020.

Highlights from 2019:

(1) ORCv1 was officially deprecated in LLVM 9. I have left it in for
the LLVM 10 branch, but plan to remove it from master in the coming
weeks. All development effort is now focused on ORCv2. If you are an
ORCv1 client, now's the time to switch over. If you need help please
ask on the llvm-dev mailing lists (make sure you CC me) or #llvm on
discord. There are also some tips available in
https://llvm.org/docs/ORCv2.html .

I also want to highlight the necessity of some form of C API, that
others already have.

Besides just needing something that can be called from languages besides
C++, some amount of higher API stability is also important. For users of
LLVM with longer support cycles than LLVM (e.g. Postgres has 5 years of
back branch maintenance), and which live in a world where vendoring is
not allowed (most things going into linux distros), the API churn can be a
serious problem. It's fine if the set of "somewhat stable" C APIs
doesn't provide all the possible features, though.

It's easy enough to add a bunch of wrappers or ifdefs hiding some simple
signature changes, e.g. LLVMOrcGetSymbolAddress adding a parameter as
happened in LLVM 6, but back-patching support for larger API redesigns
into stable versions is scary. We do, however, quickly get complaints if
a supported version cannot be compiled due to dependencies, as people
tend to upgrade their OS separately from e.g. their database major
version.

(2) LLVM has a new JIT linker, JITLink, which is intended as an
eventual replacement for RuntimeDyld. The new design supports linker
plugins (allowing operation on the low-level bits generated by the JIT
linker) and native code models (RuntimeDyld required a custom code
model on some platforms). Currently JITLink only supports Darwin
x86-64 and arm64, but I hope to see support for new platforms added in
the future.

What's the capability level of ORCv2 on RuntimeDyld compared to ORCv1?
Are there features supported in v1 that are only available on JITLink
supported platforms?

- Improve JIT support for static initializers:
  - Add support for running initializers from object files, which will enable loading and caching of objects containing initializers.

Hm, that's kind of supported for v1, right?

Greetings,

Andres Freund

Hi Andres,

I also want to highlight the necessity of some form of C API, that others already have.

It’s fine if the set of “somewhat stable” C APIs doesn’t provide all the possible features, though.

Ok. This got me thinking about what a simple LLJIT API should look like. I have posted a sketch of a possible API on http://llvm.org/PR31103 . I don’t have time to implement it just yet, but I would be very happy to provide support and review patches if anyone else wants to give it a shot.

What’s the capability level of ORCv2 on RuntimeDyld compared to ORCv1?
Are there features supported in v1 that are only available on JITLink
supported platforms?

At a high level, ORCv2’s design allows for basically the same features as ORCv1, plus concurrent compilation. There are still a number of APIs that haven’t been hooked up or implemented though. Most prominently: Event listeners and removable code. If you’re using either of those features please let me know: I do want to make sure we continue to support them (or provide an equivalent).

There are no features supported by ORCv1 that require JITLink under ORCv2.

  • Improve JIT support for static initializers:
  • Add support for running initializers from object files, which will enable loading and caching of objects containing initializers.

Hm, that’s kind of supported for v1, right?

It’s “kind of” supported. MCJIT and ORCv1 provided support for scanning the llvm.global_ctors variable to find the names of static initializers to run. This works fine when (1) you’re adding LLVM IR AND (2) you only care about initializers described by llvm.global_ctors. On the other hand, if you add object files (or load them from an ObjectCache), or if you have initializers not described by llvm.global_ctors (e.g. ObjC and Swift, which have additional initializers described by metadata sections) then MCJIT and ORCv1 provide no help out-of-the-box. This problem is further exacerbated by concurrent compilation in ORCv2: You may need to order your initializers (e.g. according to the llvm.global_ctors priority field), but objects may arrive at the JIT linker out of order due to concurrent compilation.
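The ordering problem can be modeled in a few lines: initializers carry a priority (as with llvm.global_ctors), may arrive out of order under concurrent compilation, and must be sorted before they run. This is a toy sketch with invented names, not ORC code.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

struct ToyCtor {
  int Priority; // lower runs first, as with the llvm.global_ctors field
  std::function<void()> Fn;
};

// Ctors are collected as objects arrive at the linker (possibly out of
// order), then run in priority order once the containing "dylib" is
// initialized.
void runCtorsInOrder(std::vector<ToyCtor> Ctors) {
  std::stable_sort(Ctors.begin(), Ctors.end(),
                   [](const ToyCtor &A, const ToyCtor &B) {
                     return A.Priority < B.Priority;
                   });
  for (auto &C : Ctors)
    C.Fn();
}
```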

The new ORCv2 initializer support aims to make all of this natural: We will provide ‘dlopen’ and ‘dlclose’ equivalent calls on JITDylibs. This will trigger compilation and execution of initializers that have not been run already. If you use JITLink, this will include using JITLink-plugins to discover the initializers to run, including initializers in metadata sections.

– Lang.

Hi Lang,

> I also want to highlight the necessity of some form of C API, that others
> already have.
>
<snip>

> It's fine if the set of "somewhat stable" C APIs doesn't provide all the
> possible features, though.

Ok. This got me thinking about what a simple LLJIT API should look like. I
have posted a sketch of a possible API on http://llvm.org/PR31103 .

I'll take a look.

I don't have time to implement it just yet, but I would be very happy
to provide support and review patches if anyone else wants to give it
a shot.

Hm. I don't immediately have time myself, but it's possible that I can
get some help. Otherwise I'll try to look into it once my current set of
tasks is done, if you haven't gotten to it by then.

> What's the capability level of ORCv2 on RuntimeDyld compared to ORCv1?
> Are there features supported in v1 that are only available on JITLink
> supported platforms?

At a high level, ORCv2's design allows for basically the same features as
ORCv1, plus concurrent compilation.

Cool.

There are still a number of APIs that
haven't been hooked up or implemented though. Most prominently: Event
listeners and removable code. If you're using either of those features
please let me know: I do want to make sure we continue to support them (or
provide an equivalent).

Heh, I/pg uses both :(

WRT Event listeners: I don't quite know how one can really develop JITed
code without wiring up profiler and debugger. I'm not wedded to the
event listener interface itself, but debugger & profiler are really
critical. Or is there a different plan for those features?

WRT removable code:

Postgres emits the code for all functions it knows it will need for a query
at once (often that's all that are needed for one query, but not always),
and removes it once there are no references to that set of functions
anymore. As one session can use a *lot* of code over its lifetime, it's
not at all feasible to never unload. Right now we use
LLVMOrcRemoveModule(), which seems to work well enough. FWIW, for that
use case there are never any references into the code that needs to be
removed (it only exports functions that need to be called by C code).

It doesn't look all that cheap to just create one LLJIT instance for
each set of code that needs to be removable. I don't really foresee using
LLVM side lazy/incremental JITing - so far my experiments show that
the overhead of a code generation step makes it unattractive to incur
that multiple times, and we have an interpreter that we can use until
JIT compilation succeeds. So perhaps it's not *that* bad?

What is the biggest difficulty in making code removable?

In case you happen to be somewhere around the LLVM devroom at FOSDEM I'd
be happy to briefly chat in person...

Greetings,

Andres Freund

Hi Andres,

There are still a number of APIs that
haven’t been hooked up or implemented though. Most prominently: Event
listeners and removable code. If you’re using either of those features
please let me know: I do want to make sure we continue to support them (or
provide an equivalent).

Heh, I/pg uses both :(
WRT Event listeners: I don’t quite know how one can really develop JITed
code without wiring up profiler and debugger. I’m not wedded to the
event listener interface itself, but debugger & profiler are really
critical. Or is there a different plan for those features?

We definitely need debugger and profiling support. The right interface for this is an open question.

I think we can add support for the existing EventListener interface to RTDyldObjectLinkingLayer. That will make porting easy for existing clients.

EventListener isn’t a good fit for JITLink/ObjectLinkingLayer at the moment. EventListener (via the RuntimeDyld::LoadedObjectInfo parameter to notifyObjectLoaded) is implicitly assuming that the linker operates on whole sections, but JITLink operates on a per-symbol basis, at least on MachO. Individual symbols within a section may be re-ordered or dead-stripped, so there’s no easy correspondence between the original bytes of a section and the final allocated bytes.

That said, I don’t think there’s any fundamental problem here: The static linkers perform dead stripping and reordering too. As long as we figure out the right way to present the layout of the allocated memory to the debuggers and profilers I think they should be able to handle it just fine. Better yet, we don’t have to come up with a new “EventListener 2.0” API for ObjectLinkingLayer: It already has ObjectLinkingLayer::Plugin, which (with some minor tweaks) should be much more flexible.

If you’re interested in trying out the ObjectLinkingLayer::Plugin API at all, there’s an example in llvm/examples/LLJITExamples/LLJITWithObjectLinkingLayerPlugin.

WRT removable code:

Right now we use LLVMOrcRemoveModule(), which seems to work well enough…

Good to hear.

It doesn’t look all that cheap to just create one LLJIT instance for
each set of code that needs to be removable.

I haven’t tested the cost yet, so couldn’t say either way. I definitely haven’t optimized construction of instances though.

I don’t really foresee using LLVM side lazy/incremental JITing - so far
my experiments show that the overhead of a code generation step
makes it unattractive to incur that multiple times, and we have an
interpreter that we can use until JIT compilation succeeds. So perhaps
it’s not that bad?

Just to make sure I understand: Are you saying that the overhead of constructing the codegen pipeline shows up as substantial overhead? I can totally believe it, I’ve just never measured it myself.

If that’s the case it would be interesting to dig in to where the time is being spent. This has never been optimized in the JIT, so there may be some easy improvements we can make.

What is the biggest difficulty in making code removable?

Concurrency, mostly. If you’ve added a symbol definition, what happens if you issue a call to remove it just as someone else tries to look it up?

My answer (not yet implemented) is in several parts:

(1) On the JIT state side:

(1.a) If the symbol hasn’t been compiled yet then a call to remove may end up being high latency: E.g. in the case above if the lookup arrives first it will trigger a compile, and from the JIT’s perspective that’s one long, uninterruptible operation. That’s bad luck, but once it’s done the call to remove can prevent the compiled code from being registered with the symbol table, and can inform anyone who was waiting on that definition that it failed to compile. On the other hand, if the call to remove arrives first then the operation will be quick and it will be as-if the symbol were never defined.

(1.b) If the symbol has been compiled already: We free the allocated resources and remove it from the symbol table. This operation should be quick, but see part (2).
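The (1.a)/(1.b) bookkeeping can be sketched as a tiny state machine. This is a single-threaded toy with invented names; the real implementation would of course need locking around these transitions and is more involved.

```cpp
#include <string>
#include <unordered_map>

enum class SymState { NotCompiled, Compiling, Ready, Removed };

struct ToySymbolTable {
  std::unordered_map<std::string, SymState> States;

  // A lookup that finds a not-yet-compiled symbol kicks off a compile:
  // from the JIT's perspective, one long uninterruptible operation.
  void lookup(const std::string &Name) {
    auto &S = States[Name];
    if (S == SymState::NotCompiled)
      S = SymState::Compiling;
  }

  // Remove is quick: if a compile is in flight it just marks the symbol
  // so the result is dropped on completion; otherwise it is as-if the
  // symbol were never defined (or its resources are freed).
  void remove(const std::string &Name) { States[Name] = SymState::Removed; }

  // When a compile finishes, register the result only if nobody removed
  // the symbol in the meantime; otherwise waiters are told it failed.
  bool compileFinished(const std::string &Name) {
    auto &S = States[Name];
    if (S == SymState::Removed)
      return false;
    S = SymState::Ready;
    return true;
  }
};
```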

(2) On the JIT’d code side: Clients are responsible for resource dependencies. For example if you’ve JIT’d two functions, foo and bar, and foo contains a call to bar, and then you remove bar: it is up to you to make sure you never hit that call site for bar in foo.

(3) Overhead: Some clients want fine grained resource tracking, others don’t. My plan is to replace the VModuleKey placeholder type with a ResourceTracker class. If you specify a ResourceTracker when adding a module to the JIT then you will be able to call ResourceTracker::remove to remove just that module. If you do not specify a ResourceTracker when adding a module then the Module will be assigned to the default ResourceTracker for the containing JITDylib. Resources for the module will be deleted when you close the containing JITDylib.
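As a rough model of that plan (toy code with invented names; the eventual LLVM API may well differ): each added module is associated with a tracker, either an explicit one or the containing dylib's default, and removing a tracker frees exactly the resources it owns.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for the planned ResourceTracker: owns the resources of the
// modules added under it, and can free exactly those on request.
struct ToyTracker {
  std::vector<std::string> OwnedModules;
  void remove() { OwnedModules.clear(); }
};

struct ToyDylib {
  std::shared_ptr<ToyTracker> Default = std::make_shared<ToyTracker>();
  std::vector<std::shared_ptr<ToyTracker>> Trackers{Default};

  std::shared_ptr<ToyTracker> createTracker() {
    auto T = std::make_shared<ToyTracker>();
    Trackers.push_back(T);
    return T;
  }

  // Add a module under an explicit tracker, or the dylib's default one.
  void addModule(const std::string &Name,
                 std::shared_ptr<ToyTracker> RT = nullptr) {
    (RT ? RT : Default)->OwnedModules.push_back(Name);
  }

  // Closing the dylib releases everything, default tracker included.
  void close() {
    for (auto &T : Trackers)
      T->remove();
  }
};
```

Fine-grained clients pay for per-module trackers only when they ask for them; everyone else gets the cheap default-tracker path.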

As for why this hasn’t been implemented yet: just time constraints on my part.

– Lang.