ORC JIT Weekly #6 -- General initializer support and JITLink optimizations

Hi All,

The general initializer support patch has landed (see 85fb997659b plus follow-up fixes).

Some quick background:

Until now ORC, like MCJIT, has handled static initializer discovery by searching for llvm.global_ctors and llvm.global_dtors arrays in the IR added to the JIT. This approach suffers from several drawbacks:

  1. It provides no built-in support for other program representations: Object files and custom program representations added to the JIT require manual intervention from the user to run their initializers.
  2. It requires naming and promoting the linkage of initializer functions, since they have to be looked up by name to be run.
  3. It doesn’t handle platform-specific initializers, e.g. Objective-C registration, which are described by globals in specific sections.
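For reference, a minimal sketch of the IR structure being scanned (the initializer name @init is illustrative; drawback 2 above stems from exactly this kind of internal definition):

```llvm
; A single static initializer with default priority (65535).
@llvm.global_ctors = appending global [1 x { i32, void ()*, i8* }]
  [{ i32, void ()*, i8* } { i32 65535, void ()* @init, i8* null }]

; Internal linkage: to run this by name, the JIT previously had to
; rename the function and promote it to external linkage.
define internal void @init() {
entry:
  ret void
}
```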

The general initializer support patch changes how initialization is handled. All MaterializationUnits, regardless of the kind of program representation they wrap (IR, object files, ASTs, etc.), can now declare an optional “initializer symbol”. Instances of the new Platform class (see include/llvm/ExecutionEngine/Orc/Core.h) are notified whenever MaterializationUnits are added to a JITDylib, and can record the presence of any declared initializers. By issuing lookups for initializers, the Platform can force their materialization and arrange for them to be run in a platform-specific way (see https://reviews.llvm.org/D74300 for more discussion).

This new system is flexible enough to permit two very different platform implementations for LLJIT, both already available in tree: GenericLLVMIRPlatform and MachOPlatform. The former essentially re-implements the existing llvm.global_ctors scanning scheme: it promotes functions that appear in the llvm.global_ctors array, then looks them up by name and executes them when requested. MachOPlatform, on the other hand, implements a scheme that mimics the behavior of the Darwin dynamic loader, dyld: by installing an ObjectLinkingLayer::Plugin, the MachOPlatform can scan all objects as they are materialized to discover known special sections (e.g. __mod_init_func, __objc_classlist, and __objc_selrefs), then handle them according to the usual platform rules (__mod_init_func pointers are executed; Objective-C classes and selectors are registered with the Objective-C runtime).
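As a rough sketch of the __mod_init_func handling (illustrative code, not the actual MachOPlatform implementation): the section is simply an ordered array of initializer function pointers, which the platform walks and calls once the containing object has been materialized:

```cpp
#include <vector>

// Stand-in for the contents of a __mod_init_func section: an ordered
// array of initializer function pointers discovered during linking.
using InitFn = void (*)();

static int InitsRun = 0;
static void initA() { ++InitsRun; }
static void initB() { ++InitsRun; }

// What the platform conceptually does once the object is materialized:
// call each recorded initializer pointer in section order.
void runModInitFuncs(const std::vector<InitFn> &Section) {
  for (InitFn F : Section)
    F();
}

// Demo: two initializers, run in order.
int runInitDemo() {
  runModInitFuncs({initA, initB});
  return InitsRun;
}
```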

While this system is still very new, it is far enough along that the lli command line tool, when run with the -jit-mode=orc-lazy option, can now execute IR compiled from simple Objective-C and Swift programs on Darwin.

Also of interest this week: JITLink has a new “GOT and Stub bypass” optimization for x86-64. When linking position-independent code, JITLink must conservatively build global offset table entries and stubs to access/call external symbols that may be out of range of the JIT’d code. With this new optimization, these indirect accesses may be bypassed if the JIT’d code ends up being allocated within range of the target. Coupled with a slab allocator for your JIT, this optimization can eliminate a layer of indirection and may improve performance for some use cases. See 27a79b72162.
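Whether the bypass applies reduces to a reachability check along these lines (a sketch of the idea, not JITLink’s actual code): an x86-64 rel32 call or RIP-relative access can only be rewritten to a direct form if the signed 32-bit displacement from the fixup to the target is in range:

```cpp
#include <cstdint>
#include <limits>

// True if TargetAddr is reachable from FixupAddr via a signed 32-bit
// displacement, as used by x86-64 call/jmp rel32 and RIP-relative
// addressing. (Real fixups are relative to the end of the instruction;
// that adjustment is folded into FixupAddr here for simplicity.)
bool isInRangeForRel32(uint64_t FixupAddr, uint64_t TargetAddr) {
  int64_t Disp = static_cast<int64_t>(TargetAddr - FixupAddr);
  return Disp >= std::numeric_limits<int32_t>::min() &&
         Disp <= std::numeric_limits<int32_t>::max();
}
```

When the check passes, a call through a stub can become a direct call and a load through a GOT entry a direct reference; when it fails, the conservative GOT/stub path remains in place.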

Just a heads up: I expect next week to be a quiet one, as I’m out on vacation from Wednesday.

– Lang.

Hello Lang,

This sounds interesting even though I’m not able to understand everything.

I wonder, will your changes be in the upcoming LLVM 10 release? Also, do your changes make it possible to get the actual addresses of the constructors and destructors, or are they fully managed by ORC? I hope this makes sense!

Kind greetings and wishing you a nice vacation

Björn

Hi Lang,

I really like the direction that this is taking.

Our own ORC JIT workflow requires us to be able to JIT-execute code found in llvm::ObjectFiles without having access to the LLVM IR. (In summary, we JIT-compile a whole bunch of LLVM modules and only keep the object files around.) The ability to invoke the global constructor and destructor functions contained within those object files is paramount to us, as some of our object files might embed arbitrary fragments of LLVM IR generated by the clang backend.

Our internal solution is similar to the MachOPlatform one described above. We also have a similar solution for ELF object files (scanning the .init_array and .fini_array sections) that we use on both Linux and Windows x64 platforms. To be complete, the Windows x64 C++ ABI also requires us to intercept calls to atexit() to ensure that the destructors of global C++ objects are correctly invoked.

If there’s any interest, we would be happy to contribute our code for this.

Cheers,
Benoit


Hi Benoit,

> I really like the direction that this is taking.

Thanks!

> Our internal solution is similar to the MachOPlatform one described above. We also have a similar solution for ELF object files (scanning the .init_array and .fini_array sections) that we use on both Linux and Windows x64 platforms. To be complete, the Windows x64 C++ ABI also requires us to intercept calls to atexit() to ensure that the destructors of global C++ objects are correctly invoked.

The GenericIR platform and MachO platform already interpose __cxa_atexit to ensure that static destructors are run when the JITDylib is closed (see code scattered through lib/ExecutionEngine/Orc/LLJIT.cpp). We should include interposition of regular atexit calls too. I’m happy to review patches for that, or you can file a bug and assign it to me and I will try to get to it next week.
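For anyone curious what that interposition amounts to, here is a minimal sketch (names are illustrative, not the actual LLJIT code): the JIT binds JIT’d references to __cxa_atexit to a recording function, then runs the recorded destructors in reverse registration order when the owning JITDylib is closed:

```cpp
#include <vector>

// Mirrors the __cxa_atexit signature: destructor, argument, and a
// handle identifying the owning "DSO" (here, a JITDylib).
struct AtExitEntry {
  void (*Destructor)(void *);
  void *Arg;
  void *DSOHandle;
};

static std::vector<AtExitEntry> AtExits;

// The JIT resolves JIT'd code's __cxa_atexit references to this
// recorder instead of the process-wide version. A plain atexit
// interposer could forward here, wrapping its no-argument callback.
extern "C" int interposedCxaAtExit(void (*F)(void *), void *Arg,
                                   void *DSOHandle) {
  AtExits.push_back({F, Arg, DSOHandle});
  return 0;
}

// On JITDylib close: run that dylib's destructors in reverse
// registration order, matching normal C++ static-destructor rules.
// (A real implementation would also erase the entries it ran.)
void runAtExitsFor(void *DSOHandle) {
  for (auto It = AtExits.rbegin(); It != AtExits.rend(); ++It)
    if (It->DSOHandle == DSOHandle)
      It->Destructor(It->Arg);
}

// Demo: two registrations run in reverse order on close.
static std::vector<int> RunOrder;
static void noteDtor(void *P) { RunOrder.push_back(*static_cast<int *>(P)); }

bool runAtExitDemo() {
  static int A = 1, B = 2;
  void *Handle = &AtExits; // any unique per-dylib value works here
  interposedCxaAtExit(noteDtor, &A, Handle);
  interposedCxaAtExit(noteDtor, &B, Handle);
  runAtExitsFor(Handle);
  return RunOrder.size() == 2 && RunOrder[0] == 2 && RunOrder[1] == 1;
}
```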

Ideally I would like to get the in-tree support to a state where you could rely on it, rather than having to maintain a custom implementation.

> If there’s any interest, we would be happy to contribute our code for this.

I am very interested, and I think many other members of the community would be too. :)

Please let me know how I can help, and assign any relevant reviews to me. As noted I’ll be away later this week, but I’ll be back Monday of next week and happy to answer questions.

Regards,
Lang.

Hey Lang,

Thank you for the explanation! I will look into those, seems like I could learn a lot from them.

Kind greetings

Björn