ORC JIT Weekly #14 -- Removable code

Hi All,

A preliminary version of removable code support has been posted for review at https://reviews.llvm.org/D79312. This patch removes all uses of VModuleKeys (except for Legacy layers) and takes a whole-JITDylib-at-a-time approach to removal. Removing whole JITDylibs requires more work from clients than per-module removal would: modules to be removed must be placed into throw-away JITDylibs, and re-exports must be used to make their symbol definitions visible at the intended locations. On the other hand, restricting removal to whole JITDylibs can help to avoid subtle dependence bugs: existing object formats and linker rules are already designed to make independently loading and unloading libraries relatively safe, whereas there is no precedent for unloading individual modules from within a library at runtime.
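The throw-away JITDylib plus re-export pattern can be pictured with a small toy model. This is only a sketch in Python, not the ORC API: the `JITDylib` class, its `define`, `reexport`, `lookup` and `clear` methods, and the name `foo.throwaway` are all hypothetical stand-ins chosen for illustration, and the patch may behave differently in detail.

```python
class JITDylib:
    """Toy stand-in for a JITDylib: a named symbol table (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.symbols = {}    # symbol name -> definition (a Python callable here)
        self.reexports = {}  # symbol name -> JITDylib that really defines it

    def define(self, symbol, definition):
        self.symbols[symbol] = definition

    def reexport(self, symbol, source):
        # Make `symbol` visible here while its definition lives in `source`.
        self.reexports[symbol] = source

    def lookup(self, symbol):
        if symbol in self.symbols:
            return self.symbols[symbol]
        if symbol in self.reexports:
            return self.reexports[symbol].lookup(symbol)
        raise KeyError(symbol)

    def clear(self):
        # Models removing the whole JITDylib: everything it defined goes away.
        self.symbols.clear()
        self.reexports.clear()


main_jd = JITDylib("main")
foo_jd = JITDylib("foo.throwaway")       # holds only the removable module

foo_jd.define("foo", lambda x: x * 2.0)  # the module to be removed later
main_jd.reexport("foo", foo_jd)          # visible at the intended location

result_before = main_jd.lookup("foo")(2.0)

foo_jd.clear()                           # whole-JITDylib removal
try:
    main_jd.lookup("foo")
    found_after = True
except KeyError:
    found_after = False
```

In this model the client pays the extra cost described above (one extra JITDylib and one re-export per removable unit), and in exchange removal never has to reason about what else inside `main` might depend on the removed definitions.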

As an example of how unloading individual modules can lead to subtle dependence bugs, consider the following REPL session for a simple language (Kaleidoscope, from https://llvm.org/docs/tutorial/). In this example all REPL values are of floating point type, functions are compiled lazily, and anonymous expressions are removed immediately after they run. Under these assumptions, how does the REPL respond to the following input?

def foo(x) x * 2.0;
foo(2.0) + 1.0;
// output #1 here
// anonymous expression #1 removed here
foo(3.0);
// output #2 here
// anonymous expression #2 removed here

We expect the result to be:

output #1: 5.0
output #2: 6.0

And on Linux and Darwin it will be. On Windows, however, the output is likely* to be:

output #1: 5.0
output #2: segfault

(*The exact output will depend on the target options used.)

The problem is that when compiling to COFF (the Windows relocatable object file format), floating point constants may be stored in named COMDAT entries (see, e.g., https://llvm.org/PR40074). Only one copy of each constant value is kept, and other modules are linked to refer to that copy. In our example, because of lazy compilation, the first copy of the constant 2.0 that the JIT linker encounters is the one used in anonymous expression #1. The body of foo will link against this copy and be left with a dangling reference when anonymous expression #1 is deleted. Attempting to re-run the foo function in anonymous expression #2 will then crash.

The COFF format and linker rules already ensure that dynamic libraries each get their own copies of their floating point constants, so by following the existing rules and only allowing per-JITDylib removal this case becomes safe.
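The difference between the two removal schemes can be sketched with a toy linker model. This is a simplified illustration in Python, not JITLink or RuntimeDyld: the `LinkScope` class, its methods, and the constant labels `"2.0"` and `"1.0"` are hypothetical. It assumes only what is described above: within one link scope the first copy of a named constant wins, and each library-like scope keeps its own copies.

```python
class LinkScope:
    """Toy link scope that keeps one copy of each named constant (COMDAT-like)."""

    def __init__(self):
        self.owner = {}  # constant name -> module whose copy was kept
        self.refs = {}   # module -> set of constant names it references

    def link(self, module, constants):
        for c in constants:
            # First definition wins; later modules just refer to that copy.
            self.owner.setdefault(c, module)
        self.refs[module] = set(constants)

    def remove(self, module):
        # Removing a module frees the constant copies that it owned.
        del self.refs[module]
        self.owner = {c: m for c, m in self.owner.items() if m != module}

    def dangling(self, module):
        # Constants this module still references but that no longer exist.
        return sorted(c for c in self.refs[module] if c not in self.owner)


# Per-module removal: everything is linked into one shared scope.
shared = LinkScope()
shared.link("expr1", ["2.0", "1.0"])  # foo(2.0) + 1.0 is linked first (lazy foo)
shared.link("foo", ["2.0"])           # foo's 2.0 resolves to expr1's copy
shared.remove("expr1")
per_module_dangling = shared.dangling("foo")

# Per-JITDylib removal: the anonymous expression gets its own scope.
main_jd = LinkScope()
throwaway_jd = LinkScope()
throwaway_jd.link("expr1", ["2.0", "1.0"])
main_jd.link("foo", ["2.0"])          # foo keeps its own copy of 2.0
del throwaway_jd                      # drop the whole throw-away scope
per_jitdylib_dangling = main_jd.dangling("foo")
```

In the shared-scope case `per_module_dangling` contains the constant 2.0, which corresponds to the crash in output #2; in the per-JITDylib case the list is empty because foo never linked against the anonymous expression's copy.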

There’s plenty more to talk about here, but it’s getting late, so I’ll leave it here for tonight. Questions and comments on this approach and the initial patch are very welcome, especially from people who have use-cases for removing code. Barring any major objections, I’m hoping that we can have this feature in-tree in the next week or so.

– Lang.

Hi Lang,

Nice work!
Once JITLink eventually replaces RuntimeDyld, will that enable more granular code removal?

Regards,

Machiel

Hi Machiel,

> Once JITLink eventually replaces RuntimeDyld, will that enable more granular code removal?

The proposed scheme should already allow almost arbitrary* granularity: Any symbol that you want to be able to remove independently can be extracted into its own module and placed in its own JITDylib. However, JITLink may enable clients to develop specialized resource management schemes with lower overhead for some use-cases.

* Almost arbitrary, because some formats contain relocations that force symbols to be linked together in the same object file.

Regards,
Lang.