Making LLVM play nice(r) when used as a shared library in a plugin setting

Hi all,

LLVM currently has (and has had for a long time) some issues when it’s used as a shared library in a plugin-like setting. I would like to fix them, and since the required changes are subtle with a wide effect I’m raising them here.

The issues I know of fall into two broad categories:

  1. Life-time and cleanup of global variables.
  2. Lack of isolation of command-line options and option values.

The second category is important but requires some major changes to the CommandLine library. These (and related) CommandLine issues have been brought up regularly over the years by different people and experience shows that it is a difficult topic that deserves its own treatment. I have some concrete ideas, but for this thread I want to focus on the first category, which is conceptually simpler.

The proposed changes are here:

Details of proposed changes

For the category of life-time and cleanup issues around global variables, there are two issues I am aware of:

  1. Cleanup of ManagedStatics
  2. Cleanup of cl::Option (cl::opt etc.)

ManagedStatics

ManagedStatics are currently added to a global linked list during construction, which is then walked for destruction during llvm_shutdown, which must be called explicitly.

There are various different angles of this which cause problems in the plugin setting:

  • There is no clear point at which llvm_shutdown can be called. A plugin does not know whether there are any other users of LLVM in the same process, so it cannot safely call llvm_shutdown. Even if there are no other users of LLVM in the process, it is possible for the plugin .so to be unloaded without unloading the LLVM .so. But if the plugin doesn’t call llvm_shutdown, then cleanup doesn’t happen and we get memory leaks.
  • If the plugin itself uses ManagedStatics, then the corresponding destructors may have to run at a different time than the ManagedStatics used by LLVM itself, since the plugin may be unloaded without LLVM itself being unloaded. Note that this issue affects LLVM itself when built with BUILD_SHARED_LIBS=ON (i.e., the not-really-supported build options in which every library is linked into its own shared object).

The solution is to use native language features in a standard way and simply add a destructor to ManagedStatic. The compiler and linker together ensure that the destructor is run at the correct time, i.e. .when the .so containing the ManagedStatic is unloaded.

Two observations:

  1. One side-effect of this change is that the order in which destructors are run may change. To support that, I am adding a ManagedStatic<T>::peek() method which returns a pointer to the static or nullptr if it has already been destroyed (or if it has never been constructed). The intention is to use this method when global destructors have dependencies between each other, to avoid re-constructing a static that has already been destroyed.

  2. At least LLD relies on being able to exit a process uncleanly for faster teardown, while still enjoying the side-effects of llvm_shutdown. This feels very underhanded to me, so my patch also introduces an explicit llvm_fast_shutdown function which explicitly invokes only the side-effects that LLD wants to have.

cl::Options

cl::Option is the base class for cl::opt and friends. It registers itself with the global options parser in the constructor, but currently does not unregister itself in the destructor.

This is an obvious problem because it leaves dangling pointers around when the plugin .so is unloaded but the LLVM .so isn’t. Furthermore, if the plugin is then loaded again, initializing the command-line options it carries fails because they’re already registered.

Again, the solution is to use native language features in a standard way and simply unregister command-line options there.

Aside: One may ask whether it is legitimate for a plugin to define its own command-line option, since plugins don’t “have access to” the command-line in any real sense, and one expects the process command-line to be parsed before any plugins are loaded.

My answer to that is yes. Plugins can be rather complex and consist of multiple libraries, some of which may also be used in command-line utitilies. For example, in our concrete use case we have an LLPC library which contains a graphics shader compiler built on LLVM. Our Vulkan driver links against the LLPC library, but we also have standalone command-line tools (mostly used during development) which link against the LLPC library.

Use Case

As stated, for the issues to occur you really need to do two things at once:

  • You need to use LLVM from a “plugin”, i.e. some shared object that gets dynamically loaded and unloaded during the runtime of a process.
  • The plugin needs to link against LLVM dynamically, so that (1) you may end up in a situation where LLVM is shared by multiple users within the same process and/or (2) the plugin may be unloaded without LLVM also being unloaded.

My conrete use case is graphics drivers. Drivers for all major graphics APIs are loaded at runtime when a graphics context is created, based on the GPU(s) present on the user’s system, and those drivers include a shader compiler (which is often built on LLVM).

Depending on what exactly an application is doing, the same driver may be loaded and unloaded multiple times during a process runtime – for example, because the application first probes which GPUs and drivers are present, and then creates a context only later on. The driver(s) need to be loaded during the probing phase and are then unloaded e.g. by the Vulkan loader.

Alternatives

Where LLVM is ultimately packaged in a binary product, static linking with symbol hiding is often an option. It bypasses all these issues. However, at least Linux distributions usually require that large libraries such as LLVM use shared linking, so we can’t always use this workaround.

I have looked for other alternatives, but ultimately I haven’t found any that really solve the problems. For example, at some point I considered registering llvm_shutdown as an atexit handler. The problem with this approach is that it doesn’t work when ManagedStatics are defined in different shared objects that are unloaded at different times.

Ultimately, the proposed changes move us more towards using native language features and following best practices, i.e., using constructors and destructors in a way where the destructor properly tears down all state associated with the object. All of that is a good thing in my view.

The only technical downside of the changes that I can see is that more global destructors are run during shared library unload/process exit. My view is that if process shutdown time is something that the community cares about for some tools (like, apparently, LLD), we should systematically bypass all process shutdown explicitly in those tools, via a combination of an explicit llvm_fast_shutdown() call and unclean exit.

Thank you for your consideration and any feedback (positive or negative). I’d be very happy about reviews to the changes themselves (D128166, D128167)!

I can think of a couple reasons we might want to avoid C++ destructors:

  1. Registering them can slow down startup.
  2. I’m not sure unloading a shared library runs C++ destructors reliably on all platforms.

Really, we probably shouldn’t be using ManagedStatic in most circumstances; like other global variables, it implies shared state, and we don’t really want to share state across different LLVMContexts in most cases.

I mostly agree about the use of global variables, but this feels orthogonal. Unless somebody has an actually viable idea of how to remove all global variable use in LLVM, we should still make sure that the ones we do have work correctly (at least as much as is within our power – if there really are broken platforms out there then that sucks, but it shouldn’t prevent correctness on platforms where correctness is possible).

There are currently roughly 120 ManagedStatic variables.

A few of them are used to compute tables that are constants, but inconvenient to compute at compile-time because they contain stuff like hashmaps. These can probably be switched to use static local variables, instead of ManagedStatic. Static local variables naturally have the properties we want: they don’t involve running any code at startup, and on targets where libraries can be unloaded, they’re naturally destroyed correctly. And if the value is constant, none of the state-sharing issues apply.

The rest are basically just wrong if you have multiple users of LLVM in the process: timing, debug logging, etc. We should really transition these to some other mechanism, e.g. some sort of timer context.

I’ll skip talking about options since we already talked about that recently…

2 Likes

Okay, I see what you mean. I had some concerns at first but after looking in a bit more detail, replacing ManagedStatic by function-local static variables seems feasible.

How about the following adjusted high-level plan:

  1. Remove ManagedStatic entirely in favor of function-local static variables plus the proposed llvm_fast_shutdown mechanism (for LLD’s unclean process exit path).
  2. Unregister cl::Options in their destructor.
  3. Address the isolation problems of command-line options somehow.
  4. Address the other isolation problems (timing, debug logging, etc.) gradually over time.

Is this something we can tentatively agree on? (1+2) should be sufficient to address the life-time problems. I want to fix (3) as well, but it’s a bigger (1000s of command-line options) and less-well defined problem that I expect needs much more discussion. I don’t think (1+2) should be blocked by (3) or (4).

1 Like

Step 1 seems reasonable.

For step 2, Option has a destructor anyway… so I guess that’s fine? At least, it’s not worse than anything we’re already doing.

MLIR has already gone through this transition and has eliminated all global ctors (e.g. from CommandLine options). Have you looked at its approach? @mehdi_amini can probably comment more about it.