Making LLVM play nice(r) when used as a shared library in a plugin setting

Hi all,

LLVM currently has (and has had for a long time) some issues when it’s used as a shared library in a plugin-like setting. I would like to fix them, and since the required changes are subtle and have a wide effect, I’m raising them here.

The issues I know of fall into two broad categories:

  1. Life-time and cleanup of global variables.
  2. Lack of isolation of command-line options and option values.

The second category is important but requires some major changes to the CommandLine library. These (and related) CommandLine issues have been brought up regularly over the years by different people and experience shows that it is a difficult topic that deserves its own treatment. I have some concrete ideas, but for this thread I want to focus on the first category, which is conceptually simpler.

The proposed changes are up for review as D128166 and D128167; the details follow below.

Details of proposed changes

For the category of life-time and cleanup issues around global variables, there are two issues I am aware of:

  1. Cleanup of ManagedStatics
  2. Cleanup of cl::Option (cl::opt etc.)

ManagedStatics

ManagedStatics are currently added to a global linked list during construction; that list is then walked for destruction by llvm_shutdown, which must be called explicitly.
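
For readers less familiar with the mechanism, here is a minimal sketch of the existing usage pattern (ManagedStatic and llvm_shutdown are the real APIs; the vector is just a stand-in for whatever state a real ManagedStatic holds):

```cpp
#include "llvm/Support/ManagedStatic.h"
#include <string>
#include <vector>

// Lazily constructed on first dereference and chained into the global
// ManagedStatic list so that llvm_shutdown() can destroy it later.
static llvm::ManagedStatic<std::vector<std::string>> RegisteredThings;

void registerThing(const std::string &Name) {
  RegisteredThings->push_back(Name); // operator-> constructs on first use
}

int main() {
  registerThing("example");
  // Without this explicit call, the vector above is never destroyed --
  // and in a plugin there is no safe place to put the call.
  llvm::llvm_shutdown();
}
```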

Several aspects of this cause problems in the plugin setting:

  • There is no clear point at which llvm_shutdown can be called. A plugin does not know whether there are any other users of LLVM in the same process, so it cannot safely call llvm_shutdown. Even if there are no other users of LLVM in the process, it is possible for the plugin .so to be unloaded without unloading the LLVM .so. But if the plugin doesn’t call llvm_shutdown, then cleanup doesn’t happen and we get memory leaks.
  • If the plugin itself uses ManagedStatics, then the corresponding destructors may have to run at a different time than the ManagedStatics used by LLVM itself, since the plugin may be unloaded without LLVM itself being unloaded. Note that this issue affects LLVM itself when built with BUILD_SHARED_LIBS=ON (i.e., the not-really-supported build configuration in which every library is linked into its own shared object).

The solution is to use native language features in a standard way and simply add a destructor to ManagedStatic. The compiler and linker together ensure that the destructor is run at the correct time, i.e. when the .so containing the ManagedStatic is unloaded.

Two observations:

  1. One side-effect of this change is that the order in which destructors are run may change. To support that, I am adding a ManagedStatic<T>::peek() method which returns a pointer to the static, or nullptr if it has already been destroyed (or was never constructed). The intention is to use this method when global destructors have dependencies on each other, to avoid re-constructing a static that has already been destroyed (see the sketch after this list).

  2. At least LLD relies on being able to exit a process uncleanly for faster teardown, while still enjoying the side-effects of llvm_shutdown. This feels very underhanded to me, so my patch also introduces an llvm_fast_shutdown function that invokes only the side-effects LLD actually wants.
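
To make the first point concrete, here is a hedged sketch of how peek() would be used. Note that ManagedStatic<T>::peek() is the API proposed by the patches, not something that exists today, and the Logger/Registry types are purely illustrative:

```cpp
#include "llvm/Support/ManagedStatic.h"
#include <cstdio>

struct Logger {
  void note(const char *Msg) { std::fprintf(stderr, "%s\n", Msg); }
};
static llvm::ManagedStatic<Logger> TheLogger;

struct Registry {
  ~Registry() {
    // With destructors run by the C++ runtime instead of llvm_shutdown(),
    // the relative destruction order of TheLogger and TheRegistry is not
    // fixed, so only touch the logger if it is still alive. peek() returns
    // nullptr if the logger was never constructed or is already destroyed.
    if (Logger *L = TheLogger.peek())
      L->note("registry torn down");
  }
};
// Constructed lazily on first dereference (*TheRegistry), like any
// other ManagedStatic.
static llvm::ManagedStatic<Registry> TheRegistry;
```

For the second point, the idea is that an LLD-style tool calls llvm_fast_shutdown() and then exits without running global destructors at all, e.g. via sys::Process::Exit(RetCode, true) as discussed further below.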

cl::Options

cl::Option is the base class for cl::opt and friends. It registers itself with the global options parser in the constructor, but currently does not unregister itself in the destructor.

This is an obvious problem because it leaves dangling pointers around when the plugin .so is unloaded but the LLVM .so isn’t. Furthermore, if the plugin is then loaded again, initializing the command-line options it carries fails because they’re already registered.

Again, the solution is to use native language features in a standard way and simply unregister command-line options in the cl::Option destructor.
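
As a hypothetical illustration, a plugin might carry an option like the one below (cl::opt is the real API; the option name is made up). With the proposed change, the option unregisters itself when the plugin’s globals are destroyed on dlclose(), so reloading the plugin no longer trips over a stale registration:

```cpp
#include "llvm/Support/CommandLine.h"

using namespace llvm;

// Lives in the plugin .so. The constructor registers "my-plugin-verbose"
// with the global option parser when the plugin is loaded; the proposed
// destructor change removes that registration again when the plugin is
// unloaded.
static cl::opt<bool> MyPluginVerbose(
    "my-plugin-verbose",
    cl::desc("Enable verbose output in the plugin (illustrative)"),
    cl::init(false));
```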

Aside: One may ask whether it is legitimate for a plugin to define its own command-line options at all, since plugins don’t “have access to” the command line in any real sense, and one expects the process command line to be parsed before any plugins are loaded.

My answer to that is yes. Plugins can be rather complex and consist of multiple libraries, some of which may also be used in command-line utilities. For example, in our concrete use case we have an LLPC library which contains a graphics shader compiler built on LLVM. Our Vulkan driver links against the LLPC library, but we also have standalone command-line tools (mostly used during development) which link against the LLPC library.

Use Case

As stated, for the issues to occur you really need to do two things at once:

  • You need to use LLVM from a “plugin”, i.e. some shared object that gets dynamically loaded and unloaded during the runtime of a process.
  • The plugin needs to link against LLVM dynamically, so that (1) you may end up in a situation where LLVM is shared by multiple users within the same process and/or (2) the plugin may be unloaded without LLVM also being unloaded.

My concrete use case is graphics drivers. Drivers for all major graphics APIs are loaded at runtime when a graphics context is created, based on the GPU(s) present on the user’s system, and those drivers include a shader compiler (which is often built on LLVM).

Depending on what exactly an application is doing, the same driver may be loaded and unloaded multiple times during a process’s runtime – for example, because the application first probes which GPUs and drivers are present, and only creates a context later on. The driver(s) need to be loaded during the probing phase and are then unloaded, e.g. by the Vulkan loader.

Alternatives

Where LLVM is ultimately packaged in a binary product, static linking with symbol hiding is often an option, and it bypasses all these issues. However, at least Linux distributions usually require that large libraries such as LLVM be linked dynamically, so we can’t always use this workaround.

I have looked for other alternatives, but ultimately I haven’t found any that really solve the problems. For example, at some point I considered registering llvm_shutdown as an atexit handler. The problem with this approach is that it doesn’t work when ManagedStatics are defined in different shared objects that are unloaded at different times.

Ultimately, the proposed changes move us more towards using native language features and following best practices, i.e., using constructors and destructors in a way where the destructor properly tears down all state associated with the object. All of that is a good thing in my view.

The only technical downside of the changes that I can see is that more global destructors are run during shared library unload/process exit. My view is that if process shutdown time is something the community cares about for some tools (like, apparently, LLD), we should systematically bypass process shutdown work in those tools, via a combination of an explicit llvm_fast_shutdown() call and an unclean exit.

Thank you for your consideration and any feedback (positive or negative). I’d be very happy about reviews to the changes themselves (D128166, D128167)!

I can think of a couple of reasons we might want to avoid C++ destructors:

  1. Registering them can slow down startup.
  2. I’m not sure unloading a shared library runs C++ destructors reliably on all platforms.

Really, we probably shouldn’t be using ManagedStatic in most circumstances; like other global variables, it implies shared state, and we don’t really want to share state across different LLVMContexts in most cases.

I mostly agree about the use of global variables, but this feels orthogonal. Unless somebody has an actually viable idea of how to remove all global variable use in LLVM, we should still make sure that the ones we do have work correctly (at least as much as is within our power – if there really are broken platforms out there then that sucks, but it shouldn’t prevent correctness on platforms where correctness is possible).

There are currently roughly 120 ManagedStatic variables.

A few of them are used to compute tables that are constant, but inconvenient to compute at compile time because they contain things like hash maps. These can probably be switched to static local variables instead of ManagedStatic. Static local variables naturally have the properties we want: they don’t involve running any code at startup, and on targets where libraries can be unloaded, they’re destroyed correctly. And if the value is constant, none of the state-sharing issues apply.
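
For that constant-table case, the transformation would look roughly like this (a sketch with a made-up table, not code from any of the patches):

```cpp
#include "llvm/ADT/StringMap.h"

// Before: static llvm::ManagedStatic<llvm::StringMap<unsigned>> FeatureTable;
// After: a function-local static, constructed on first use and destroyed by
// the C++ runtime when the containing shared object is unloaded.
static const llvm::StringMap<unsigned> &getFeatureTable() {
  static const llvm::StringMap<unsigned> Table = [] {
    llvm::StringMap<unsigned> T;
    T["feature-a"] = 1; // illustrative entries only
    T["feature-b"] = 2;
    return T;
  }();
  return Table;
}
```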

The rest are basically just wrong if you have multiple users of LLVM in the process: timing, debug logging, etc. We should really transition these to some other mechanism, e.g. some sort of timer context.

I’ll skip talking about options since we already talked about that recently…

Okay, I see what you mean. I had some concerns at first but after looking in a bit more detail, replacing ManagedStatic by function-local static variables seems feasible.

How about the following adjusted high-level plan:

  1. Remove ManagedStatic entirely in favor of function-local static variables plus the proposed llvm_fast_shutdown mechanism (for LLD’s unclean process exit path).
  2. Unregister cl::Options in their destructor.
  3. Address the isolation problems of command-line options somehow.
  4. Address the other isolation problems (timing, debug logging, etc.) gradually over time.

Is this something we can tentatively agree on? (1+2) should be sufficient to address the life-time problems. I want to fix (3) as well, but it’s a bigger (thousands of command-line options) and less well-defined problem that I expect needs much more discussion. I don’t think (1+2) should be blocked by (3) or (4).

Step 1 seems reasonable.

For step 2, Option has a destructor anyway… so I guess that’s fine? At least, it’s not worse than anything we’re already doing.

MLIR has already gone through this transition and has eliminated all global ctors (e.g. from CommandLine options). Have you looked at its approach? @mehdi_amini can probably comment more about it.

Yes, some variant of what MLIR does makes sense. It doesn’t actually eliminate global state and is therefore really more of an optimization than a correctness fix, but I think the basic idea is likely to play a role :)

I have now uploaded a new series of changes that does steps 1 and 2. It is complete for the llvm subdirectory of the monorepo, and I plan to address other users of ManagedStatic in the monorepo as well, assuming folks are aligned with this approach. The changes culminate in the complete removal of ManagedStatic in D129134 (ManagedStatic: remove the last traces).

Please review, and refer to the “Stack” tab on Phabricator for the whole series.

Some high-level notes:

  • Most uses of ManagedStatic are transformed in a straightforward way into static variables at function scope – either in the only function that uses the static, or in a dedicated “getter” function.
  • Some uses of ManagedStatic are more complex and involve a number of related statics. In several cases, I combined those so that only a single static variable remains, which keeps the destruction order easy to understand and avoids bugs related to destructor ordering.
  • There were a number of uses of ManagedStatic<cl::opt<...>> which I’ve generally transformed to follow the pattern established in MLIR: put the options into a struct in an anonymous namespace, of which we then (lazily) create a single global instance (sketched after this list).
  • llvm_shutdown() is removed entirely, though I’ve kept LLVMShutdown() in the C API as a deprecated no-op.
  • llvm_fast_shutdown() is added as in the original proposal; it’s currently only used by LLD.
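
For reference, that option pattern looks roughly like this (a sketch; the struct and option names are made up, not taken from the patches):

```cpp
#include "llvm/Support/CommandLine.h"

namespace {
// All options for one component gathered in a single struct, so a single
// function-local static controls their registration and teardown as a unit.
struct ComponentOptions {
  llvm::cl::opt<bool> Verify{"component-verify",
                             llvm::cl::desc("Run extra verification"),
                             llvm::cl::init(false)};
  llvm::cl::opt<unsigned> Threshold{"component-threshold",
                                    llvm::cl::desc("Some tuning threshold"),
                                    llvm::cl::init(8)};
};
} // namespace

// Lazily creates the single global instance; the options are registered the
// first time this is called and unregistered when the .so is unloaded.
static ComponentOptions &getComponentOptions() {
  static ComponentOptions Options;
  return Options;
}
```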

Skimmed the patches; I think the changes make sense at a high level. I’ll try to review in more detail later.

Question about this: are there any clients that want to shut down LLVM before the end of the process? It seems like you’re removing the ability to tear down global state here. Why can’t shared library/plugin clients call llvm_shutdown() before unloading the LLVM plugin?

How does it affect the shutdown time of a clang process? I seem to remember that clang’s driver invokes cc1 with a flag that disables freeing memory to speed up process teardown, for example. Similarly, are we adding more global destructors to run than strictly necessary when the process terminates?

Shutting down LLVM without also unloading LLVM .text and .data (whether via dlclose() or via process exit) is fundamentally and unfixably broken. The unsolvable problem is hidden in the assumption that there is “the” LLVM plugin. What if a process loads two plugins that use LLVM? One of the plugins may be unloaded without LLVM being unloaded, because the other plugin remains. So a plugin simply cannot decide whether it is safe to shut down LLVM. The dynamic linker is the correct place for making that decision.

This happens in real life, which is what led me down this path in the first place. For example, a graphical application may load both an OpenGL and a Vulkan driver with LLVM-based shader compilers. Or consider an application like KDevelop, which uses libclang but may also render using an OpenGL driver. Furthermore, it’s possible for an LLVM-based plugin to be unloaded via dlclose() while LLVM itself remains loaded; if another (or the same) plugin is then loaded later, it finds LLVM in a corrupt state.

I should add that this is a good reason to move away from global state as much as we can: if everything were owned (directly or indirectly) by an LLVMContext, we wouldn’t have this problem. The proposed changes are about fixing bugs for the small amount of global state and one-time-initialization tables that we do still have.

I haven’t managed to reliably measure a difference, but obviously different systems might behave differently.

With the proposed changes, clients that are worried about this can use llvm_fast_shutdown + sys::Process::Exit(RetCode, true). This actually does less work than what was available before, because llvm_fast_shutdown does less work than llvm_shutdown.

Sorry, I missed your last question. The changes don’t add any new destructors being run; a whole bunch of destructors that were previously run via llvm_shutdown are now run via the __cxa_atexit mechanism instead. The changes do fix a missing teardown of cl::Options, but that destructor was already non-trivial to start with.

I’m not sure how to understand this, given that so far it has been possible to tear down a process without calling llvm_shutdown at all?

Right, but what I was getting at is that a client that is not calling llvm_shutdown right now will see more destructors called, because they are now registered via __cxa_atexit.
It seems that such clients ought to call sys::Process::Exit(RetCode, true); (assuming they are also OK with skipping non-LLVM global destructors), so it’s not a big deal, but maybe we should document it somewhere? (and add this to clang…)

Thank you for the first reviews.

Yes, that’s true. I’ll see about adding some documentation on this.

Gentle nudge, and a notification that I’ve added the documentation for llvm_fast_shutdown (D129119: Add llvm::llvm_fast_shutdown()) as well as a patch to use it from the Clang driver (D130577: clang-driver: use llvm_fast_shutdown).

I’d very much appreciate reviews of those changes as well as the entire stack of which they are a part! Most of the changes are individually fairly small.