[DISCUSS] Making Global cl::opt Friendly for JITing Heterogeneous Computation

Dear LLVM community:

We would like to bring up a discussion about JITing and LLVM’s cl::opt mechanism.

LLVM has global cl::opt configurations that affect how pass pipelines are configured by default. Reading the code, the likely assumption was that each option is set once and remains constant throughout compilation.
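For concreteness, a typical global cl::opt declaration looks like the following (illustrative, modeled on LLVM’s real -unroll-max-count flag): the option is static global state whose value is normally set once, when the command line is parsed.

#include "llvm/Support/CommandLine.h"

// Process-global option storage, registered at static-initialization time.
static llvm::cl::opt<unsigned> UnrollMaxCount(
    "unroll-max-count", llvm::cl::Hidden,
    llvm::cl::desc("Set the max unroll count for partial and runtime "
                   "unrolling, for testing purposes"),
    llvm::cl::init(0));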

This assumption likely comes from the case where the compiler was used in a driver setting, for a single target backend.

As the field of compilation evolves, we increasingly see JITing use cases, where compilation is embedded in a long-running process.

Example projects that use LLVM in this way include Numba, PyTorch, Julia, Taichi, TVM, and possibly more that I cannot list here. The assumption that cl::opt is configured globally likely still holds for a default setting.

But things become more interesting as we compose things for heterogeneous environments, e.g. GPUs. Compilation for GPUs usually comes with two sets of pipelines: one for the host (e.g. x86) that drives the computation, and one for the device (e.g. NVPTX) code.

Sometimes both paths reuse the same passes (e.g. loop unrolling), and then there is a need to stitch the end results together – all in the same process. This becomes problematic if we decide that the GPU path and the host path should use different global configurations, e.g. unroll-max-count should be 100 for the GPU but remain 0 for the host.

We brought up this discussion in the TVM community.

A0. One possible solution would be to simply not use LLVM as a JIT engine, but instead use it as a one-shot CLI and reload LLVM for each compilation. This way the global cl::opt values get reset per compilation invocation, but it of course defeats the purpose of using LLVM as a JIT library.

A1. The most ideal solution is likely an API that can quickly pull out a default pipeline but still reconfigure certain passes in a way that is independent from the global cl::opt. I have limited understanding of the internals, but my quick reading of the code suggests the two are a bit intertwined (I am not an expert here and would love to see suggestions).

A2. Our last solution, a workaround for this particular problem, is to record the cl::opt values when entering an RAII scope and reset them when exiting the scope, so the CPU and GPU compilation pipelines can have different defaults.
It works as follows:

void MyCodegen() {
    {
        With<LLVMCLOption<int>> scope("unroll-max-count", 2);
        // unroll-max-count is set to 2 here, for pipeline 1
        {
            With<LLVMCLOption<int>> scope("unroll-max-count", 3);
            // unroll-max-count is set to 3 here
        }
        // unroll-max-count is back to 2 here
    }
    // global option reset to its default
}
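For reference, here is a minimal sketch of how such a scope guard can be implemented on top of LLVM’s real cl::getRegisteredOptions() API (the class name ScopedCLOpt is made up for this example; the actual TVM implementation differs in details):

#include "llvm/Support/CommandLine.h"

// Saves the current value of a registered option, overrides it, and
// restores it on scope exit. Restores only the value (not occurrence
// counts), and is not thread-safe: the storage is still process-global.
template <typename T>
class ScopedCLOpt {
  llvm::cl::opt<T> *Opt = nullptr;
  T Saved{};

public:
  ScopedCLOpt(llvm::StringRef Name, const T &Value) {
    auto &Registry = llvm::cl::getRegisteredOptions();
    auto It = Registry.find(Name);
    if (It != Registry.end()) {
      // The downcast is only valid if the registered option really stores a T.
      Opt = static_cast<llvm::cl::opt<T> *>(It->second);
      Saved = Opt->getValue();
      *Opt = Value;
    }
  }
  ~ScopedCLOpt() {
    if (Opt)
      *Opt = Saved;
  }
};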

We would like to bring this discussion to the LLVM community, as JITing for heterogeneous computation is only going to become more popular. This issue will likely also be faced by other packages when they attempt to configure target-specific pipelines differently.

It would be great to get the LLVM community’s thoughts on this matter.

Thanks

Usually cl::opt isn’t the right mechanism to control optimizations; global knobs don’t compose well. It’s just a convenient way to define options for debugging with a minimal amount of code. We have a variety of other mechanisms for modifying the behavior of transforms; it shouldn’t be an issue to use another mechanism in specific cases. Many optimization passes are modified by function attributes, or metadata on instructions. And the new pass manager has a general mechanism for passing flags to passes to modify the behavior of a pass (see llvm/lib/Passes/PassRegistry.def).
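For example, here is a minimal sketch (using the new pass manager) of configuring a transform through its constructor parameters and a per-function string attribute rather than through a global cl::opt; the specific option values are illustrative:

#include "llvm/IR/Function.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Transforms/Scalar/SimplifyCFG.h"
#include "llvm/Transforms/Utils/SimplifyCFGOptions.h"

using namespace llvm;

void buildPipeline(FunctionPassManager &FPM, Function &F) {
  // Pass parameters travel with the pass instance, not with global state,
  // so two pipelines in one process can use different settings.
  FPM.addPass(SimplifyCFGPass(
      SimplifyCFGOptions().bonusInstThreshold(2).sinkCommonInsts(true)));

  // Many passes and backends also consult per-function string attributes.
  F.addFnAttr("no-jump-tables", "true");
}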

So if you’re having an issue with some specific flag, probably the answer is to use one of the mechanisms I just mentioned.


But I guess the issue here isn’t just a specific flag; the issue is that the existence of cl::opt mechanism is itself a hazard if a process has multiple users of LLVM.

I think it’s going to be hard to come up with a good solution that involves hacking at cl::opt directly; it’s used in so many diverse places that it’s going to be hard to migrate everything in one shot. As an incremental step, I think it makes sense to focus specifically on uses of cl::opt in llvm/lib. Which is mostly LLVM optimization passes and backends.

It might make sense to define a mechanism for global flags tied to LLVMContext. That’s not really a great way to define flags for anything, but almost all the relevant code in llvm/lib has access to one, so it would allow mechanically migrating LLVM transforms/backends to use that mechanism instead of cl::opt.
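To make the shape of that concrete, here is a purely hypothetical sketch (none of these names exist in LLVM today) of a string-keyed option table that an LLVMContext could own, and that migrated passes would query instead of a global cl::opt:

#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"
#include <string>

// Hypothetical: a per-context option table (no such class exists today).
struct ContextOptions {
  llvm::StringMap<std::string> Values;

  // Returns the stored value, or Default if the option was never set.
  llvm::StringRef get(llvm::StringRef Name, llvm::StringRef Default) const {
    auto It = Values.find(Name);
    return It == Values.end() ? Default : llvm::StringRef(It->second);
  }
};

// A migrated transform would then read something like
//   Ctx.getOptions().get("unroll-max-count", "0")
// instead of a static cl::opt (getOptions() is likewise hypothetical).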


> It might make sense to define a mechanism for global flags tied to LLVMContext. That’s not really a great way to define flags for anything, but almost all the relevant code in llvm/lib has access to one, so it would allow mechanically migrating LLVM transforms/backends to use that mechanism instead of cl::opt.

+1, I think we can have a global registry of flags, which does not change after startup. We can parse and store the option values in some particular LLVMContext, and retrieve them from there.


+1: when a flag becomes more than a debugging / experimental thing, it should be promoted to a C++ API flag (added to a Pass constructor, for example).

MLIR has successfully used cl::opt without any global state. We should look into how it could translate to LLVM as well!
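For context, a condensed sketch of the MLIR pattern being referenced (see MLIR’s pass infrastructure documentation for the real thing): options are declared as members of the pass instance, so their storage is per-pass rather than global.

#include "mlir/Pass/Pass.h"

// Condensed from MLIR's instance-specific pass option pattern; the pass
// itself is a made-up example.
struct MyUnrollPass
    : public mlir::PassWrapper<MyUnrollPass, mlir::OperationPass<>> {
  MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(MyUnrollPass)

  MyUnrollPass() = default;
  // Passes with options need a copy constructor; options are not copyable.
  MyUnrollPass(const MyUnrollPass &) {}

  // The option's storage lives in this pass instance, not in a global.
  Option<int> maxUnroll{*this, "max-unroll",
                        llvm::cl::desc("Maximum unroll count"),
                        llvm::cl::init(0)};

  void runOnOperation() override {
    // Use maxUnroll directly; no global state is involved.
  }
};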


Thanks for the discussion so far, and +1: a better path would be something like A1, where each pipeline can be configured independently of the global state (by starting from an initial config and updating it).

One goal of this discussion is mainly to raise awareness of the issue. It would also be good to get thoughts on short-term band-aids; e.g. A2 is what we ended up with, as it requires no changes to LLVM.

+1 for trying to work in that direction for better composability in the ecosystem.

Part of the issue we keep running into occasionally is that even if we were to try to be perfect citizens (and not set any cl::opt flags ourselves), our code can be loaded into a process which uses LLVM and does set cl::opt flags for its own purposes. The result is that our compilations get compromised and we see bug reports, even though it’s “not our fault”.

The only viable production strategy for us today is to statically link against our own copy of LLVM whose symbols are hidden from the outside world. But of course, Linux distributions aren’t happy about this.

One possible evolutionary strategy that I’ve looked into is to tie cl::opts to a thread-local “options context”. This has been suggested by others in the past, and it seems to be the only feasible solution that can work without rewriting all users of cl::opt. (It still requires rewriting many users of cl::opt, specifically those that provide their own storage for the option value.)

I wonder how receptive folks would be to a refactoring initiative along those lines?

If I’m understanding correctly, you’re proposing thread_local “options context”. All options would store their value in that context, and we provide an API to let users explicitly switch to a fresh context. That’s a bit ugly, but it allows users to explicitly work around issues with specific projects. If you want to propose that, go ahead, I guess.
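To spell out what is being proposed, here is a purely hypothetical sketch (none of these names exist in LLVM today) of the thread_local options context and the explicit switch API:

#include "llvm/ADT/StringMap.h"
#include <string>
#include <utility>

// Hypothetical: all option values for one "user" of LLVM in the process.
struct OptionContext {
  llvm::StringMap<std::string> Values; // option name -> parsed value
};

// Every cl::opt getter would read through this pointer instead of its own
// global storage.
thread_local OptionContext *CurrentOptions = nullptr;

// RAII helper to explicitly switch the current thread to a fresh context.
class WithOptionContext {
  OptionContext *Prev;

public:
  explicit WithOptionContext(OptionContext &Ctx)
      : Prev(std::exchange(CurrentOptions, &Ctx)) {}
  ~WithOptionContext() { CurrentOptions = Prev; }
};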

That doesn’t really give us an evolutionary path, though. Any option that wanted to actually compose correctly would have to move away from the “options context” to some other form of storage.


Seems like another local optimum with its own suboptimal failure mode: for example, it ties the options to a particular thread, and users would need to carefully save/restore the context on every thread switch.
The LLVMContext path seems a bit more appealing to me as this is already the unit of “threading isolation” that we rely on.

True, although the impact in practice can be mitigated with RAII scope guards. It would also be easy to tie LLVMContexts to option contexts, as follows:

  • Allow an “option context” to be specified when creating an LLVMContext
  • At a few strategic points where an LLVMContext is available (e.g., PassManager::run), add an assertion that the current thread_local option context pointer matches the one referred to by the LLVMContext.

Together, these measures reduce the risk of getting the option contexts wrong to very close to zero.
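Continuing the hypothetical sketch from above (getOptionContext and CurrentOptions are invented names), the check at a strategic entry point might look like:

#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include <cassert>

// Hypothetical: look up the option context an LLVMContext was created with.
OptionContext *getOptionContext(llvm::LLVMContext &Ctx);

void runPipeline(llvm::Module &M, llvm::ModulePassManager &MPM,
                 llvm::ModuleAnalysisManager &MAM) {
  // Catch a thread that switched tasks without updating its option context.
  assert(getOptionContext(M.getContext()) == CurrentOptions &&
         "thread_local option context does not match this LLVMContext");
  MPM.run(M, MAM);
}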

There are plenty of command-line options where inherently no LLVMContext exists. That makes it impossible to just change cl::opt to require one as an argument to a getter method, for example. It’s not clear to me what alternative you have in mind?

It does not make it easy to manage though: every change of thread requires an update of the option context.
As an example: if a system with a thread pool receives tasks with a Module to compile for example, the task would need to always copy the option context from the LLVMContext to override the current thread_local one before doing anything.

Long term, you could transition from cl::opt to TableGen-defined options and have the options be an input to LLVMContext.

It’s questionable what is easier to manage. If we don’t use TLS, the logical consequence is that every access to a cl::opt must use an explicit getter method to pass in a context. That is, all accesses to all cl::opt variables must be changed from a naked MyOption to something like MyOption.get(context). This does have the advantage of being more explicit, and I’d be fine with going down that path personally, but people need to be clear on the trade-off.

As I wrote before, the other part of it is that there are many uses of cl::opt for which no LLVMContext exists in the first place. Some sort of cl::Context class is needed either way, though of course one could provide convenience overloads of cl::opt::get().

Part of the charm of cl::opt is its easy extensibility, including in places where an LLVMContext doesn’t make sense, as well as downstream projects.

Is this a feature or problem?


What do you think? Make sure your answer encompasses all the use cases of cl::opt. It would also help for you to consider alternatives.

I’m firmly in the camp of “it is a problem”. While it is convenient, easy to implement, and easy to add a new cl::opt and assemble the system, it comes with too many drawbacks; in particular, it relies overly heavily on static global constructors.
I don’t believe this is necessary, though; the system could account for it and provide some way to explicitly register options at every level. Of course, that will cost some more boilerplate than the “zero” boilerplate we get from relying on the linker to thread everything into main, but the system would be much cleaner IMO.

Providing some of my thoughts from reading the code: different ways of organizing configs come with their own extensibility costs.

Organizing things in a centralized location (e.g. a context) via coded fields would effectively mean that extending the options touches the centralized location, making it slightly harder to extend passes.

Organizing things in a decentralized fashion (co-located with passes) makes it easier to add new passes that come with new options.

Based on my reading, cl::opt serves two purposes:

  • N0: the ability to declare the options available for a pass, and their default values, in a centralized registry.
  • N1: the ability to query the settings.

An additional need arises:

  • N2: the ability to construct options from the set of default values, and to set them through the command line or other functions in a pass-specific config (either TLS or a context).

While these needs are coupled at the moment, they do not have to be. For example, one can imagine declaring cl::opt statically to register the available options and their defaults (N0), while still being able to clone those cl::opt values and mutate them in pass-specific settings, perhaps by collecting/copying a Map<String, cl::Option> and then mutating entries in place by string lookup. There can also be duplications/updates through the string key, assuming that key is global to the project.

See the following mock-up example of one way to decouple N0, N1 and N2.

// N0: declares the option and registers it in the centralized registry
static cl::opt<int> max_unroll_factor("max-unroll-factor", ...);

void runPass() {
    // N2: clone the registered defaults into a pass-specific config,
    // then update the option, keyed by its name "max-unroll-factor"
    OptContextMap opt_map = CollectDefaultsOpt();
    opt_map[max_unroll_factor] = new_value;

    loopUnroll(opt_map);
}

void loopUnroll(const OptContextMap& opt_map) {
    // N1: query the setting, keyed by "max-unroll-factor"
    int unroll_count = opt_map.get(max_unroll_factor);
}

We’ve had issues in the past with cl::opt and LTO; we were passing a -mllvm <foo> flag to the compiler but not to the linker before using LTO. After enabling LTO, the flag was silently dropped! There’s no -Wunused-command-line-argument for -mllvm or -Xclang flags, IIRC. You only tend to find out the hard way after something goes wrong as a result, if at all.

Thanks @tqchen for teasing apart one aspect of the topic in this way; that’s really helpful for the discussion.

Another aspect is the raw engineering changes that have to be made to get a different solution. There are a lot of cl::opt users throughout llvm-project (and the wider ecosystem), and changing them all is a pretty Herculean task, especially if it needs to be done by hand. The crucial issues I’m seeing are:

  1. Eliminate the possibility of specifying “external storage” for cl::opts, as the external storage is necessarily global and we’re agreed we should get away from that.
  2. Passing some form of “context” object to all users of cl::opt.

(The proposal of using thread_local storage should be understood as simply a technique that avoids massive amounts of code churn for point 2.)

Yet another aspect is the set of problems we’re trying to solve.

  1. Using cl::opts can cause breakage if options aren’t set consistently for LTO.
  2. Trying to use different cl::opt settings in multi-threaded scenarios doesn’t work.
  3. Even if you are not using cl::opts, cl::opts can still cause miscompiles for you if somebody else uses them in the same process.

The first problem is clearly an issue that should be solved using some cl::opt-based mechanism, even if it’s only a mechanism that warns if there is an option mismatch.

The second problem is of questionable legitimacy, although if we’re touching cl::opt anyway, there may be ways to ease the transition to a better solution.

The third problem is the one I’m primarily interested in.

It would be good if people who might be able to justify spending some time in this area could make progress without necessarily having to solve all those problems at once (because that makes it significantly harder to justify spending time in this area…)

There’s also a sort of meta problem which is that many of the purposes for which people use cl::opts would be better served using metadata or function string attributes, but the cl::opt mechanism is too convenient in comparison.
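As one concrete illustration of that last point, here is a small sketch using real LLVM C++ APIs that expresses an unroll count as !llvm.loop metadata on a loop’s latch branch, instead of reaching for a global flag:

#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Type.h"

using namespace llvm;

// Attach !llvm.loop !{self, !{"llvm.loop.unroll.count", Count}} to a latch.
void setUnrollCount(BranchInst *Latch, unsigned Count) {
  LLVMContext &Ctx = Latch->getContext();
  Metadata *Ops[] = {
      MDString::get(Ctx, "llvm.loop.unroll.count"),
      ConstantAsMetadata::get(
          ConstantInt::get(Type::getInt32Ty(Ctx), Count))};
  MDNode *CountNode = MDNode::get(Ctx, Ops);
  // The first operand of a loop ID is a reference to the loop ID itself.
  MDNode *LoopID = MDNode::getDistinct(Ctx, {nullptr, CountNode});
  LoopID->replaceOperandWith(0, LoopID);
  Latch->setMetadata(LLVMContext::MD_loop, LoopID);
}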