[RFC] Internal command line options should not be statically initialized.

LLVM's internal command line library needs to evolve. We have an immediate need to build LLVM as a library free of static initializers, but before brute-force fixing this problem, I'd like to outline the incremental steps that will lead to a desirable long term solution. We want infrastructure in place to provide an evolutionary path.

In the near term, clients who need llvm-the-library with no static initializers will build with LLVM_NO_STATICINIT. In this mode, existing users of cl::opt will default to being optimized away as constant initializers, and those options will not be available for command line parsing.

A new option class will need to be defined, cl::toolopt. Initially, this will share the implementation and API with cl::opt. The only difference will be that cl::toolopt is immune to LLVM_NO_STATICINIT. Options that are required for tool support can simply be type-renamed to toolopt. Since these are not defined in a library, their static initializers are irrelevant.

Eventually, we would like to eliminate the special LLVM_NO_STATICINIT build mode. This can be done by making the -Asserts build just as strict. In the meantime, we would like to run as many unit tests as possible with LLVM_NO_STATICINIT builds. This will be solved by gradually moving cl::opt definitions buried within LLVM libraries to a new pattern that avoids static initialization.

One easy pattern to follow is to register the option during pass initialization with all the convenient flags and parameters, but refer to a globally defined option storage that enforces the singleton and provides visibility. As long as pass initialization happens before parseCommandLine, usage should be consistent.

Strawman:

cl::optval<bool> MyOption; // Just the storage, no initialization.

MyPass() {
  // Only registers an option with the same optval once.
  Option cl::registerOpt(MyOption, cl::init(false), cl::Hidden,
                         cl::desc("Descriptive string..."), );
}
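
A minimal, single-threaded sketch of the register-once pattern in the strawman above. Everything in the `sketch` namespace (`OptVal`, `registerOpt`, `registry`, `parseFlag`) is a hypothetical stand-in, not part of LLVM's CommandLine library, and the explicit option name argument is an assumption, since the strawman does not say where the name comes from.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace sketch {

// Plain storage with constant default member initializers: a global of this
// type can be constant-initialized, so no load-time static initializer runs.
template <typename T> struct OptVal {
  T Value{};
  bool Registered = false;
};

// The registry is a function-local static, so it is constructed on first
// use rather than at program load.
inline std::map<std::string, OptVal<bool> *> &registry() {
  static std::map<std::string, OptVal<bool> *> R;
  return R;
}

// Registers Storage under Name on the first call; later calls with the same
// storage do nothing. Returns true if this call performed the registration.
inline bool registerOpt(const std::string &Name, OptVal<bool> &Storage,
                        bool Init) {
  if (Storage.Registered)
    return false;
  Storage.Value = Init;
  Storage.Registered = true;
  registry()[Name] = &Storage;
  return true;
}

// Toy parser: "-name" sets a registered boolean option to true.
inline bool parseFlag(const std::string &Arg) {
  if (Arg.size() < 2 || Arg[0] != '-')
    return false;
  auto It = registry().find(Arg.substr(1));
  if (It == registry().end())
    return false; // Unknown: the option was never registered.
  It->second->Value = true;
  return true;
}

} // namespace sketch

sketch::OptVal<bool> MyOption; // Just the storage, no dynamic initialization.

struct MyPass {
  MyPass() { sketch::registerOpt("my-option", MyOption, /*Init=*/false); }
};
```

The sketch also shows the ordering constraint from the proposal: a flag is only recognized if the pass that registers it was constructed before parsing.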

-Andy

Hey Andy

One easy pattern to follow is to register the option during pass initialization with all the convenient flags and parameters, but refer to a globally defined option storage that enforces the singleton and provides visibility. As long as pass initialization happens before parseCommandLine, usage should be consistent.

Strawman:

cl::optval<bool> MyOption; // Just the storage, no initialization.

MyPass() {
  // Only registers an option with the same optval once.
  Option cl::registerOpt(MyOption, cl::init(false), cl::Hidden,
                         cl::desc("Descriptive string..."), );
}

Given Chandler's upcoming work on the pass manager, should we assume that multithreaded passes are a future possibility? If so, would the above variable need to be static inside the constructor, or is there some better way to initialize it only once? Or perhaps cl options just don't make any sense in a multithreaded context and you can ignore my ramblings.

Thanks,
Pete

Sounds good to me.

Will this make it safe again to use -backend-option in Clang? [Not saying that we *want* to do that, but that's a separate matter].

Regardless of the answer to that question, it might make sense for multiple target backends to register options with the same name. Currently, for example, both the X86 and PPC backends have options to force the use of a base pointer, and they need to have different names. It would be nice if that could be cleaned up as part of this.

-Hal

Thanks for giving me a chance to clarify and fix a typo.

Normally, you wouldn’t need to capture the Option object at all, just register it:

MyPass() {
  cl::registerOpt(MyOption, cl::init(false), cl::Hidden,
                  cl::desc("Descriptive string..."));
}

The globally defined option storage will have an initialized flag and registerOpt will do a CAS on that for thread safety. Subsequent calls to registerOpt with the same option storage will do nothing.

Now, we might want to capture the option object as such:

MyPass() {
  Option &MyOpt = cl::registerOpt(MyOption, cl::init(false), cl::Hidden,
                                  cl::desc("Descriptive string..."));
}

Subsequent calls to registerOpt will just return the existing Option object.
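
One way the compare-and-swap on the initialized flag might look, as a sketch only. The `sketch::Option`, `sketch::OptVal`, and `sketch::registerOpt` names are hypothetical, and the three-state flag is an assumption added here so that a thread losing the race does not return the option before the winner has finished filling it in.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

namespace sketch {

// Hypothetical option record; the real Option class is much richer.
struct Option {
  bool Value = false;
  bool Hidden = false;
};

enum RegState : int { Unregistered = 0, Registering = 1, Registered = 2 };

// Globally defined storage. std::atomic has a constexpr constructor, so a
// global OptVal is constant-initialized and needs no static initializer.
struct OptVal {
  std::atomic<int> Flag{Unregistered};
  Option Opt;
};

// Registers the option exactly once and always returns the same Option.
inline Option &registerOpt(OptVal &Storage, bool Init, bool Hidden) {
  int Expected = Unregistered;
  if (Storage.Flag.compare_exchange_strong(Expected, Registering,
                                           std::memory_order_acq_rel)) {
    // This thread won the CAS: fill in the option, then publish it.
    Storage.Opt.Value = Init;
    Storage.Opt.Hidden = Hidden;
    Storage.Flag.store(Registered, std::memory_order_release);
  } else {
    // Another thread is registering or has registered; wait for publication.
    while (Storage.Flag.load(std::memory_order_acquire) != Registered)
      std::this_thread::yield();
  }
  return Storage.Opt;
}

} // namespace sketch
```

A second call with different arguments leaves the first registration untouched and hands back a reference to the same object.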

-Andy

Hi Andy,

I definitely agree with the desire to remove command line options and
having them be initialized as part of the pass would be general
goodness. However, a few possible issues:

a) a number of command line options aren't really connected to passes
per-se (backend options)
b) "As long as pass initialization happens before parseCommandLine,
usage should be consistent." I'm thinking this isn't going to work for
the opt tool at least :)

Thoughts?

-eric

Hi Andy,

I definitely agree with the desire to remove command line options and
having them be initialized as part of the pass would be general
goodness. However, a few possible issues:

a) a number of command line options aren't really connected to passes
per-se (backend options)

We don’t have to ban the old-style options. They can live on indefinitely for experimental purposes, but if test cases need to use those options, they would need a REQUIRES: asserts line.

For options that we actually want to make available to tools (and general testing) we have a couple possibilities:

Create a hook that can be called before command line parsing, like initializeXXPass.

Raise those options into an API and define the command line interface at the tool level instead. This seems like something we want to do anyway, but I’m not proposing anything definite yet (I’m happy as long as we’re moving in that direction). We may want to raise some options all the way to function attributes.

b) "As long as pass initialization happens before parseCommandLine,
usage should be consistent." I'm thinking this isn't going to work for
the opt tool at least :)

Thoughts?

I think all the standard passes will be initialized up front. The problem would be with plugins using -load, but I fail to see how that can work with -help today anyway.

-Andy

I definitely agree with the desire to remove command line options and
having them be initialized as part of the pass would be general
goodness. However, a few possible issues:

a) a number of command line options aren't really connected to passes
per-se (backend options)

We don’t have to ban the old-style options. They can live on indefinitely for experimental purposes, but if test cases need to use those options, they would need a REQUIRES: asserts line.

Enh, I'm in favor of banning them. Even though I use a few.

For options that we actually want to make available to tools (and general testing) we have a couple possibilities:

Create a hook that can be called before command line parsing, like initializeXXPass.

This seems meh as just a workaround for the current behavior. It'd
probably be a step forward, but might be more work than just doing...

Raise those options into an API and define the command line interface at the tool level instead. This seems like something we want to do anyway, but I’m not proposing anything definite yet (I’m happy as long as we’re moving in that direction). We may want to raise some options all the way to function attributes.

something like this. I think this is the way to go. I'm not sure how
it would look other than passing option structs into a context, but
that might work?
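
A rough sketch of what "passing option structs into a context" could look like. All names here (`InlinerOptions`, `Context`, `shouldInline`, `parseToolArgs`, the `-sketch-threshold=` flag, and the default of 100) are invented for illustration and do not correspond to existing LLVM APIs or flags.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

// Library side: plain data, no globals and no cl::opt.
struct InlinerOptions {
  unsigned Threshold = 100; // Arbitrary illustrative default.
  bool Verbose = false;
};

// Hypothetical context that carries per-client option structs.
struct Context {
  InlinerOptions Inliner;
};

// Library code reads options from the context it was handed.
inline bool shouldInline(const Context &Ctx, unsigned Cost) {
  return Cost <= Ctx.Inliner.Threshold;
}

// Tool side: the command line interface lives here, not in the library.
inline Context parseToolArgs(const std::vector<std::string> &Args) {
  Context Ctx;
  const std::string Prefix = "-sketch-threshold=";
  for (const std::string &A : Args) {
    if (A.compare(0, Prefix.size(), Prefix) == 0)
      Ctx.Inliner.Threshold =
          static_cast<unsigned>(std::stoul(A.substr(Prefix.size())));
    else if (A == "-sketch-verbose")
      Ctx.Inliner.Verbose = true;
  }
  return Ctx;
}

} // namespace sketch
```

With this split, two clients in the same process can hold different `Context` values, which global flags cannot express.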

b) "As long as pass initialization happens before parseCommandLine,
usage should be consistent." I'm thinking this isn't going to work for
the opt tool at least :)

Thoughts?

I think all the standard passes will be initialized up front. The problem would be with plugins using -load, but I fail to see how that can work with -help today anyway.

Guess so. Seems weird, especially for slightly buggy cases like debug
info where most things happen during initialization. (Unfortunate, but
unraveling it is a pretty big project at the moment)

-eric

My understanding to this is limited. You can ignore this safely.

Can you introduce "before-all" and "after-all" passes that do nothing more than handle flags and set program state? That would eliminate out-of-pass flags.

Aren't all the command line options relevant only to the driver? If all the command line options were migrated to the driver, the library would be free from static initializers.

Doesn't this make it cleaner?

Thanks

Shankar Easwaran

Wait, I have a terrible idea. Why don’t we roll our own .init_array style appending section? I think we can make this work for all toolchains we support.

We’d have something like:

struct PODOptData {
  const char *FlagName;
  … // Common POD stuff, can be initialized at ParseCommandLine time.
};

LLVM_SECTION(".llvmopt")
PODOptData OptionRegisterer = { "foo_flag", … };

I know the COFF magic to get the section bounds to form an array, and I know it exists for ELF, but I don’t know how to do it on Darwin.

Since you are rolling your own anyway, why not just use the simpler ARM ELF ABI for that: the ARMv7-A ABI on i386 and the AArch64 ABI on amd64? Its init-array mechanism is dead simple to use.

Wait, I have a terrible idea. Why don't we roll our own .init_array style
appending section? I think we can make this work for all toolchains we
support.

Andy and I talked about this, but I don't think it's worth it. My opinion is:
1. For tool options (the top-level llc, opt, llvm-as etc. opts) it doesn't
matter.
2. For experimental options (options that we would be happy if they were
compiled out of a production compiler/JIT client/whatever), it doesn't
matter.
3. For backend options that need to always be available, lots of them
probably already need to get promoted to real API.
4. For the remaining options (ones that don't need to become API, but also
aren't purely experimental), many of them can probably easily be
initialized by some existing initialization hook (pass initialization,
target initialization).
5. There aren't enough options left not in those categories to motivate
some kind of clever solution.

Another way of looking at it is: the implicitly initialized option syntax
is really convenient for experimental options, but those are exactly the
ones that don't cause problems because we could be happy just compiling
them out. For almost everything else, the implicitly initialized "feature"
of llvm::cl isn't all that useful, and is in some cases actively harmful.

- Daniel

Sounds good to me.

Will this make it safe again to use -backend-option in Clang? [Not saying that we want to do that, but that’s a separate matter].

Regardless of the answer to that question, it might make sense for multiple target backends to register options with the same name. Currently, for example, both the X86 and PPC backends have options to force the use of a base pointer, and they need to have different names. It would be nice if that could be cleaned up as part of this.

The only solution I have for this is to raise both options into the target-independent API. Currently, that means adding it to TargetOptions and moving the flag to CommandLineFlags.h. I feel like the way to improve this is by redesigning TargetOptions to be less ad-hoc. It would be nice to define an option string in one place and allow the option to be overridden as a function attribute or command line -mllvm option. We should probably have a declarative option syntax for this like clang.
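
To make the "define once, override in two places" idea concrete, here is a small sketch. The `OptionDef`, `Overrides`, and `resolve` names are hypothetical, and the precedence shown (command line over function attribute over default) is an assumption; the thread does not settle which override should win.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace sketch {

// One declaration of the option: its name string and its default value.
struct OptionDef {
  std::string Name;
  std::string Default;
};

// Overrides keyed by option name, e.g. collected from function attributes
// or from -mllvm style command line flags.
using Overrides = std::map<std::string, std::string>;

// Assumed precedence: command line, then function attribute, then default.
inline std::string resolve(const OptionDef &Def, const Overrides &FnAttrs,
                           const Overrides &CmdLine) {
  auto C = CmdLine.find(Def.Name);
  if (C != CmdLine.end())
    return C->second;
  auto F = FnAttrs.find(Def.Name);
  if (F != FnAttrs.end())
    return F->second;
  return Def.Default;
}

} // namespace sketch
```

Because the lookup is keyed by a single name, two targets could share one option string while each function or invocation still gets its own value.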

-Andy

I definitely agree with the desire to remove command line options and
having them be initialized as part of the pass would be general
goodness. However, a few possible issues:

a) a number of command line options aren't really connected to passes
per-se (backend options)

We don’t have to ban the old-style options. They can live on indefinitely for experimental purposes, but if test cases need to use those options, they would need a REQUIRES: asserts line.

Enh, I'm in favor of banning them. Even though I use a few.

Sure. They’re convenient though. I’d rather have temporary experimental options than no test cases at all for some feature during its development.

For options that we actually want to make available to tools (and general testing) we have a couple possibilities:

Create a hook that can be called before command line parsing, like initializeXXPass.

This seems meh as just a workaround for the current behavior. It'd
probably be a step forward, but might be more work than just doing…

Fair enough. It’s a way to make options available for unit tests in -Asserts builds without being forced to declare the option in the tool API. The most attractive thing about this approach is that most experimental backend options could be easily converted. So even if it is just a workaround, it provides a lot of flexibility.

Raise those options into an API and define the command line interface at the tool level instead. This seems like something we want to do anyway, but I’m not proposing anything definite yet (I’m happy as long as we’re moving in that direction). We may want to raise some options all the way to function attributes.

something like this. I think this is the way to go. I'm not sure how
it would look other than passing option structs into a context, but
that might work?

Ok, we all seem to want this. As I told Hal, we probably should have a declarative syntax that generates code for tool options, integrates function attributes and command line flags, and provides a central lookup with some reflection API. How we do this is not going to change how we solve the immediate problem of static initializers, and it will obviously take a lot of time to agree on the design. So I’ll declare it out of scope for now.

-Andy

Aren't all the command line options relevant only to the driver? If all the command line options were migrated to the driver, the library would be free from static initializers.

Doesn't this make it cleaner?

Yes, but less convenient for developing experimental passes. I think we want to move in this direction, as I explained to Eric. We don’t have a good framework for tool options, and solving that problem will take a lot more time than what I’ve proposed so far.

-Andy

Dropping another idea: how about some Compiler-as-a-service paradigm?

That is, instead of statically initialising them, create a class for that internal state and allow pushing and popping it. The only remaining statically initialised object would be the state stack, whose static initialisation is reasonable.
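
A sketch of how such a state stack might be shaped. `OptionState`, `StateStack`, `states`, and `ScopedState` are made-up names, and the two fields are placeholders. This version keeps the stack in a function-local static, which is one possible choice; a plain global would match the description above more literally.

```cpp
#include <cassert>
#include <vector>

namespace sketch {

// Hypothetical bundle of what would otherwise be individual global flags.
struct OptionState {
  bool EnableFoo = false;
  unsigned Limit = 0;
};

// Stack of states; the bottom entry holds the defaults and is never popped.
class StateStack {
  std::vector<OptionState> Stack;

public:
  StateStack() : Stack(1) {}
  const OptionState &current() const { return Stack.back(); }
  void push(const OptionState &S) { Stack.push_back(S); }
  void pop() {
    if (Stack.size() > 1)
      Stack.pop_back();
  }
};

// The single shared object, built on first use.
inline StateStack &states() {
  static StateStack S;
  return S;
}

// RAII helper: pushes a state for the lifetime of one compilation request.
class ScopedState {
public:
  explicit ScopedState(const OptionState &S) { states().push(S); }
  ~ScopedState() { states().pop(); }
  ScopedState(const ScopedState &) = delete;
  ScopedState &operator=(const ScopedState &) = delete;
};

} // namespace sketch
```

Note that a single shared stack like this is not thread-safe as written; a multi-client service would need one stack per thread or per context.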

LLVM’s internal command line library needs to evolve. We have an immediate need to build LLVM as a library free of static initializers, but before brute-force fixing this problem, I’d like to outline the incremental steps that will lead to a desirable long term solution. We want infrastructure in place to provide an evolutionary path.

Thank you for tackling this, we should have fixed this years ago.

Please do a pass over the cl::opts we have, and remove ones that are long dead or unused. Do we still need -join-liveintervals? :)

Wait, I have a terrible idea. Why don't we roll our own .init_array
style appending section? I think we can make this work for all toolchains
we support.

Andy and I talked about this, but I don't think it's worth it. My opinion
is:
1. For tool options (the top-level llc, opt, llvm-as etc. opts) it doesn't
matter.
2. For experimental options (options that we would be happy if they were
compiled out of a production compiler/JIT client/whatever), it doesn't
matter.
3. For backend options that need to always be available, lots of them
probably already need to get promoted to real API.
4. For the remaining options (ones that don't need to become API, but also
aren't purely experimental), many of them can probably easily be
initialized by some existing initialization hook (pass initialization,
target initialization).
5. There aren't enough options left not in those categories to motivate
some kind of clever solution.

I think that this is a great summary of the problem. Having cl::opt's
compiled *out* of non-assert builds by default makes a lot of sense to me,
and having tool options use toolopt<> (or something) also makes perfect
sense.

If you're going to go and tackle pass-specific options, I think that we
should consider changing the syntax and overall design of the command line
options. We already have some manual name mangling/namespacification of
options (e.g. -tail-dup-limit=). Perhaps we should formalize this somehow?

In my work on LTO this summer, I kept getting a desire to be able to
"parameterize" passes to see how their behavior changes. One thing I wanted
to do related to some mailing list discussions was to try running a "light"
inlining pass at various stages, but AFAIK LLVM doesn't have a way to do
something like `opt ... -simple-inliner(40) ... -inline`. In one of
Shuxin's preliminary LTO experiment patches, in order to get a
SimpleInliner with threshold 40, he had to add C++ code (admittedly little,
but still requiring a recompile, and the threshold was hardcoded).

Another example where "pass parametrization" was a huge win that I ran into
this summer was with the ASan instrumentation pass that uses cl::opt's for
this (it has a LOT of cl::opt's btw). Being able to dynamically configure
the shadow offset and scale with the existing SDK compiler was a crucial
productivity win, so losing that would be a shame. Maybe there could be
something like a clang option to allow
`-configure-pass=asan(mapping-scale=...,offset-log=...)` that actually
associates that configuration with ASan? Idk (that's just off the top of my
head).
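
As an illustration of the pass-parameter syntax floated above, here is a small parser sketch for strings such as `asan(mapping-scale=3,offset-log=44)` or `simple-inliner(40)`. The `PassConfig` type and `parsePassConfig` function are hypothetical, the numeric values are made up, and nothing like this is claimed to exist in opt or clang.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

namespace sketch {

struct PassConfig {
  std::string Name;
  std::map<std::string, std::string> Params; // key=value items
  std::vector<std::string> Positional;       // items without '='
};

// Parses "name" or "name(item,item,...)". Returns false on malformed input.
inline bool parsePassConfig(const std::string &Spec, PassConfig &Out) {
  Out = PassConfig();
  std::string::size_type Open = Spec.find('(');
  if (Open == std::string::npos) {
    Out.Name = Spec;
    return !Spec.empty();
  }
  if (Open == 0 || Spec.back() != ')')
    return false;
  Out.Name = Spec.substr(0, Open);
  // Text between the parentheses.
  std::string Body = Spec.substr(Open + 1, Spec.size() - Open - 2);
  std::string::size_type Pos = 0;
  while (Pos < Body.size()) {
    std::string::size_type Comma = Body.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Body.size();
    std::string Item = Body.substr(Pos, Comma - Pos);
    std::string::size_type Eq = Item.find('=');
    if (Eq == 0)
      return false; // "=value" with no key.
    if (Eq == std::string::npos)
      Out.Positional.push_back(Item);
    else
      Out.Params[Item.substr(0, Eq)] = Item.substr(Eq + 1);
    Pos = Comma + 1;
  }
  return true;
}

} // namespace sketch
```

A pass constructor could then take a `PassConfig` instead of reading global cl::opt values, which would let the same pass run twice in one pipeline with different settings.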

-- Sean Silva

Obviously, based on the 18 responses I’ve gotten, the tone of my first email was misleading.

I don’t want to stifle discussion, but to be clear, the only thing I propose to tackle immediately is the removal of static initializers from libraries. There are several isolated issues that Filip has found good workarounds for. cl::opt is the one pervasive problem that can’t be weeded out one case at a time.

The purpose of posting an RFC and opening up discussion was to find out from people who have already thought about this, how the ideal cl::opt framework should work. I won’t be making that happen, rather I’ll make sure that the changes we make don’t get in the way of future progress.

I would certainly love to see LLVM internal options be reorganized and help however I can, but I’ll be very sad if that holds up removing static initializers.

-Andy

=/ I think we should actually implement the right long-term design rather
than something short term.

Anyways, I feel like there are (at least) three possible problems you want
to solve here, and I'd like to understand which (maybe all) you're actually
trying to solve, and which ones seem most important:

1) threads still alive during program termination reading from flags that
are being destroyed along with all globals

2) initialization ordering issues between flags in different translation
units

3) the existence of (non-zero-initializing) static initializers at all

For me, #1 and #2 are things I care a lot about and would be happy to see
solved. But #3 doesn't seem necessary or even desirable. We have a lot of
registration patterns in LLVM that make working with it very simple and
easy. It's not clear why we would want to preclude this, or re-invent the
mechanisms that already exist to automatically trigger static
initialization with the arbitrary fan-out of 'initializeFoo' global
functions. So if #3 is really an important goal, I'm curious about the why.
=] This is especially relevant as it impacts all of the work I'm starting
to do on the pass management and registration system.

As a somewhat separable point, I completely agree that every flag which any
frontend actually needs to control for correct functionality should be
moved from flags to an actual, proper interface as global flags just don't
work for a library. Essentially, they should be "debugging" tools or
"developer" tools, not actual interfaces. This isn't true today, and the
most egregious cases are the emission of debug information. All of that is
controlled through global flags, which causes lots of problems today for
our library users of LLVM.

However, I don't think the flags should only be present in !NDEBUG builds.
I think it's reasonable for developers to debug problems with released
binaries by causing these flags to be toggled using '-mllvm' or related
tools in the frontends to manually parse flags, or by 'opt' automatically
handing the flag parsing down to this layer.