[RFC] Removing static initializers for command line options

I’m somewhat surprised that this comes up much in a context where you can’t extract a test case and play with it using ‘opt’ or some other stand-alone context.

If these come up so rarely, would it be unreasonable to just flip the flag in the source code, and build a DSO to test with? For example, this is how I have done counter-based bisection and combine-based bisection of things (a similarly rare but necessary activity I suspect) and it seems to work well.

-Chandler

One interesting issue with moving away from the current system of static initializers for cl::opt is that we will no longer have the automatic registration of all the options so that -help will print everything available and generally we will not be able to issue an error for an “unknown command line option” (without calling into any other code).

The auto-registration is fundamentally tied with the globalness and the static initializers; pondering this has led me down an interesting path that has made me understand better my suggestion in the other thread. As I see it, there are two very different sorts of uses of llvm::cl in LLVM:

  1. For regular command line processing. E.g. if a tool accepts an output file, then we need something that will parse the argument from the command line.

  2. As a way to easily set up a conduit from A to B, where A is the command line and B is some place “deep” inside the LLVM library code that will do something in response to the command line.

(and, pending discussion, someday point A might include a proper programmatic interface (i.e. in a way other than hijacking the command line processing))

llvm::cl does a decent job for #1 and that is what it was designed for AFAICT; these uses of llvm::cl live outside of library code and everything is pretty happy, despite them being global and having static initializers.

The problem is that llvm::cl is not very well-suited to #2, yet it is used for #2, and that is the real problem. We need a solution to problem #2 which does not use llvm::cl. Thus, I don’t think that the solution you propose here is the right direction.

The first step is to clearly differentiate between #1 and #2. I will say “command line options” for #1 and “configuration/tweak points” for #2. (maybe “library options” is better for #2; neither is perfect terminology)

The strawman I suggested in http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075503.html was a stab at #2. There is no way to dodge being stringly typed since command lines are stringly typed, so really it is just a question of how long a solution stays stringly typed.

My thought process for staying stringly typed “the whole time” (possibly with some caching) comes from these two desires:

  • adding a c/t point should require adding just one call into the c/t machinery (this is both for convenience and for DRY/SPOT), and
  • this change should be localized to the code being configured/tweaked
    This is the thought process:

Note that llvm::cl is stringly typed until it parses the options. llvm::cl gives the appearance of a typed interface because it uses static initialization as a backdoor to globally transport the knowledge of the expected type to the option parsing machinery (very early in the program lifetime). Without this backdoor, we need to stay stringly typed longer, at least until we reach the “localized” place where the single call into the c/t machinery is made; this single call is the only place that has the type information needed for the c/t value to become properly typed. But there is no way to know how long it will be until we reach that point (or even if we reach that point; consider passes that are not run on this invocation).

Hence my suggestion of just putting a stringly typed key-value store (or whatever) in an easily accessible place (like LLVMContext), and just translating any unrecognized command line options (ones that are not for #1) into that stringly typed storage.

I agree with Rafael that “constructor arguments to passes” are not c/t points. In the future, there might be some way to integrate the two (from the referenced post, you can probably tell that I kind of like the idea of doing so), but for now, I think the clear incremental step is to attack #2 and solve it without llvm::cl. I have suggested a way to do this that I think makes sense.

– Sean Silva

To be clear: I agree with Rafael that we need to tread very carefully about how we expose this machinery in the C API, if we expose it at all. My suggestion is completely orthogonal to this though; all I’m talking about is how to avoid the static constructors and global state caused by the cl::opt’s in library code, which as I understand it is the motivation for the OP.

– Sean Silva

> There are two reasons this doesn’t work:
>
> (1) Cases where I might want to set a debug variable for the WebKit JIT
but not for Mesa - if the option storage is global it will get overwritten
for all users

This really comes up? Really, we are not talking -O2/-O3 or inlining
thresholds. We are talking things like lcr-max-depth being needed for
a debugging session.

> (2) cl::ParseCommandLineOptions today is C++ API, WebKit restricts
itself to the C API, so at the least we’d need to expose it as part of the
C API

This seems a much smaller change than adding a LLVMConfig object.

>> * We don't need a LLVMConfig object that gets passed around.
>
> For the sake of this discussion, let’s just scrap my phase two idea for
a configuration object and treat that as a separate issue.

But it makes a big difference on how the first phase is handled. If we
don't want the LLVMConfig object, the first phase should really just
remove the static constructors and not add a stringly typed interface.

> I think that there are advantages to a string-based interface. Sean
Silva actually suggested that interface when I first sent out an email
about our plans to rework command line options back on 8/6. Based on Sean’s
feedback and a few discussions I had offline with other LLVM contributors I
thought a stringly typed interface was the best approach to eliminating
both the static constructors and the static storage which are explicit
goals for our project.

Sorry I missed the original discussion. Sean, would you mind
summarizing the why of the stringly interface? Even if we do need a
LLVMConfig object (seems unlikely), I would still suggest using some
other key format. With strings we would suddenly be exposing every
command line option to the C API, which seems highly undesirable.

I think that what/how/if we expose in the C API is orthogonal to my
suggestions, which are more about removing the static initializers and
global state. The stringly typed interface is motivated by the need to
late-resolve which options there are and what their types are; this need
itself arises from eliminating the static initializers which serve as a
backdoor for globally propagating type and option information on startup.

For more details, see my lengthy musings that I just replied to the OP.

Personally, I am very wary of exposing a stringly typed interface in the C
API.

-- Sean Silva

I actually disagree with this for two reasons.

The first is that if there are going to be changes to the code anyway to remove static initializers, and we can move the storage to the pass at the same time, the we should make just one change and not two.

The second reason is that in most cases these knobs should not be globals. If I had a piece of data (not a CL::opt) in global scope, only used by one pass, then I’m sure people would question why it’s a global at all and move it inside the class. We’re treating cl::opt as special here when there’s no reason for the opt or the storage to be global.

We frown upon the use of globals, otherwise LLVM would be littered with them like many other C++ code bases. I don’t think cl::opts should be special at all in this respect.

Thanks
Pete

FWIW, I largely agree with Rafael's position here, at least in the near
term.

The nice thing about it is that even if we don't stay there forever, it is
a nice incremental improvement over the current state of the world, and we
can actually be mindful going forward of whether the restriction it imposes
(an inability to use "internal" knobs from within a library context that
have multiple different library users in a single address space) proves to
be a significant on-going burden.

I actually disagree with this for two reasons.

The first is that if there are going to be changes to the code anyway to
remove static initializers, and we can move the storage to the pass at the
same time, the we should make just one change and not two.

No one is suggesting otherwise that I have seen? At least, my
interpretation (perhaps incorrect, I've not had time to read all of this
thread in 100% detail) was that the removal of static initializers wouldn't
require changing every cl::opt variable.

The second reason is that in most cases these knobs should not be globals.
If I had a piece of data (not a CL::opt) in global scope, only used by one
pass, then I'm sure people would question why it's a global at all and move
it inside the class. We're treating cl::opt as special here when there's no
reason for the opt or the storage to be global.

We frown upon the use of globals, otherwise LLVM would be littered with
them like many other C++ code bases. I don't think cl::opts should be
special at all in this respect.

Sure, you're arguing against the core design of cl::opt. However, we have
it, and it wasn't an accident or for lack of other patterns that we chose
it.

For example, we don't require all constants to be per-pass, and instead
have per-file constants. Rafael is suggesting that one use case for cl::opt
global variables is, in essence, a constant that is somewhat easier for a
developer of LLVM (*not* a user) to change during debugging and active
development. I don't think the desire for convenience and only supporting
the default value in production contexts are completely invalid.

Once you factor those in, we have a tradeoff. Historically, the tradeoff
was made in the direction of convenience, relying on a very narrow
interpretation of the use cases. It isn't clear to me that we should,
today, make a different tradeoff. It certainly doesn't make sense why you
would gate removing static initializers (a clear win) on forcing a change
on a core design pattern within LLVM which not all of the developers are
really supportive of (at this point).

2. As a way to easily set up a conduit from A to B, where A is the command
line and B is some place "deep" inside the LLVM library code that will do
something in response to the command line.

I never liked this as a solution to #2. Heck, I never liked that we do
have #2 in the first place.

Hence my suggestion of just putting a stringly typed key-value store (or
whatever) in an easily accessible place (like LLVMContext), and just
translating any unrecognized command line options (ones that are not for #1)
into that stringly typed storage.

I fully support this idea, and is in line with my strawman proposal
for FPU/CPU parsing on all tools shared (beyond the boundaries of
LLVM).

String parsing is common and, even being target specific or pass
specific, is still string parsing and should be identical to all users
of the *same* feature.

I also believe a more sane command line option scheme is in order.
Today we have a zillion of options that are completely disconnected,
documented by accident in the initializer, without any bigger context
whatsoever. This is in part to follow what GCC has always done, and
probably we'll still need to support GCC's and our own legacy for
decades, but all that can also live in this commoned up parser.

Specifically to #2, my idea was something like:
--vectorizer-opts="foo,bar,baz=10" --tbaa-opts="...", etc. That could
use a common parser all the way down to parsing "foo", which would be
left by the vectorizer's back-end to the parser to deal with and setup
the right flags in the right structure, used because "vectorizer" in
"vectorizer-opts" tell the factory to return a vectorizer parser's
back-end.

The FPU/CPU parsing (which has to parse command line options and
assembly directives, which happens to have the same syntax), would
have a similar structure.

Such flags in LLVMContext (or whatever) would have to be structured
like a tree and each pass should receive its own tree root, which most
of the time would have just a list of key/values, but some times have
a more nested structure.

cheers,
--renato

I have no problem with this either.

As an alternative, and one which is likely to be fairly stable, we could just expose ParseCommandLineOptions to the C API. That parsing will cause the options to change anyway, and there’s actually nothing to stop you from calling it multiple times to change different options at different times.

Pete

One interesting issue with moving away from the current system of static initializers for cl::opt is that we will no longer have the automatic registration of all the options so that -help will print everything available and generally we will not be able to issue an error for an “unknown command line option” (without calling into any other code).

Not automatic no, but in the proposal Chris puts the addOption call inside the pass initializer which is called before ParseCommandLineOptions. This means you’ll still get options listed as you currently do, so long as you continue to calls the pass initializers before parse (something you have to do anyway to get the pass name visible to the command line)

The auto-registration is fundamentally tied with the globalness and the static initializers; pondering this has led me down an interesting path that has made me understand better my suggestion in the other thread. As I see it, there are two very different sorts of uses of llvm::cl in LLVM:

  1. For regular command line processing. E.g. if a tool accepts an output file, then we need something that will parse the argument from the command line.

  2. As a way to easily set up a conduit from A to B, where A is the command line and B is some place “deep” inside the LLVM library code that will do something in response to the command line.

(and, pending discussion, someday point A might include a proper programmatic interface (i.e. in a way other than hijacking the command line processing))

That would be nice. I just suggested in another thread that we expose ParseCommandLineOptions to the C API to hack around this, but a nice clean interface would of course be better.

llvm::cl does a decent job for #1 and that is what it was designed for AFAICT; these uses of llvm::cl live outside of library code and everything is pretty happy, despite them being global and having static initializers.

The problem is that llvm::cl is not very well-suited to #2, yet it is used for #2, and that is the real problem. We need a solution to problem #2 which does not use llvm::cl. Thus, I don’t think that the solution you propose here is the right direction.

The first step is to clearly differentiate between #1 and #2. I will say “command line options” for #1 and “configuration/tweak points” for #2. (maybe “library options” is better for #2; neither is perfect terminology)

The strawman I suggested in http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075503.html was a stab at #2. There is no way to dodge being stringly typed since command lines are stringly typed, so really it is just a question of how long a solution stays stringly typed.

My thought process for staying stringly typed “the whole time” (possibly with some caching) comes from these two desires:

  • adding a c/t point should require adding just one call into the c/t machinery (this is both for convenience and for DRY/SPOT), and

Right. The current point is in the pass initializer.

There is another point in the pass constructor to read the option value. This is the only point at which something will change from being a string to its actual type and value.

  • this change should be localized to the code being configured/tweaked
    This is the thought process:

Note that llvm::cl is stringly typed until it parses the options. llvm::cl gives the appearance of a typed interface because it uses static initialization as a backdoor to globally transport the knowledge of the expected type to the option parsing machinery (very early in the program lifetime). Without this backdoor, we need to stay stringly typed longer, at least until we reach the “localized” place where the single call into the c/t machinery is made; this single call is the only place that has the type information needed for the c/t value to become properly typed. But there is no way to know how long it will be until we reach that point (or even if we reach that point; consider passes that are not run on this invocation).

The current proposal exposes the type in addOption (as well as later when we get the option). So the type continues to be known to the command line parser. Whether you want to actually type check in the command line is a point i’m open to discuss. Personally i want a command line option to be type checked because it was registered, even if no-one actually gets the value of the option later.

Hence my suggestion of just putting a stringly typed key-value store (or whatever) in an easily accessible place (like LLVMContext), and just translating any unrecognized command line options (ones that are not for #1) into that stringly typed storage.

I’m against it being in the context because you may want to set up and reuse passes multiple times with the same options, and use that configuration to compile multiple LLVMContexts. But I do agree that having a store with some lifetime is useful.

I think the current proposal is to have the store be a singleton, but there’s nothing to stop further work to have the storage for options be one per thread for example. If you wanted to have one pass manager per thread with its own set of passes, configured (currently) via their own call to ParseCommandLineOptions then that would be possible with little work beyond the current proposal.

I agree with Rafael that “constructor arguments to passes” are not c/t points. In the future, there might be some way to integrate the two (from the referenced post, you can probably tell that I kind of like the idea of doing so), but for now, I think the clear incremental step is to attack #2 and solve it without llvm::cl. I have suggested a way to do this that I think makes sense.

If you change the current proposal so that it doesn’t read cl::opt, then I think this reads to me like what is being proposed now. Really its creating a string->string map with addOption, and getting the values with getOption. The passes don’t care (or know) whether the options are set via the command line or any other API. I hope i’ve understood your proposal correctly here. Please correct me otherwise.

Thanks,
Pete

I think I may be misunderstanding you here. If I understand Rafael correctly he wants all the option storage to be done using the cl::opt external storage capabilities, so that the option values are stored globally.

My original proposal was to leave the storage in cl::opt, but to move the cl::opt to being owned, and have a keyed lookup. My plan was geared around being able to change one cl::opt at a time, but ultimately to get rid of the static initializers you do need to change every cl::opt variable so that they aren’t global.

As part of this work I did want to eliminate the global storage for all these options in favor of having the data stored in the passes. It seems that this last is the contentious part, which is what Pete is talking about WRT the use of globals.

I’m also not sure how Rafael’s proposal will work with eliminating static initializers for cl::opts with class or list typed data.

I don’t think that my proposal adversely impacts the ability for a developer of LLVM to change the value of an option during debugging and development. If anything it makes this easier because it provides a way to do so without modifying source code (while not preventing you from doing it by modifying source code).

I guess my point here is that there doesn’t need to be a tradeoff. My proposal doesn’t adversely impact convenience, and supports a wider use case.

I do agree that we shouldn’t gate removing the static initializers on removing the globals, but I also don’t think this is really forcing a core design pattern change. If we can’t come to an agreement on the global data we can shelve the conversation.

-Chris

It won’t no. The majority of static initializers are globals in passes, but cl::opt’s and other globals in tools for example are out of scope for this work right now (there’s no opt.cpp in a dylib so we don’t care about its globals for example).

Oh yeah, its a fine solution for a tricky problem, but now that we’re having this discussion its clear that its use of static initializers is itself a problem.

I can see what you’re saying here, but i’m not convinced that changing the value of a constant in global scope is any easier than changing its value in the pass constructor.

I agree here. There’s not a strict need to move the option storage for something like an unsigned in to a pass using it (there may be for a std::string for which we’d again have a static initializer). However, I do think its the right thing to do in terms of coding guidelines. If the code to get and set an option is already in the pass initializer/constructor respectively, then I don’t think the storage should have to live outside the pass just to minimize a patch.

Now saying this, I don’t think if the community agreed to this proposal as is, that it would mean blanket approval to change all cl::opts in all cases. But I think where its an easy change, and obviously the right choice, that options as well as their storage should be allowed to be moved in to the pass. If it makes sense to move only the option but not the storage then that should be done as a first step, and a discussion started on the right thing for the storage.

Thanks,
Pete

FWIW, I largely agree with Rafael's position here, at least in the near
term.

The nice thing about it is that even if we don't stay there forever, it is a
nice incremental improvement over the current state of the world, and we can
actually be mindful going forward of whether the restriction it imposes (an
inability to use "internal" knobs from within a library context that have
multiple different library users in a single address space) proves to be a
significant on-going burden.

I actually disagree with this for two reasons.

The first is that if there are going to be changes to the code anyway to
remove static initializers, and we can move the storage to the pass at the
same time, the we should make just one change and not two.

The second reason is that in most cases these knobs should not be globals.
If I had a piece of data (not a CL::opt) in global scope, only used by one
pass, then I'm sure people would question why it's a global at all and move
it inside the class. We're treating cl::opt as special here when there's no
reason for the opt or the storage to be global.

We frown upon the use of globals, otherwise LLVM would be littered with them
like many other C++ code bases. I don't think cl::opts should be special at
all in this respect.

Note that the call

cl::OptionRegistry::CreateOption<bool>(&ScalarizeLoadStore,
"ScalarizeLoadStore",
   "scalarize-load-store", cl::Hidden, cl::init(false),
   cl::desc("Allow the scalarizer pass to scalarize loads and store"));

ScalarizeLoadStore can actually be a member variable as long as caller
guarantees it is still around when the command line is parsed. I have
no objections to doing this move in the first pass if you want to.

What I would really like to avoid for now is the

cl::OptionRegistry::GetValue<bool>("ScalarizeLoadStore");

Cheers,
Rafael

FWIW, I largely agree with Rafael's position here, at least in the near
term.

The nice thing about it is that even if we don't stay there forever, it is a
nice incremental improvement over the current state of the world, and we can
actually be mindful going forward of whether the restriction it imposes (an
inability to use "internal" knobs from within a library context that have
multiple different library users in a single address space) proves to be a
significant on-going burden.

I actually disagree with this for two reasons.

The first is that if there are going to be changes to the code anyway to
remove static initializers, and we can move the storage to the pass at the
same time, the we should make just one change and not two.

The second reason is that in most cases these knobs should not be globals.
If I had a piece of data (not a CL::opt) in global scope, only used by one
pass, then I'm sure people would question why it's a global at all and move
it inside the class. We're treating cl::opt as special here when there's no
reason for the opt or the storage to be global.

We frown upon the use of globals, otherwise LLVM would be littered with them
like many other C++ code bases. I don't think cl::opts should be special at
all in this respect.

Note that the call

cl::OptionRegistry::CreateOption<bool>(&ScalarizeLoadStore,
"ScalarizeLoadStore",
  "scalarize-load-store", cl::Hidden, cl::init(false),
  cl::desc("Allow the scalarizer pass to scalarize loads and store"));

ScalarizeLoadStore can actually be a member variable as long as caller
guarantees it is still around when the command line is parsed. I have
no objections to doing this move in the first pass if you want to.

What I would really like to avoid for now is the

cl::OptionRegistry::GetValue<bool>("ScalarizeLoadStore”);

Ah, I totally see what you mean now. Sorry if i had been confused before.

To be honest, I had exactly the same reservations myself, but then i looked in to how we initialize and create passes and there actually isn’t a nice way to do this right now.

The trouble is that INITIALIZE_PASS doesn’t actually construct a pass. It actually constructs a PassInfo which has a pointer to the default constructor of the pass. Later, if the pass manager needs to (say -scalarize was on the commandline), it will get the default constructor from PassInfo and construct the pass.

Unfortunately, this completely detaches the lifetime of the pass instance where we want to store the ScalarizeLoadStore bool, from the option code to hook in to the cl::opt stuff. I originally wanted to do something gross like pass __offset_of(Scalarizer::ScalarizeLoadStore) to the PassInfo and hide the initialization of the option in there. But that doesn’t work either, as there’s no guarantee you’ll even create a instance of Scalarizer, even though you could pass -scalarize-load-store to the command line.

So, we do have 2 solutions: a global variable, and a call to GetValue. As you can see, either way isn’t perfect, but if you can think of another solution i’m sure we can discuss it.

Thanks,
Pete

FWIW, I largely agree with Rafael's position here, at least in the near
term.

The nice thing about it is that even if we don't stay there forever, it is a
nice incremental improvement over the current state of the world, and we can
actually be mindful going forward of whether the restriction it imposes (an
inability to use "internal" knobs from within a library context that have
multiple different library users in a single address space) proves to be a
significant on-going burden.

I actually disagree with this for two reasons.

The first is that if there are going to be changes to the code anyway to
remove static initializers, and we can move the storage to the pass at the
same time, the we should make just one change and not two.

The second reason is that in most cases these knobs should not be globals.
If I had a piece of data (not a CL::opt) in global scope, only used by one
pass, then I'm sure people would question why it's a global at all and move
it inside the class. We're treating cl::opt as special here when there's no
reason for the opt or the storage to be global.

We frown upon the use of globals, otherwise LLVM would be littered with them
like many other C++ code bases. I don't think cl::opts should be special at
all in this respect.

Note that the call

cl::OptionRegistry::CreateOption<bool>(&ScalarizeLoadStore,
"ScalarizeLoadStore",
"scalarize-load-store", cl::Hidden, cl::init(false),
cl::desc("Allow the scalarizer pass to scalarize loads and store"));

ScalarizeLoadStore can actually be a member variable as long as caller
guarantees it is still around when the command line is parsed. I have
no objections to doing this move in the first pass if you want to.

What I would really like to avoid for now is the

cl::OptionRegistry::GetValue<bool>("ScalarizeLoadStore”);

Ah, I totally see what you mean now. Sorry if i had been confused before.

To be honest, I had exactly the same reservations myself, but then i looked in to how we initialize and create passes and there actually isn’t a nice way to do this right now.

The trouble is that INITIALIZE_PASS doesn’t actually construct a pass. It actually constructs a PassInfo which has a pointer to the default constructor of the pass. Later, if the pass manager needs to (say -scalarize was on the commandline), it will get the default constructor from PassInfo and construct the pass.

Unfortunately, this completely detaches the lifetime of the pass instance where we want to store the ScalarizeLoadStore bool, from the option code to hook in to the cl::opt stuff. I originally wanted to do something gross like pass __offset_of(Scalarizer::ScalarizeLoadStore) to the PassInfo and hide the initialization of the option in there. But that doesn’t work either, as there’s no guarantee you’ll even create a instance of Scalarizer, even though you could pass -scalarize-load-store to the command line.

So, we do have 2 solutions: a global variable, and a call to GetValue. As you can see, either way isn’t perfect, but if you can think of another solution i’m sure we can discuss it.

This is very raw, so excuse any mistakes in the code, but I think i came up with a third option.

What if we added a static method to the Scalarizer class. This method takes pointers to each option storage. If a pointer is null, the function is being called from INITIALIZE_PASS and so we create all the options. If a pointer is not null, we’re being called from the pass constructor and we set the value of that option. I think it would look something like this (which can of course be tidied up).

  static void addOptions(bool *ScalarizeLoadStore = nullptr) {
    Option::iterator ScalarizeLoadStoreOpt =
      getOrAddOption<bool>("scalarize-load-store");
    if (ScalarizeLoadStore) {
      // Get the value of the option.
      *ScalarizeLoadStore = ScalarizeLoadStoreOpt.getValue();
    } else {
      // Adding the option with this name. Set up its properties.
      ScalarizeLoadStoreOpt.init(cl::Hidden, cl::init(false),
                                 cl::desc("Allow the scalarizer pass to scalarize loads and store"));
    }
  }

Pete

So, we do have 2 solutions: a global variable, and a call to GetValue. As you can see, either way isn’t perfect, but if you can think of another solution i’m sure we can discuss it.

Pass the passinfo to the pass constructor maybe?

I still don't understand what the problem with the global is. It has
the same value for all users. As Chandler pointed out, having

static cl::opt<bool> ScalarizeLoadStore
  ("scalarize-load-store", cl::Hidden, cl::init(false),
   cl::desc("Allow the scalarizer pass to scalarize loads and store"));

right now is just our way of making

// Allow the scalarizer pass to scalarize loads and store
const static bool ScalarizeLoadStore = false;

more convenient to developers so they don't have to edit/compile to
test it with a different value. From a library user point of view they
should be exactly the same: a constant that is *always* false.

Another option (not my preference) would be to use globals just as keys:

typedef <something> LLVMOptionKey;
...
static LLVMOptionKey ScalarizeLoadStoreKey; //global
...
cl::OptionRegistry::createOption<bool>(&ScalarizeLoadStoreKey,
"ScalarizeLoadStore",
   "scalarize-load-store", cl::Hidden, cl::init(false),
   cl::desc("Allow the scalarizer pass to scalarize loads and store"));
....
bool ScalarizeLoadStore =
cl::OptionRegistry::getValue<bool>(&ScalarizeLoadStoreKey); // local

That way we avoid exposing ScalarizeLoadStore to library users since
the getValue takes a key they cannot guess instead of well known
string.

Cheers,
Rafael

This is very raw, so excuse any mistakes in the code, but I think i came up with a third option.

What if we added a static method to the Scalarizer class. This method takes pointers to each option storage. If a pointer is null, the function is being called from INITIALIZE_PASS and so we create all the options. If a pointer is not null, we’re being called from the pass constructor and we set the value of that option. I think it would look something like this (which can of course be tidied up).

  static void addOptions(bool *ScalarizeLoadStore = nullptr) {
    Option::iterator ScalarizeLoadStoreOpt =
      getOrAddOption<bool>("scalarize-load-store");
    if (ScalarizeLoadStore) {
      // Get the value of the option.
      *ScalarizeLoadStore = ScalarizeLoadStoreOpt.getValue();
    } else {
      // Adding the option with this name. Set up its properties.
      ScalarizeLoadStoreOpt.init(cl::Hidden, cl::init(false),
                                 cl::desc("Allow the scalarizer pass to scalarize loads and store"));
    }
  }

As long as we are careful to make sure only the command line parsing
can set this, it should be fine. It has some nice properties missing
from the original proposal (including stage2)

* "scalarize-load-store" is written once. Not extra string keys.
* It seems easier to hide it.
* We don't need a LLVMConfig object that gets passed around.
* There is exactly one of exposing features to libraries: change the
constructor and maybe the PassManagerBuilder.

I still don't see the advantage over a static storage, but I am not
strongly oppose to it, as long as there is nothing like
cl::OptionRegistry::SetValue<bool>("ScalarizeLoadStore", true) that
libraries can use.

Cheers,
Rafael

So, we do have 2 solutions: a global variable, and a call to GetValue. As you can see, either way isn’t perfect, but if you can think of another solution i’m sure we can discuss it.

Pass the passinfo to the pass constructor maybe?

I still don't understand what the problem with the global is. It has
the same value for all users. As Chandler pointed out, having

I just assumed that all globals are bad. Perhaps thats not true in all cases.

static cl::opt<bool> ScalarizeLoadStore
("scalarize-load-store", cl::Hidden, cl::init(false),
  cl::desc("Allow the scalarizer pass to scalarize loads and store"));

right now is just our way of making

// Allow the scalarizer pass to scalarize loads and store
const static bool ScalarizeLoadStore = false;

more convenient to developers so they don't have to edit/compile to
test it with a different value. From a library user point of view they
should be exactly the same: a constant that is *always* false.

Another option (not my preference) would be to use globals just as keys:

typedef <something> LLVMOptionKey;
...
static LLVMOptionKey ScalarizeLoadStoreKey; //global

You could do this if the key is the name of the option, but that detaches the name from the rest of the calls which may be quite far apart in the code. Otherwise so long as the key type doesn’t have a static constructor, this is fine with me.

Thanks,
Pete

Another option (not my preference) would be to use globals just as keys:

typedef <something> LLVMOptionKey;
...
static LLVMOptionKey ScalarizeLoadStoreKey; //global

You could do this if the key is the name of the option, but that detaches the name from the rest of the calls which may be quite far apart in the code. Otherwise so long as the key type doesn’t have a static constructor, this is fine with me.

The Key global is only used for its unique address, so LLVMOptionKey
could be just "void *" for example.

Cheers,
Rafael

I think this statement ignores important debugging use cases, and over-simplifies the model of how LLVM-based frameworks are often developed. As an example, say I develop an LLVM backend for a custom co-processor, and the compiler for it runs inside the software stack of the driver for my co-processor. Per what you and Rafael are saying, the driver should not be able to programmatically set command line options to LLVM. But this ignores the reality that, when developing my backend, I may still need to be able to twiddle debugging options when debugging a live driver stack.

Fundamentally, you argument is rooted in a world where LLVM debugging is always done on the llc or clang binaries, where the LLVM developer can easily modify the command line arguments themselves. That’s not true of many use cases where LLVM as a shared library is relevant.

—Owen

So, we do have 2 solutions: a global variable, and a call to GetValue. As you can see, either way isn’t perfect, but if you can think of another solution i’m sure we can discuss it.

Pass the passinfo to the pass constructor maybe?

I still don't understand what the problem with the global is. It has
the same value for all users. As Chandler pointed out, having

Globals are bad for many reasons: namespace pollution, concurrency issues, lack of access control, etc. etc.. Some of them have been discussed in this thread. Perhaps it’s not a big concern for most of the LLVM users. But we have an unique environment where LLVM is shared by multiple clients, and where the concern around exploits are especially strong. So while eliminating globals is not strictly tied to the elimination of static initializers, it is still a strong goal towards providing a LLVM dylib.

Evan