Use of statics and ManagedStatics in LLVM

Based on a recent discussion[1], I started trying to remove the functions llvm_start_multithreaded() and llvm_stop_multithreaded() from the codebase. It turns out this is a little bit tricky. Consider the following scenario:

During program initialization, a global static object’s constructor dereferences a ManagedStatic. While dereferencing the ManagedStatic, the code needs to know whether or not to acquire the global lock in order to allocate the object.
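To make the scenario concrete, here is a heavily simplified C++ sketch. It is not LLVM's real ManagedStatic: `SimpleManagedStatic`, `multithreaded_enabled`, `Counter`, and `EarlyUser` are hypothetical names. The sketch shows two things. First, the decision whether to lock is made inside the dereference. Second, a global constructor can reach that dereference before `main()` runs, at a point where nothing guarantees `global_lock` exists.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical, simplified stand-ins (not LLVM's actual code).
// multithreaded_enabled plays the role of the state set by
// llvm_start_multithreaded(); global_lock is the object whose type is in question.
bool multithreaded_enabled = false;
std::mutex *global_lock = nullptr;

template <typename T>
class SimpleManagedStatic {
  std::atomic<T *> ptr_{nullptr};  // constexpr ctor: constant-initialized

public:
  T &operator*() {
    T *p = ptr_.load(std::memory_order_acquire);
    if (p)
      return *p;
    if (multithreaded_enabled) {
      // Multithreaded path: creation must happen under global_lock, which
      // therefore has to exist already. Guaranteeing that is the open problem.
      assert(global_lock && "global_lock not constructed yet");
      std::lock_guard<std::mutex> guard(*global_lock);
      p = ptr_.load(std::memory_order_relaxed);
      if (!p) {
        p = new T();
        ptr_.store(p, std::memory_order_release);
      }
      return *p;
    }
    // Single-threaded path: no lock is taken at all.
    p = new T();
    ptr_.store(p, std::memory_order_release);
    return *p;
  }
};

struct Counter {
  int value = 0;
};
SimpleManagedStatic<Counter> counter;

// A global object whose constructor dereferences the managed static during
// static initialization, i.e. before main() could have set up anything.
struct EarlyUser {
  EarlyUser() { ++(*counter).value; }
};
EarlyUser early_user;
```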

There are three possible types for global_lock, and none of them solves the problem.

  1. If global_lock is simply another global static, it may not have been constructed yet.

  2. If global_lock is a raw pointer to a mutex, it would have to be explicitly allocated, and we can’t guarantee this during static initialization.

  3. If global_lock is a ManagedStatic, allocating it recurses infinitely, because dereferencing global_lock would itself require acquiring global_lock.

I have felt this way since I first started looking at LLVM, and increasingly so: I think the solution is that ManagedStatics should not be accessible until after main() begins. llvm_shutdown() gives a deterministic order of destruction for managed statics, but we have no deterministic order of creation. I have a patch up[2] (still awaiting review) that shows a possible solution to these problems. It also shows how we might migrate the existing cases where this happens to ManagedStatic-free static initialization.

There is only one requirement: you must be able to insert a call very early in main() that performs the ManagedStatic initialization. A missing call can be caught with an assertion, and all anyone would have to do is add one line to their main() function. As in [2], all this early main() code would do is copy fields from one structure to another.
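A minimal sketch of what that early call might look like. It assumes the live state is rebuilt by copying from constant-initialized data. `StaticOptions`, `kInitialOptions`, `LiveOptions`, `initialize_llvm_statics()`, and `get_options()` are hypothetical names, not the code from the patch in [2].

```cpp
#include <cassert>

// Constant data known at compile time; it needs no dynamic initializer,
// so it is valid before any global constructor runs.
struct StaticOptions {
  int verbosity;
  bool enable_stats;
};
constexpr StaticOptions kInitialOptions = {1, false};

// The live, mutable state that the rest of the program reads.
struct LiveOptions {
  int verbosity = 0;
  bool enable_stats = false;
};

static LiveOptions live_options;
static bool statics_initialized = false;

// Hypothetical name for the "one line" added near the top of main():
// it only copies fields from one structure to another.
void initialize_llvm_statics() {
  live_options.verbosity = kInitialOptions.verbosity;
  live_options.enable_stats = kInitialOptions.enable_stats;
  statics_initialized = true;
}

// Every accessor asserts that main() has already run the initializer.
LiveOptions &get_options() {
  assert(statics_initialized &&
         "initialize_llvm_statics() must be called early in main()");
  return live_options;
}
```

A tool would call `initialize_llvm_statics()` as the first statement of `main()`. If the call is forgotten, the assertion fires on the first access instead of the program silently racing.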

As a side benefit, this provides the ability to “resurrect” after an llvm_shutdown(), should that be desired, because the initial static state of the program remains intact once main() is entered.
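Because the pristine state is never modified, resurrection amounts to running the initializer again. Below is a sketch under that assumption. `State`, `kInitialState`, `init_statics()`, and `shutdown_statics()` are hypothetical; `shutdown_statics()` stands in for llvm_shutdown().

```cpp
#include <cassert>
#include <memory>

struct State {
  int counter;
};

// Pristine initial state: constant-initialized and never written to.
constexpr State kInitialState = {0};

// Live copy; std::unique_ptr's default ctor is constexpr, so this starts null.
static std::unique_ptr<State> live_state;

// Builds (or rebuilds) the live state from the pristine copy.
void init_statics() { live_state.reset(new State(kInitialState)); }

// Stand-in for llvm_shutdown(): destroys the live state deterministically.
void shutdown_statics() { live_state.reset(); }

State &state() {
  assert(live_state && "statics not initialized (or already shut down)");
  return *live_state;
}
```

After `init_statics(); state().counter = 42; shutdown_statics(); init_statics();`, the counter is back at its initial value of 0.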

Thoughts? Better ideas?

[1] - http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-June/073543.html

[2] - http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140602/219684.html


1) If global_lock is simply another global static, it may not have been
constructed yet.

I sort of assumed there was a constexpr (or otherwise
link-time-initialized sort of thing) mutex? Though I recall someone
complaining that this wasn't available on some platform (ARM?
Windows?) or another.
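For reference, standard C++11 does specify such a mutex. `std::mutex` has a `constexpr` default constructor, so a namespace-scope `std::mutex` is constant-initialized. It can therefore be locked from any global constructor, including one in another translation unit. Whether every compiler and standard library LLVM supported at the time implemented this is exactly the portability doubt raised above. In the small sketch below, `global_lock`, `registry()`, `register_value()`, and `EarlyRegistrar` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Constant-initialized (constexpr default ctor): usable before any dynamic
// initializer runs, regardless of translation-unit ordering.
std::mutex global_lock;

std::vector<int> &registry() {
  static std::vector<int> r;  // function-local static: built on first use
  return r;
}

void register_value(int v) {
  std::lock_guard<std::mutex> guard(global_lock);
  registry().push_back(v);
}

// A global constructor that relies on global_lock already being usable.
struct EarlyRegistrar {
  EarlyRegistrar() { register_value(7); }
};
static EarlyRegistrar early_registrar;
```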

2) If global_lock is a raw pointer to a mutex, it would have to be
explicitly allocated, and we can't guarantee this during static
initialization.

Can't guarantee that the memory allocation will succeed? But it could
be done without a memory allocation - by placement new-ing into
existing memory.
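Here is a sketch of the placement-new variant. The mutex is constructed in statically allocated raw storage, so creating it never calls the heap allocator and cannot fail for lack of memory. `lock_storage`, `global_lock_ptr`, and `get_global_lock()` are hypothetical names. The null check itself is unsynchronized, so this is only safe if the first call happens while the program is still single-threaded.

```cpp
#include <cassert>
#include <mutex>
#include <new>

namespace {
// Raw storage with the right size and alignment for a std::mutex.
alignas(std::mutex) unsigned char lock_storage[sizeof(std::mutex)];
// Zero-initialized before any dynamic initializer runs.
std::mutex *global_lock_ptr = nullptr;
}  // namespace

std::mutex &get_global_lock() {
  // Unsynchronized check: relies on the first call being single-threaded.
  if (!global_lock_ptr)
    global_lock_ptr = new (lock_storage) std::mutex();  // no heap allocation
  // The mutex is intentionally never destroyed.
  return *global_lock_ptr;
}
```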

Though, alternatively, something I've done in the past to avoid the
need for mutexes when initializing static state is to use something like
ManagedStatic, except with a global ctor - so it's lazy, initializes
on first use (sidestepping global ctor ordering problems, unless there's
a true circular dependency), but with the added protection that it will be
initialized (one way or another) by a static ctor. This way, if we can
safely assume that global ctors run single threaded (?), we don't have
to worry about racing access to these things later on.
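A sketch of that pattern follows. Like the text, it assumes that global constructors run single-threaded. `EagerLazyStatic`, `option_table`, and `ForceInit` are hypothetical names. `get()` initializes on first use, so earlier global constructors still see a valid object. A dedicated global constructor guarantees that first use has happened before `main()`, so afterwards `get()` only ever reads a non-null pointer.

```cpp
#include <cassert>
#include <map>
#include <string>

// Intended only for objects with static storage duration: the trivial default
// constructor means ptr_ starts out zero-initialized (null).
template <typename T>
class EagerLazyStatic {
  T *ptr_;

public:
  T &get() {
    if (!ptr_)
      ptr_ = new T();  // unsynchronized: relies on single-threaded static init
    return *ptr_;
  }
};

static EagerLazyStatic<std::map<std::string, int>> option_table;

// The "last chance" initializer: a global ctor whose only job is to force
// initialization before main(), even if no other global ctor used the object.
static struct ForceInit {
  ForceInit() { option_table.get(); }
} force_option_table_init;
```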

This doesn't help with the resurrection you mention, since after a
shutdown you'd be left in the uninitialized state, so if you went right
back into LLVM you would be getting possibly racy initializations -
and then you'd want that "InitLLVM" function you were alluding to.

All that said, I know a bunch of people want to get rid of static
ctors in LLVM anyway - so if there's a path forward that takes us
closer to that goal, I expect it'll get some support.

It's more like you have nowhere to explicitly allocate it to guarantee that
it will have been allocated by the time the other static object is trying
to use it. You can null-check it and lazily allocate, but this same
null-check will be executed after main() enters, every time the
ManagedStatic is accessed, and in that case it will race against
whoever shuts down and frees the mutex. Or, if someone did a refactor
and removed the last ManagedStatic access from a global static
constructor, then the first access to any ManagedStatic would be racy.

Right - see my later comment about ensuring the mutex itself is lazily
"last chance" initialized by a global ctor of its own.
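Applied to the lock itself, the idea might look like the sketch below. The sketch additionally assumes that the mutex is simply never destroyed. A shutdown routine then frees only the state the lock protects, never the lock, so a later null check cannot race with it being freed. `global_lock()`, `ForceLockInit`, `increment_counter()`, and `shutdown_counter()` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>

namespace {
std::mutex *lock_ptr = nullptr;  // constant-initialized: null before any ctor

// Lazily creates the lock on first use. The unsynchronized null check can
// only be true during (assumed single-threaded) static initialization,
// because ForceLockInit below guarantees the lock exists before main().
std::mutex &global_lock() {
  if (!lock_ptr)
    lock_ptr = new std::mutex();  // intentionally never deleted
  return *lock_ptr;
}

// "Last chance" initializer: a global ctor whose only job is to create the lock.
struct ForceLockInit {
  ForceLockInit() { global_lock(); }
} force_lock_init;

// Example state guarded by the lock; unlike the lock, it can be torn down.
int *shared_counter = nullptr;
}  // namespace

int increment_counter() {
  std::lock_guard<std::mutex> guard(global_lock());
  if (!shared_counter)
    shared_counter = new int(0);
  return ++*shared_counter;
}

// Stand-in for llvm_shutdown(): frees the managed state but leaves the mutex
// alive, so threads that lock it later never see it disappear.
void shutdown_counter() {
  std::lock_guard<std::mutex> guard(global_lock());
  delete shared_counter;
  shared_counter = nullptr;
}
```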