[cxxabi] Thread-safe statics causing deadlocks

C++14 6.7 Declaration statement, clause 4 has the standardese for “Magic” / thread-safe statics. Footnote 91 says “The implementation must not introduce any deadlock around execution of the initializer.” I believe this is unimplementable. The standard (and users) require mutual exclusion (though not necessarily a mutex) to be provided over unknown / arbitrary code. This causes well known problems ( ).

Libcxxabi, libsupc++, and the Microsoft implementation all have deadlocks in released compilers. I have two examples, one heavily contrived, and the other lightly contrived.

Heavily contrived example:
In the following code, initializing the static A2 can trigger construction of the static B2, and initializing B2 can trigger construction of A2. A bool is passed along to prevent recursion. This leads to the classic “deadly embrace”, where each thread waits for the other to release a resource. The sleeps have been added to make the race condition more likely to trigger. No user data is racing in this example. There would be a hidden data race on the “is initialized” flag of each static, except that this is one of the races that thread-safe statics are supposed to fix.

#include <chrono>
#include <thread>

using namespace std::chrono_literals;

void aMaker(bool MakeB);
void bMaker(bool MakeA);

struct SlowA {
  explicit SlowA(bool MakeB) {
    // Keep A2's initialization running long enough for the second thread
    // to start initializing B2.
    std::this_thread::sleep_for(2s);
    if (MakeB) bMaker(false);
  }
};

struct FastB {
  explicit FastB(bool MakeA) {
    if (MakeA) aMaker(false);
  }
};

void aMaker(bool MakeB) { static SlowA A2(MakeB); }
void bMaker(bool MakeA) { static FastB B2(MakeA); }

int main() {
  std::thread first([] { aMaker(true); });
  std::this_thread::sleep_for(1s);
  std::thread second([] { bMaker(true); });

  first.join();
  second.join();
}

Lightly contrived example:
In the following code, we cause a deadlock with only one user-defined recursive mutex. I think this issue could actually affect real code bases, though I haven’t hit the problem myself.

#include <chrono>
#include <mutex>
#include <thread>

std::recursive_mutex g_mutex;

struct SlowA {
  explicit SlowA() {
    std::lock_guard<std::recursive_mutex> guard(g_mutex);
  }
};

void aMaker() {
  static SlowA A2;
}

int main() {
  using namespace std::chrono_literals;
  // Takes g_mutex first, then enters the initialization of A2.
  std::thread first([] {
    std::lock_guard<std::recursive_mutex> guard(g_mutex);
    std::this_thread::sleep_for(2s);
    aMaker();
  });
  std::this_thread::sleep_for(1s);
  // Starts initializing A2 first, then blocks on g_mutex in the constructor.
  std::thread second([] { aMaker(); });

  first.join();
  second.join();
}

I’m not sure what should be done. Removing the lock protections would be terrible. Banning the use of locks in functions that construct statics would be terrible. Banning the use of locks in functions called from static construction would be terrible. It would be embarrassing to change the footnote in the standard to say that the language is permitted (even required) to introduce deadlocks.

Objective-C has a similar problem with the atomicity guarantees of +initialize. The GCC runtime implementation held a global lock and ensured that only one thread could be running *any* initialiser at once. The Apple and GNUstep implementations hold a per-class lock, which is far more prone to deadlock for precisely the same reasons as the C++ equivalent, yet yields a sufficiently large speedup for well-written code that users prefer it.

I believe that the text of the standard is fine, however. The implementation does not introduce deadlocks; the user code does. When running code in a synchronised context, you must ensure that you perform the same lock-order checks as when running in any other synchronised context. If users write code that contains deadlocks, then they should get deadlocks.
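
As a rough illustration of the lock-order point, here is a minimal sketch of how the "lightly contrived" example could be restructured so that every thread enters the static's initialization before taking the user mutex, never the other way around. The names Widget, widget, lockedRead and runBothThreads are hypothetical and chosen only for this sketch; it assumes an implementation with per-variable guards.

```cpp
#include <mutex>
#include <thread>

std::recursive_mutex g_mutex;

// Hypothetical type whose constructor takes the user mutex, as SlowA does.
struct Widget {
    int value = 0;
    Widget() {
        std::lock_guard<std::recursive_mutex> guard(g_mutex);
        value = 7;
    }
};

Widget& widget() {
    static Widget instance;  // guarded initialization on first call
    return instance;
}

int lockedRead() {
    // Touch the static *before* taking g_mutex, so the order is always
    // "static guard, then g_mutex" on every thread.
    Widget& w = widget();
    std::lock_guard<std::recursive_mutex> guard(g_mutex);
    return w.value;
}

int runBothThreads() {
    int a = 0;
    int b = 0;
    std::thread first([&] { a = lockedRead(); });
    std::thread second([&] { b = widget().value; });
    first.join();
    second.join();
    return a + b;
}
```

Calling runBothThreads() from main should complete, because no thread ever holds g_mutex while waiting for the initialization of the static to finish.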

David

The code snippets provided run without deadlocks if thread-safe statics are disabled. There are also no user-level data races. The standard seems pretty clear to me that this behavior isn't allowed.

I think that C++ code is better with the per-object locks in place, but the standard is unclear on what is considered allowable behavior. I believe the standard permits a global lock, or a per-object lock, or something in between. The implementation choice affects what is valid user code. My "heavily contrived" example wouldn't deadlock with a single global lock implementation, but the "lightly contrived" example will.

> C++14 6.7 Declaration statement, clause 4 has the standardese for "Magic"
> / thread-safe statics. Footnote 91 says "The implementation must not
> introduce any deadlock around execution of the initializer." I believe
> this is unimplementable.

While your interpretation is not unreasonable, I believe you've
misunderstood the meaning and intent of this footnote. Note that it says
*the implementation* must not introduce any deadlock -- that is, there must
not be any deadlock that is not implied by the program semantics. The
normative sentence preceding this footnote says "If control enters the
declaration concurrently while the variable is being initialized, the
concurrent execution shall wait for completion of the initialization." The
potential for deadlock in that rule is not affected by the presence of this
footnote, because that's deadlock introduced by the language semantics, not
deadlock introduced by the implementation.

To understand the purpose of this footnote, you need to look at how GCC 3.x
implemented thread-safe local statics (prior to standardization). They had
a single, global, recursive mutex protecting all local static
initialization. This results in deadlock *introduced by the implementation*
if a static local variable's initializer spawns and joins a thread, and
that thread triggers initialization of a different static local variable.
It is specifically that implementation strategy which is being called out
as non-conforming here.
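
To make that scenario concrete, here is a minimal sketch of a static local whose initializer spawns and joins a thread, where that thread triggers initialization of a different static local. The names Inner, Outer, inner and outerValue are hypothetical, invented for this illustration. With per-variable guards this should complete; under a single global recursive mutex held by the initializing thread, the worker would block on that mutex while the initializer waits in join().

```cpp
#include <thread>

// Hypothetical inner object. The constructor records the building thread,
// which keeps the initialization dynamic rather than constant.
struct Inner {
    std::thread::id builtOn;
    int value;
    Inner() : builtOn(std::this_thread::get_id()), value(21) {}
};

Inner& inner() {
    static Inner instance;  // a different static local
    return instance;
}

struct Outer {
    int doubled = 0;
    Outer() {
        // The initializer spawns a thread and joins it. The worker
        // triggers initialization of the static inside inner().
        std::thread worker([this] { doubled = 2 * inner().value; });
        worker.join();
    }
};

int outerValue() {
    static Outer instance;  // initialization runs Outer's constructor
    return instance.doubled;
}
```

Calling outerValue() from main is enough to exercise the nested initialization.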

Thanks for the clear explanation. Would it be possible to tack a few more words onto that footnote to clear that up? My mistaken reading of that note was basically “No deadlocks allowed”, when my reading of the text should have been “No deadlocks allowed beyond the ones we just mandated”.

Perhaps…
“The implementation must not introduce any deadlocks around execution of the initializer that are not implied by the program semantics”.
Maybe the redundancy is against the C++ standards style, but it would make it a bit more obvious to readers of the spec that some deadlocks are not only possible, but required.

I have seen deadlocks reported on OS X.
Consider this example:

#include <chrono>
#include <mutex>
#include <string>
#include <thread>

using namespace std::chrono_literals;
std::mutex mtx;

int main() {
  // This initializes a static under the mutex
  std::thread t1([]{
    std::lock_guard<std::mutex> lock(mtx);
    std::this_thread::sleep_for(100ms);
    static std::string goo = "goo";
  });

  // This thread initializes another static whose ctor locks the same mutex
  std::thread t2([]{
    struct str {
      str() {
        std::this_thread::sleep_for(100ms);
        std::lock_guard<std::mutex> lock(mtx);
      }
    };
    static str s;
  });
  t1.join(); t2.join();
}

Apple's implementation[1] has a bug which would make this code deadlock, as the
same mutex is used and kept locked while initializing both statics. That's a
classical deadlock with two mutexes locked in different orders.
I believe this kind of deadlock is what the standard forbids.

I don't have OS X at hand, so I could not verify that this is still accurate.
Please correct me if I'm wrong.

[1]
http://opensource.apple.com/source/libcppabi/libcppabi-26/src/cxa_guard.cxx

Perhaps we could simply remove the footnote, if it's creating confusion
rather than dispelling it. If you file an editorial issue at
github.com/cplusplus/draft/issues, I or one of the other maintainers of the
C++ draft will look into rewording or removing it. Thanks!

https://github.com/cplusplus/draft/issues/849