Use of Vectored Exception Handlers for crash recovery

As part of my effort to replace our hand-rolled mutexes with std::mutex and std::recursive_mutex, I found that many of the tests fail on Windows after the replacement. One cause is the mutex in lib\Support\CrashRecoveryContext.cpp. If this mutex is a std::recursive_mutex or std::mutex, tests fail. If it is a Windows CRITICAL_SECTION, the tests pass.

I think, but am not 100% sure, that this is due to an interaction between MSVC’s mutex implementation and the use of vectored exception handling.

There is a comment in this file that says the following:

// On Windows, we can make use of vectored exception handling to
// catch most crashing situations. Note that this does mean
// we will be alerted of exceptions before structured exception
// handling has the opportunity to catch it. But that isn’t likely
// to cause problems because nowhere in the project is SEH being
// used.

However, my understanding is that the handler passed to AddVectoredExceptionHandler will get called for EVERYTHING, including exceptions that some downstream handler would have dealt with on its own. In particular, since we install the handler at the front of the list, if some internal Windows library wrote something like this:

__try {
} __except (…) {
  // handle exception
}

Then our handler would get called before the Windows handler. Furthermore, we seem to assume inside the handler that it’s impossible to reach the handler unless there was an actual crash scenario. Is this necessarily true?
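To make the concern concrete, here is a minimal sketch of a vectored handler that does not assume every call is a crash. This is not the code in CrashRecoveryContext.cpp. The names looksLikeCrash, filteringHandler and installFilteringHandler are hypothetical, and the filter rule (error severity only, skipping MSVC C++ throws) is my own assumption about what a reasonable first cut might be. Even an error-severity code such as an access violation can be handled legitimately by a later __except block, so this narrows the problem rather than solving it.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#endif

// Exception code the MSVC runtime uses when a C++ `throw` is raised.
constexpr std::uint32_t kMsvcCppException = 0xE06D7363u;

// Hypothetical filter: treat a code as a probable crash only when its two
// severity bits are both set (error severity) and it is not a C++ throw.
// Informational codes (e.g. 0x40010006 from OutputDebugString) and
// warning codes (e.g. 0x80000003, breakpoint) are ignored.
constexpr bool looksLikeCrash(std::uint32_t code) {
  return (code & 0xC0000000u) == 0xC0000000u && code != kMsvcCppException;
}

static_assert(looksLikeCrash(0xC0000005u), "access violation is a crash");
static_assert(!looksLikeCrash(0x40010006u), "debug print is not a crash");
static_assert(!looksLikeCrash(kMsvcCppException), "C++ throw is not a crash");

#ifdef _WIN32
// Vectored handlers run before any frame-based __except filter, so
// anything that is not clearly fatal is passed along untouched.
static LONG CALLBACK filteringHandler(PEXCEPTION_POINTERS info) {
  if (!looksLikeCrash(info->ExceptionRecord->ExceptionCode))
    return EXCEPTION_CONTINUE_SEARCH;
  // Crash-recovery logic would go here (assumption: omitted in this sketch).
  return EXCEPTION_CONTINUE_SEARCH;
}

// A non-zero first argument puts the handler at the front of the list.
void installFilteringHandler() {
  AddVectoredExceptionHandler(1, filteringHandler);
}
#endif
```

The classification helper is kept free of Windows headers so it can be checked on any platform. Only the handler and its registration are guarded by _WIN32.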

It’s worth pointing out that std::[recursive_]mutex on MSVC is implemented using ConcRT (Microsoft Concurrency Runtime), which is all sorts of crazy (it even has its own user-mode scheduler [1]), so I suspect that using a vectored exception handler is not safe with ConcRT.

Do any other Windows experts have thoughts on what might be going on here?

[1] -

I think this functionality mostly exists to support applications that use clang as a library and want to be able to recover from crashes, for example Xcode. Clang’s Tooling library also uses this. I don’t think anyone has been doing this on Windows with any seriousness yet, so I wouldn’t be surprised if it isn’t robust.

Honestly, it’s crash recovery. There’s no way it can be robust. We already lost, possibly due to a use-after-free bug. None of this code looks signal safe anyway. =P I’d do the simplest thing that works.

It’s funny, because now I’m not even sure that’s it. Or maybe it was part of it, but not everything. I managed to trigger a test failure even when that mutex was a critical section. I’m close to concluding that either MSVC’s mutex implementation is buggy or we’re doing something the standard doesn’t allow. Either way, I’ll hammer on it some more before throwing in the towel.