system_error in shared_future::wait_for

Hello everyone,

I am new to this list. I am writing because I think I might have caught a bug in libc++. What I hope to get is a quick assessment of my issue, so that I know whether I should file this as a bug, or some hints on how to proceed, either to gather more information or to work around the issue on my end.

The issue I am seeing popped up in a macOS project (Xcode 10.2.1) I am working on but I was able to condense it into a very small test program, see below for the code.

The issue presents itself as a crash in shared_future::wait_for due to an uncaught exception. The crash does not occur immediately but only after a (huge) number of calls. Console output in Xcode reads "libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: condition_variable timed_wait failed: Invalid argument". See backtrace further down for details. Unfortunately I cannot step into the code of the method where the exception originates, which is condition_variable::__do_timed_wait. I browsed the code in the libcxx repository and found that the error must be the result code EINVAL from a call to pthread_cond_timedwait.

The interesting bit is that the issue can be reproduced in each test run within (mostly) less than 10 minutes on an iMac18,3/i7 and on a MacBookPro15,1/i7 but never on a MacBookPro15,2/i5. I only have one machine of each of those available so I cannot be sure how these results hold up.

  1. Is there something fundamentally wrong with my code, i.e. how I use the shared_future?

  2. Is this likely to be a bug in libc++ or is it more likely to be an issue with the BSD level API and/or the hardware? In case of the latter option where should I seek contact?

  3. I would like to be able to step into the code for condition_variable::__do_timed_wait and get the debugger info for the local variables. Would this be simply a matter of pulling the libcxx repository, building and linking it? I am not a command line compilation guy so I was hoping for some good documentation on how to do this with and for Xcode. However, if it must be, I am willing to accept my fate.

Thanks,
Benjamin

Notes on the code below:

  • The production code does not call sharedFuture.wait_for with 100ns. It currently uses 1ms, which makes the issue appear much less often: I have seen it take half a day in one case and three days in another. 100ns has proven to trigger the exception much more often, that's all.
  • FWIW, the same exception is thrown with shared_future::wait_until(steadyClockTimePoint), so steady clock accuracy does not seem to help.

In order to do that with 100% reproducibility, you'd have to build libc++ exactly the same way it was built for the platform you're running on, which is not easily achievable at the moment. I'll take a look.

I’ll try running this program overnight to reproduce:

cat <<EOF | clang++ -xc++ - -std=c++17 && ./a.out
#include <future>
#include <thread>

using namespace std;

int main(int argc, const char * argv[]) {
    promise<void> thePromise = promise<void> {};
    shared_future<void> sharedFuture = thePromise.get_future().share();

    // Poll the future with a very short timeout to provoke the exception.
    thread anotherThread = thread( [sharedFuture]
    {
        int debugCount = 0;
        while (sharedFuture.wait_for(100ns) == future_status::timeout)
        {
            debugCount++;
        }
    });

    // The promise is never fulfilled, so this blocks until the process dies.
    sharedFuture.wait();

    return 0;
}
EOF

However, it would be useful for me to know the OSes you’ve been able to reproduce it on.

Thanks,
Louis

Louis,

> However, it would be useful for me to know the OSes you’ve been able to reproduce it on.

All machines I ran this on are macOS 10.14.4 (18E226). I also tested in a virtualized 10.14.2 on the iMac which also crashed.

This might be coincidental, but in the meantime I was also able to trigger the exception on the MacBookPro15,2/i5, although never with a debug build, only with a release config build (via Xcode Archive). I never tested a debug build on the MacBookPro15,1/i7. On the iMac18,3/i7 I get the exception with both build configs.

Thanks again,

Benjamin

Just to follow up on this, I managed to reproduce without libc++ concurrency primitives (but still using std::chrono), and I’ve followed up with our OS folks to help me figure out what’s going on. I just wanted to let you know I’ve acknowledged the problem and we’re working on it. I don’t think it’s just a misuse of the facilities.

Louis

Thank you for the update. Is there some place where this issue can be tracked? Should I file it as a bug with the LLVM bugzilla?

Benjamin

You should be able to report a bug here: https://developer.apple.com/bug-reporting/. We’ll link it with the bug report I already created internally.

Note that the LLVM bugzilla is not the right place for this, since I think it is most likely a problem pertaining to Apple platforms, and not to a project under the LLVM umbrella (although that remains to be confirmed).

Louis

FYI, I filed a radar: 51160645

Benjamin