call_once and TSan

Hi,

I'm trying to write a TSan interceptor for the C++11 call_once function. There are currently false positive reports, because the inner __call_once function is located in the (non-instrumented) libcxx library, and on macOS we can't expect the users to build their own instrumented libcxx.

TSan already supports pthread_once and dispatch_once by having interceptors that re-implement the logic. However, doing the same for call_once/__call_once doesn't work, because call_once is explicitly supposed to be exception-safe, but the sanitizer runtime libraries disallow exception handling.

Any ideas how to handle call_once in TSan?

Thanks!
Kuba

Does anybody remember exact reasons we disable exceptions in sanitizer
runtimes? One is that it won't link with C programs. Are there any
other?
If C is the only reason: there is already a part of tsan runtime is
linked only to C++ programs (it contains operator new/delete
interceptors). We could add additional files to the cxx part of
runtime and build them with exceptions.

Alternatively, the interceptor can handle only synchronization but
forward actual logic to the real function. Along the lines of:

INTERCEPTOR(call_once, o) {
  __tsan_acquire_release(o);
  REAL(call_once)(o);
}

That will have some performance impact. If we hardcode the "fully
initialized" value, then we can eliminate the additional overhead:

INTERCEPTOR(call_once, o) {
  if (__atomic_load(o, acquire) == FULLY_INITIALIZED) {
     __tsan_acquire(o);
     return;
  }
  __tsan_acquire_release(o);
  REAL(call_once)(o);
}

Unfortunately, the first fast-path check is inlined and cannot be intercepted. We can only intercept the inner call to __call_once. But how would __tsan_acquire_release help here? The issue is that we need to perform the release *after* user code has run, but before the "o" flag is changed. Otherwise, TSan will still see a false positive where one thread has already run user code, and another thread already sees that call_once is finished, but the acquire has no release to pair with.

Kuba

Will then the following work?

INTERCEPTOR(call_once, o) {
  REAL(call_once)(o);
  __tsan_release_merge(o);
  __tsan_acquire(o);
}

Still racy. Suppose thread A is still inside REAL(call_once), but already after it has run user code and updated "o" to ~0. Thread B load-acquires "o", finds ~0, assumes it's fully initialized, keeps going, but user code stores hasn't been properly published.

Kuba

INTERCEPTOR(call_once, o) {
    __tsan_release_merge(o);
   REAL(call_once)(o);
  __tsan_acquire(o);
}

?

Same problem exists, thread A can still be within REAL(call_once), but after it ran user code and set the flag to ~0. Roughly, call_once does:

__call_once(flag, arg, func) {
  mutex_lock(mut);
  if (flag == BEING_INITIALIZED) { wait }
  else if (flag == NOT_INITIALIZED_AT_ALL) {
    flag = BEING_INITIALIZED;
    mutex_unlock(mut);
    func(arg); // <=== user code callback
    mutex_lock(mut);
    atomic_store(&flag, FULLY_INITIALIZED, mo_release); // "release" store, but within a compiled dylib, thus invisible to TSan
  }
  mutex_unlock(mut);
}

If thread A is just after the release store, which is invisible to TSan, __tsan_acquire in thread B will have no effect, and the stores from the callback to func(arg) will not be synchronized to thread B.

Anyway, I just realized that we can wrap "func" into our own callback, which will perform the (extra) __tsan_release... Do you think that would work? E.g.:

void call_once_callback_wrapper(...) {
  orig_func(orig_arg);
  __tsan_release(flag);
}

INTERCEPTOR(call_once, o, func, arg) {
  REAL(call_once)(flag, ..., call_once_callback_wrapper);
}

Kuba

Same problem exists, thread A can still be within REAL(call_once), but after it ran user code and set the flag to ~0. Roughly, call_once does:

__call_once(flag, arg, func) {
  mutex_lock(mut);
  if (flag == BEING_INITIALIZED) { wait }
  else if (flag == NOT_INITIALIZED_AT_ALL) {
    flag = BEING_INITIALIZED;
    mutex_unlock(mut);
    func(arg); // <=== user code callback
    mutex_lock(mut);
    atomic_store(&flag, FULLY_INITIALIZED, mo_release); // "release" store, but within a compiled dylib, thus invisible to TSan
  }
  mutex_unlock(mut);
}

If thread A is just after the release store, which is invisible to TSan, __tsan_acquire in thread B will have no effect, and the stores from the callback to func(arg) will not be synchronized to thread B.

right

Anyway, I just realized that we can wrap "func" into our own callback, which will perform the (extra) __tsan_release... Do you think that would work? E.g.:

void call_once_callback_wrapper(...) {
  orig_func(orig_arg);
  __tsan_release(flag);
}

INTERCEPTOR(call_once, o, func, arg) {
  REAL(call_once)(flag, ..., call_once_callback_wrapper);
}

this should work
but I am obviously bad at thinking today

Lol okay, I'll try to implement that. I'm still curious if we can allow C++ exception handling in TSan RTL, because then we could reimplement call_once in the wrapper in a way that it would also support already-compiled and non-instrumented user code.

Kuba