[RFC] [tsan] Implementing a Fuzz Scheduler for TSAN

Background

Prior to TSAN, @dvyukov worked on a tool called Relacy, a race detector runtime. By replacing threads with fibers, it combines memory ordering tracking with a fine-grained scheduler that explores as many thread interleavings as possible. It supports random scheduling, as well as a “full search” scheduler that attempts to schedule every possible interleaving of threads, assuming the unit test is small enough.

One of Relacy’s main drawbacks is that it often requires heavy modification to the source code under test, replacing standard includes with Relacy’s own definitions of standard primitives such as the pthread_* APIs.

One of TSAN’s biggest advantages is that it works for a wide variety of programs out of the box. My goal is to bring one of Relacy’s strengths - antagonistic scheduling - into TSAN, to further its ability to catch race conditions.

Scheduling related bugs

TSAN can only detect races if the threads actually hit a particular interleaving. Since TSAN does not provide its own scheduler and instead relies on the OS scheduler, some races may rarely or never be caught without significant luck in how the OS schedules the threads.

One example is a race I found in stdexec that, despite tens of thousands of iterations, TSAN never found on my machine. Some details are in Reproduce split race/crash by ccotter · Pull Request #1395 · NVIDIA/stdexec · GitHub. rare_ref.cpp · GitHub is a minimized version that simulates the same race, which TSAN is also unable to find without running the loop thousands of times. In most of the thread interleavings the OS produced on my machine, the condition that allowed the race to manifest did not occur.

Prior work

https://ceur-ws.org/Vol-2344/paper9.pdf describes a scheduler that intercepts each “sync point” in the program (a mutex/condition variable interaction, or an atomic operation). The threads cooperate so that only one thread actually executes at a time, with the other threads busy waiting until a different thread is selected to execute. When the active thread hits the next “sync point,” it elects another thread to unblock, then pauses itself if it was not the one selected to run. The supported scheduling algorithms include random scheduling and a “full search” schedule.

https://groups.google.com/g/thread-sanitizer/c/c32fnH0QQe4 proposed adding such a scheduler into the TSAN runtime, with ⚙ D65383 Proposals for implementing a scheduler in TSAN offering an initial version of this idea (although it was never merged). This Phab added support for a randomized scheduler.

Adding the random scheduler to TSAN

I’ve applied the above Phab to a local copy of the TSAN runtime, and I have some promising results. The “random scheduler” runtime catches the following bugs that the unmodified TSAN runtime is not able to catch as easily. My machine is a virtual 32-core Linux environment.

  • stdexec race. I never observed unmodified TSAN catching this bug. With the random scheduler runtime, TSAN reliably reports the bug when running the test in a loop of 100 iterations.
  • rare_ref.cpp · GitHub, the minimized version of the stdexec bug, which under unmodified TSAN only reports the race after thousands of iterations. The random scheduler often reports the bug within just 10 iterations of the program.
  • Two previously unreported bugs in bdlcc_singleconsumerqueueimpl.h, both of which are now tracked in our company’s internal tracking tool. One is a data race due to a missing release-acquire; the other is a race caught by a test assertion, found only because the random scheduler was able to suspend a producer thread “at just the right moment” while the consumer thread observed a corner case, leading to consuming a value that was never produced.
  • I am able to reproduce similar test results to the “Rare case” under section 6 in https://ceur-ws.org/Vol-2344/paper9.pdf. The random scheduler reliably produces different values from the for loop, whereas under the default OS scheduler the program almost always produced the same value, since the first thread executed to completion before the second thread even began.

Proposed changes to the TSAN runtime

I propose that we add changes based on ⚙ D65383 Proposals for implementing a scheduler in TSAN into TSAN: namely, support for a pluggable scheduler, with the default being a “no op” scheduler that behaves as TSAN does today, fully delegating to the OS scheduler. The scheduler would run at each “sync point” in the program.

The first non-“no op” scheduler added would be the random scheduler. It would

  • Track running threads, and only allow one thread at a time to return from a “sync point.”
  • Intercept pthread mutex and condition variable calls to track which threads are runnable or blocked. The runtime would implement the waiting of blocked threads and wake up other threads when a mutex is unlocked, or a condition variable is notified.
  • Implement a minimal “event loop” to properly support the sleep interceptor, and lay the foundations for supporting IO syscalls (so that the runtime can track which threads are runnable, and which are blocked on IO and should not be selected by the random scheduler).
  • Run a background watchdog thread to timeout running threads and schedule another thread to ensure progress.

The watchdog would be required for code that relies on conditions that are not observed by the TSAN interceptors, or that are not yet supported by the scheduler runtime. For instance, we may not teach the scheduler about epoll in the first implementation, meaning the watchdog might be necessary to wake up other threads so they can perform an IO operation that allows the epoll call to return.

In the first implementation, as a matter of limiting scope and complexity, I propose not explicitly teaching the scheduler about IO syscalls, and relying on the watchdog thread to make progress in programs that use them. This would mean the random scheduler would be less useful for programs that interact with IO (though it would still work correctly), but hopefully extremely beneficial for unit testing concurrent data structures and algorithms.


Hi Chris,

I think overall this is a good idea and should happen sooner or later.

The most (and the only) usable scheduler for us would be a “bounded overhead” scheduler that aims at a relatively small constant slowdown. For example, 10-20% at most (should be tunable). It also excludes complete serialization to a single thread.

Some implementation notes:

  1. The interception points should have inlined fast path for when scheduling is not enabled (if (!enabled) return; in a header).
  2. Marco Elver suggested a good idea: ideally it should be usable for asan/msan as well. Namely, live in sanitizer_common, be controlled with common flags. Initially only tsan could call these hooks, but it should be possible to add these to asan in future as well.

Thanks for the reply Dmitry.

“bounded overhead” scheduler that aims at a relatively small constant slowdown

So, do you mean a scheduler which allows all threads to run as normal, except that each “sync point” introduces an artificial randomized sleep (nothing like that is mentioned in https://ceur-ws.org/Vol-2344/paper9.pdf)?

I can see value in something like this. Let me implement this and see how it performs with my unit tests.

What has been successful for me is the Random algorithm, as described in Doronin et al., which allows only one thread to run at once. I was then envisioning two schedulers (with more in the future): a randomized scheduler that allows only one thread at a time to run, and a second one being what you just described (if I understood it correctly).

  1. The interception points should have inlined fast path for when scheduling is not enabled (if (!enabled) return; in a header).

Basing my work off of what was done in ⚙ D65383 Proposals for implementing a scheduler in TSAN, I kept the virtual inheritance design, with the default (i.e., “scheduler not enabled”) derived type being a “no op” implementation. I can move the “getter” to a header file, but would you expect the virtual pointer indirection to be too much overhead (need I profile both options before proceeding with one or the other)? With the general TSAN slowdown of 5-15x, I can’t imagine a virtual pointer indirection here would be meaningfully slower.

Marco Elver suggested a good idea:

I like it!

So, do you mean a scheduler which allows all threads to run as normal, except that each “sync point” introduces an artificial randomized sleep

Yes. And not necessarily at each one, otherwise it may be too slow.
That’s the only scheduler that will be usable for large real programs. It’s fine if it does not detect bugs in your small unit tests.

I can move the “getter” to a header file

Yes, please.

but would you expect for the virtual pointer indirection to be too much overhead (need I profile both options before proceeding with one or the other)? With the general TSAN slowdown of 5-15x, I can’t imagine a virtual pointer indirection here would be meaningfully slower.

You probably measured on synthetic synchronization primitive unit tests.
What we see on real large programs doing real work is around 2-3x (it highly depends on the actual program).
For something like sem_post, yes, the virtual call will probably be non-measurable. But atomic loads can be very frequent, and cost almost nothing in a normal non-TSAN build, so they are frequently sprinkled everywhere, including in loops.