Background
Prior to TSAN, @dvyukov worked on a tool called Relacy. Relacy is a race-detector runtime: it replaces threads with fibers and combines memory-ordering tracking with a fine-grained scheduler that explores as many thread interleavings as possible. It supports random scheduling, as well as a “full search” scheduler that attempts to schedule every possible interleaving of the threads, assuming the unit test is small enough.
One of Relacy’s main drawbacks is that it often requires heavy modification of the source code under test, replacing standard includes with its own definitions of standard types and of APIs such as the pthread_* functions.
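For illustration, here is a rough sketch of what a Relacy unit test looks like, based on my recollection of its documented rl::test_suite interface (the exact header, macro, and type names may differ):

```cpp
#include <relacy/relacy.hpp>

// Two-"thread" unit test; Relacy runs the body under many interleavings while
// tracking memory ordering. Note the source modifications it requires: rl::
// wrapper types and the '$' annotation on every shared access.
struct race_test : rl::test_suite<race_test, 2>
{
    rl::var<int> data;  // plain shared variable tracked by Relacy

    void before() { data($) = 0; }

    void thread(unsigned thread_index)
    {
        if (thread_index == 0)
        {
            data($) = 1;      // unsynchronized write...
        }
        else
        {
            int r = data($);  // ...races with this read; Relacy reports it
            (void)r;
        }
    }
};

int main()
{
    rl::simulate<race_test>();  // random scheduling; "full search" is also available
}
```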
One of TSAN’s biggest advantages is that it works for a wide variety of programs out of the box. My goal would be to bring one of Relacy’s strengths - antagonistic scheduling - into TSAN, to further its ability to catch race conditions.
Scheduling-related bugs
TSAN can only detect races if the threads actually hit a particular interleaving. Since TSAN does not provide its own scheduler and relies on the OS scheduler, some races may rarely or never be caught without significant luck in how the OS schedules the threads.
One example is a race I found in stdexec that TSAN could never find on my machine, despite tens of thousands of iterations. Some details are in “Reproduce split race/crash” (ccotter, NVIDIA/stdexec pull request #1395). rare_ref.cpp (on GitHub) is a minimized version that simulates the same race, which TSAN is also unable to find without running the loop thousands of times. In most of the thread interleavings the OS produced on my machine, the condition that allowed the race to manifest never occurred.
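To make the general problem concrete, here is a hypothetical example of this kind of bug (not the stdexec race, just the same shape): the racy read only executes if the observer samples an intermediate state of the worker, which rarely happens if the OS lets the worker run to completion first.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

// Hypothetical illustration: the observer only reads 'shared' if it samples
// 'stage' inside the narrow window between the worker's two stores. Under the
// common OS schedule, the worker finishes first, the branch is never taken,
// and TSAN never sees a pair of conflicting accesses to report.
std::atomic<int> stage{0};
int shared = 0;

void worker() {
    shared = 1;
    stage.store(1, std::memory_order_release);
    shared = 2;  // races with the observer's read below
    stage.store(2, std::memory_order_release);
}

void observer() {
    if (stage.load(std::memory_order_acquire) == 1)
        std::printf("%d\n", shared);  // only reached in the narrow window
}

int main() {
    std::thread t1(worker), t2(observer);
    t1.join();
    t2.join();
}
```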
Prior work
https://ceur-ws.org/Vol-2344/paper9.pdf describes a scheduler that intercepts each “sync point” in the program (a mutex/condition-variable interaction, or an atomic operation). The threads cooperate so that only one thread actually executes at a time, with the other threads busy-waiting until they are selected to run. When the active thread hits the next “sync point,” it elects another thread to unblock, then pauses itself if it was not the one selected. The scheduling algorithm could be random, or a “full search” over interleavings.
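A minimal sketch of that cooperative scheme, with hypothetical names (this is not the paper's implementation): every thread hands control to the scheduler at each sync point, and only the selected thread returns while the rest spin.

```cpp
#include <atomic>
#include <cstdlib>
#include <mutex>
#include <vector>

namespace coop {

// Hypothetical sketch: only the thread whose id is stored in 'active' may
// return from on_sync_point(); every other thread busy-waits. Registration of
// threads into 'runnable' (on create/exit) is elided for brevity.
std::atomic<int> active{0};
std::mutex registry_mu;
std::vector<int> runnable;

// Called by the instrumented program at every mutex/cv/atomic operation.
void on_sync_point(int self) {
    {
        std::lock_guard<std::mutex> lk(registry_mu);
        if (!runnable.empty()) {
            // Pick the next thread at random; a "full search" scheduler would
            // instead enumerate the choices systematically.
            active.store(runnable[std::rand() % runnable.size()],
                         std::memory_order_release);
        }
    }
    // Pause until this thread is selected again (possibly immediately).
    while (active.load(std::memory_order_acquire) != self) { /* spin */ }
}

}  // namespace coop
```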
https://groups.google.com/g/thread-sanitizer/c/c32fnH0QQe4 proposed adding such a scheduler to the TSAN runtime, with D65383 (“Proposals for implementing a scheduler in TSAN”) offering an initial version of this idea, although it was never merged. That patch added support for a randomized scheduler.
Adding the random scheduler to TSAN
I’ve applied the above patch to a local copy of the TSAN runtime, and I have some promising results. The “random scheduler” runtime catches the following bugs that the unmodified TSAN runtime is not able to catch as easily. My machine is a virtual 32-core Linux environment.
- stdexec race. I never observed the unmodified TSAN catch this bug. With the random-scheduler runtime, TSAN reliably reports the bug when running the test in a loop of 100 iterations.
- rare_ref.cpp (the minimized version of the stdexec bug), which unmodified TSAN only reports after thousands of iterations. The random scheduler often reports the bug within only 10 iterations of the program.
- Two previously unreported bugs in bdlcc_singleconsumerqueueimpl.h, both of which are now tracked in our company’s internal tracking tool. One is a data race due to a missing release-acquire pair; the other is a race caught by a test assertion, found only because the random scheduler was able to suspend a producer thread “at just the right moment” while the consumer thread observed a corner case, leading it to consume a value that was never produced.
- I am able to reproduce results similar to those of the “Rare case” in section 6 of https://ceur-ws.org/Vol-2344/paper9.pdf. The random scheduler reliably produces different values from the for loop, whereas the default OS scheduler almost always produced the same value, since the first thread executed to completion before the second thread even began.
Proposed changes to the TSAN runtime
I propose that we add changes based on D65383 (“Proposals for implementing a scheduler in TSAN”) into TSAN: namely, support for a pluggable scheduler, with the default being a “no op” scheduler that behaves exactly as TSAN does today, fully delegating to the OS scheduler. The scheduler would be invoked at each “sync point” in the program.
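Conceptually, the hook could look something like the following (a sketch with illustrative names only, not the actual code from D65383 or the TSAN runtime):

```cpp
// Hypothetical sketch of a pluggable scheduler interface inside the runtime.
// The interceptors would call these hooks; names are illustrative only.
struct Scheduler {
  virtual ~Scheduler() {}
  // Called from the interceptors at every sync point (mutex/cv/atomic op).
  virtual void OnSyncPoint(int tid) {}
  // Thread lifecycle and blocking-state notifications.
  virtual void OnThreadStart(int tid) {}
  virtual void OnThreadFinish(int tid) {}
  virtual void OnBlocked(int tid) {}
  virtual void OnUnblocked(int tid) {}
};

// Default: a "no op" scheduler -- TSAN behaves exactly as it does today,
// leaving all scheduling decisions to the OS.
struct NoOpScheduler : Scheduler {};

// A RandomScheduler would override OnSyncPoint() to pick the next runnable
// thread and park the rest; which scheduler to use would be chosen at startup
// (e.g. via a runtime flag in TSAN_OPTIONS).
```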
The first non-“no op” scheduler added would be the random scheduler (a rough sketch follows the list below). It would:
- Track running threads, and allow only one thread at a time to return from a “sync point.”
- Intercept pthread mutex and condition variable calls to track which threads are runnable and which are blocked. The runtime itself would implement the blocking of waiting threads, waking them when a mutex is unlocked or a condition variable is notified.
- Implement a minimal “event loop” to properly implement the sleep interceptor, and lay the foundations for supporting IO syscalls (so that the runtime could properly track which threads are runnable, and which are blocked on IO and should not be selected by the random scheduler).
- Run a background watchdog thread to time out running threads and schedule another thread to ensure progress.
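A rough sketch of what the random scheduler's sync-point hook and mutex/condition-variable tracking might look like (hypothetical names and structure, not code from D65383):

```cpp
#include <condition_variable>
#include <iterator>
#include <mutex>
#include <random>
#include <set>

// Hypothetical sketch: one lock protects the scheduler state, and only the
// thread stored in 'active_' may leave AtSyncPoint(). The runtime's
// pthread_mutex_*/pthread_cond_* interceptors would call OnBlock()/OnWake()
// to keep the runnable set accurate.
class RandomScheduler {
 public:
  void OnThreadStart(int tid) {
    std::lock_guard<std::mutex> lk(mu_);
    runnable_.insert(tid);
    if (active_ == -1) active_ = tid;
  }
  // Called at every sync point: hand off to a random runnable thread, then
  // wait until this thread is selected again.
  void AtSyncPoint(int tid) {
    std::unique_lock<std::mutex> lk(mu_);
    PickNext();
    cv_.notify_all();
    cv_.wait(lk, [&] { return active_ == tid; });
  }
  void OnBlock(int tid) {  // e.g. entering pthread_cond_wait
    std::unique_lock<std::mutex> lk(mu_);
    runnable_.erase(tid);
    PickNext();
    cv_.notify_all();
  }
  void OnWake(int tid) {   // e.g. mutex unlocked / condition variable signaled
    std::lock_guard<std::mutex> lk(mu_);
    runnable_.insert(tid);
  }

 private:
  void PickNext() {        // caller holds mu_
    if (runnable_.empty()) return;
    std::uniform_int_distribution<int> d(0, (int)runnable_.size() - 1);
    auto it = runnable_.begin();
    std::advance(it, d(rng_));
    active_ = *it;
  }
  std::mutex mu_;
  std::condition_variable cv_;
  std::set<int> runnable_;
  std::mt19937 rng_;
  int active_ = -1;
};
```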
The watchdog would be required for code that relies on conditions that are not observed by the TSAN interceptors, or that are not yet supported by the scheduler runtime’s interceptors. For instance, we may not teach the scheduler about epoll in the first implementation, so the watchdog might be necessary to wake up other threads that need to perform an IO operation before the epoll call can return.
In the first implementation, to limit scope and complexity, I propose not explicitly teaching the scheduler about IO syscalls, and relying on the watchdog thread to make progress in programs that rely on them. This would mean the random scheduler is less useful for programs that interact with IO (though they would still work correctly), but hopefully extremely beneficial for unit testing concurrent data structures and algorithms.
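A sketch of the watchdog idea, assuming the scheduler bumps a “last progress” timestamp at every sync point and exposes some way to force a reschedule (both are assumptions of mine, not details from D65383):

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical watchdog sketch: 'last_progress_ns' would be updated by the
// scheduler at every sync point. If the selected thread has not reached one
// within the timeout (e.g. it is parked in an un-intercepted epoll_wait),
// force a reschedule so the rest of the program keeps making progress.
std::atomic<int64_t> last_progress_ns{0};

void WatchdogLoop(std::function<void()> force_reschedule) {
  using namespace std::chrono;
  const auto timeout = milliseconds(50);  // assumed value; likely a runtime flag
  for (;;) {
    std::this_thread::sleep_for(timeout);
    int64_t now = duration_cast<nanoseconds>(
        steady_clock::now().time_since_epoch()).count();
    if (now - last_progress_ns.load(std::memory_order_relaxed) >
        duration_cast<nanoseconds>(timeout).count()) {
      force_reschedule();  // pick a different runnable thread
    }
  }
}
```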