I would like to propose the addition of RSan (Robustness Sanitizer) to the compiler's sanitizers. Currently TSan can miss weak behaviours allowed by the C memory model, because synchronization introduced by the sanitizer itself hides them, and as a result it can miss races. RSan complements TSan by verifying that the program does not diverge from sequential consistency even when using weak atomic accesses. Running the two in conjunction will increase assurance of the program's correctness.
Before I rebase the changes, I would like to have the following input:
Should I rebase my changes on main or another version?
Should the sanitizer be part of TSan with an optional flag? I believe this should be the case since they both augment atomic accesses and we’d like to keep them in sync. In addition there’s little reason not to run TSan when running RSan.
What should be the size of the timestamps? Currently in the patch it is 64 bits for maximum correctness. Since TSan itself uses 16 bits for timestamps, that value seems too high. Why did TSan opt for 16 bits? What would be correct here? 16 bits? 32 bits?
I had a brief look at the paper and patch. The goal is appreciated, but I don’t see how this can be upstreamed without serious rework.
Unconditional Overhead
The patch adds permanent performance overhead. The enable_robustness_reporting flag only gates the reporting call, not the expensive instrumentation work. All the locking and vector clock updates are unconditional. This is a non-starter; any new feature must have zero cost when disabled. Why was this not placed behind a proper feature flag?
Fundamentally Unscalable
The locking design itself is unscalable. The use of a single global locksLock to protect a map of per-address mutexes serializes the instrumentation of all atomic operations, even on unrelated addresses. This design directly explains the 222x worst-case performance regression mentioned in the paper and is not viable for real-world use.
Usability
The tool is overly conservative, flagging benign behaviors (e.g., busy-wait loops) as violations. The proposed solution of requiring manual user annotation shifts a heavy burden onto the developer and requires them to be memory model experts. This seems to undermine the goal of an automated tool. What’s the expected workflow for a developer on a large multi-million line project to triage these violations?
Thank you for the quick reply!
I know the patch isn’t ready to be merged as-is. This is why I wanted to have some discussion before I convert it to a more fitting form.
Unconditional Overhead
I fully agree. My intention is to gate the entire instrumentation behind an enable_robustness flag. The reporting-only flag was used to measure the overhead without the overhead of the prints. The reason the instrumentation was not gated in the paper artefact is that I wanted to compare a clean TSan to TSan+RSan, making sure I didn't touch any code paths in the comparison code.
Additionally, at least at first, I do not want to add support for the value-tracking extension, as it requires annotations from the user. If we could statically identify these patterns, automated support could be added in the future.
Fundamentally Unscalable
From my understanding and profiling, the worst-case performance overhead comes from the insufficiently optimized path for SC fences. I have some ideas on how they could be compiled differently: this would require many more location clocks, but would not require a global lock as they do now. Still, reducing the timestamp size from 64 bits should help significantly (512-bit vectorization removed the overhead).
In either case, the locksLock is held for a very short time, and I didn't perceive it as a source of slowdown in my profiling.
Usability
I think most cases of busy-wait loops could be identified statically, so we can automate value tracking. Without value-tracking support, adding a fence before the loop can mitigate the problem: for any value read we won't report a violation (assuming the proper synchronization path).
Regarding locksLock: it is not required for the correctness of the algorithm. We could use some lock-free data structure, or caching for recently accessed atomics. What is the preferred option for compiler-rt?
This would be useful to have for tricky synchronization algorithms/containers.
Yes, the code must be rebased on top of the main branch.
Yes, I think this should be a TSan mode enabled with an additional flag (but I can say more reliably when I look at the patch; the diff is hard to review/understand). If you just upload it as a PR to an llvm fork (on the older revision, so that you don't need to rebase), that would already be helpful for initial review.
Timestamps should be consistent across TSan code, if the rest uses 16 bits, this part should use 16 bits as well.
If I understand the paper right (memorize previous values for atomic variables, and return non-default values from loads when MM permits), this is part of the Relacy Race Detector back in 2008:
Later I ported it to TSan v1:
But unfortunately it was never ported to v2/v3 runtime.
I have made the following changes from the paper in this version:
A feature flag for robustness verification, disabled by default.
Removed value tracking. I might have missed some references.
Timestamps use 16 bits, like TSan.
There is a slight misunderstanding regarding the value tracking. RSan does not simulate the WMM. Value tracking is used for a more fine-grained understanding of whether divergence between C++ and SC can happen. It's a feature to enable strong CAS along with possible Wait and BCAS primitives.
Wait and BCAS: The program will loop indefinitely until the expected value is found. Thus we can ignore robustness violations on any value but the expected one.
Strong CAS: The program cannot read the expected value (as a CAS failure). Thus we can ignore writes of this value that have HBSC but not HB to the thread.
Okay, then I misunderstood what this is doing, and I'm missing the rationale behind the approach.
Do I understand correctly that what the tool finds is not bugs?
There are useful algorithms that are not sequentially consistent and/or not linearizable.
Instead of proving that an algorithm is SC, and then using tools that can verify only SC algorithms, why not make the tools verify non-SC algorithms?
RSan identifies robustness violations: executions where the WMM diverges from SC. These violations might not be bugs themselves, but they often lead to bugs. Even experts get this wrong, and the bugs surface when the program is ported from x86 to ARM.
While not all useful algorithms are sequentially consistent and/or linearizable, many still are. By making sure a program is robust, we get SC guarantees while using fewer synchronization primitives, allowing for better performance and for optimizing the code without introducing weak behaviour.
The reason to verify via SC rather than verify non-SC behaviour directly is that verifying non-SC is hard. If we verify only the current execution, we are susceptible to the observer effect: the tool introduces synchronization that does not exist in the program, and thus misses execution paths that might contain races [example 1.1]. Even if we manage to avoid the observer effect, testing on stronger hardware (x86-TSO) will miss executions that happen on weaker hardware (ARM, Power) [1].
Current approaches try to address this by simulating the memory model to explore different possible code paths. This is problematic since we either have unbounded memory usage from storing the entire execution graph [example 7.2] or sacrifice completeness and miss possible executions [example 7.1].