A proposed approach to std::atomic_wait for libcxx

Hi everyone,

To start the discussion of how libcxx should go about implementing this feature, I’ve prepared a cross-platform implementation of atomic_wait/atomic_notify_* for Mac/Linux/Windows/CUDA/unidentified-platform. I don’t claim that it’s fully tuned for your platform, nor that it’s perfect for every possible use, but it should not be terribly bad for any use either. It has various knobs to turn paths on or off, so you can choose a different path on each platform, as long as that path is supported at all on that platform.

You can find the implementation here: https://github.com/ogiroux/atomic_wait/.

It has these strategies implemented:

  • Contention table. Used to optimize futex notify, or to hold CVs. Disable with __NO_TABLE.

  • Futex. Supported on Linux and Windows. On Linux it needs a table to perform well. Disable with __NO_FUTEX.

  • Condition variables. Supported on Linux and Mac. Requires table to function. Disable with __NO_CONDVAR.

  • Timed back-off. Supported on everything. Disable with __NO_SLEEP.

  • Spinlock. Supported on everything. Force with __NO_IDENT. Note: performance is too terrible to use.
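To make the timed back-off strategy concrete, here is a minimal sketch of what such a wait could look like using only <atomic> and <thread>. It is not the code from the repository: the names backoff_wait and backoff_notify_all, the yield count, and the sleep cap are all assumptions chosen for illustration.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper (not the repository's API): block until `a` no longer
// holds `old`, polling with exponentially growing sleeps.
template <class T>
void backoff_wait(const std::atomic<T>& a, T old,
                  std::memory_order order = std::memory_order_seq_cst) {
    // Phase 1: a short burst of yields for waits that resolve quickly.
    // The count of 16 is an arbitrary choice.
    for (int i = 0; i < 16; ++i) {
        if (a.load(order) != old) return;
        std::this_thread::yield();
    }
    // Phase 2: sleep between polls, doubling the interval up to a cap.
    auto delay = std::chrono::microseconds(1);
    const auto cap = std::chrono::microseconds(1024);  // arbitrary cap
    while (a.load(order) == old) {
        std::this_thread::sleep_for(delay);
        if (delay < cap) delay *= 2;
    }
}

// A polling waiter wakes itself up, so notify has nothing to do.
inline void backoff_notify_all(const void*) noexcept {}
```

A waiter would call backoff_wait(flag, 0) while another thread stores a non-zero value to flag. The trade-off is wake-up latency of up to one sleep interval, in exchange for needing no operating system support beyond sleeping.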

The strategy is chosen this way, by platform:

  • Linux: default to futex (with table), fallback to futex (no table) → CVs → timed backoff → spin.

  • Mac: default to CVs (table), fallback to timed backoff → spin.

  • Windows: default to futex (no table), fallback to timed backoff → spin.

  • CUDA: default to timed backoff, fallback to spin. (Not all of this is checked in to this tree.)

  • Unidentified platform: default to spin.
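The fallback chains above can be written down as a small selection function. This is only a sketch of the decision logic as described in this post: the enum and struct names are hypothetical, the Knobs fields stand in for the __NO_* macros, and the real implementation makes this choice with the preprocessor rather than at run time.

```cpp
enum class Platform { Linux, Mac, Windows, Cuda, Unidentified };
enum class Strategy { FutexWithTable, FutexNoTable, CondVar, TimedBackoff, Spin };

// Hypothetical stand-ins for the __NO_TABLE, __NO_FUTEX, __NO_CONDVAR and
// __NO_SLEEP knobs; `true` means the path is disabled.
struct Knobs {
    bool no_table = false;
    bool no_futex = false;
    bool no_condvar = false;
    bool no_sleep = false;
};

constexpr Strategy choose(Platform p, Knobs k) {
    // Futex: supported on Linux and Windows only.
    const bool futex_ok =
        !k.no_futex && (p == Platform::Linux || p == Platform::Windows);
    // Condition variables: Linux and Mac only, and they need the table.
    const bool cv_ok = !k.no_condvar && !k.no_table &&
                       (p == Platform::Linux || p == Platform::Mac);
    if (futex_ok) {
        // Only Linux pairs the futex with the contention table.
        if (p == Platform::Linux && !k.no_table) return Strategy::FutexWithTable;
        return Strategy::FutexNoTable;
    }
    if (cv_ok) return Strategy::CondVar;
    // An unidentified platform currently goes straight to spinning.
    if (p != Platform::Unidentified && !k.no_sleep) return Strategy::TimedBackoff;
    return Strategy::Spin;
}
```

For example, choose(Platform::Linux, Knobs{}) yields FutexWithTable, and disabling the futex on Linux moves the choice to CondVar.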

The unidentified platform support could be better. For instance, we should probably assume that <thread> is implemented and use the sleeping/yielding facilities there. It should not fall all the way back to __NO_IDENT; it should instead fall back to about where CUDA is expected to be.

One of the main discussion points I’d like to drive with this is the design of the contention management table, to go along with the sharded lock table that backs the __atomic_* and __c11_atomic_* built-ins. Ideally this would be handled the same way, meaning that it should live in libatomic.a or your substitute, and be shared with other C++ standard libraries on your platform.
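As a starting point for that discussion, here is a rough sketch of a table that holds condition variables, sharded by the address of the atomic. It is an illustration under assumptions, not the design in the repository: Shard, shard_for, table_wait, table_notify_all, the shard count, and the hash are all hypothetical.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical layout: one mutex/condition-variable pair per shard.
struct Shard {
    std::mutex m;
    std::condition_variable cv;
};

constexpr std::size_t kShards = 64;  // arbitrary power of two

// Map an atomic's address to a shard. Unrelated atomics may share one.
inline Shard& shard_for(const void* addr) {
    static Shard table[kShards];
    const auto bits = reinterpret_cast<std::uintptr_t>(addr);
    // Drop the low bits, which are mostly alignment zeros, before masking.
    return table[(bits >> 4) & (kShards - 1)];
}

template <class T>
void table_wait(const std::atomic<T>& a, T old) {
    Shard& s = shard_for(&a);
    std::unique_lock<std::mutex> lock(s.m);
    // The predicate is checked with the lock held, so a notifier that takes
    // the same lock cannot slip in between the check and the sleep.
    s.cv.wait(lock, [&] { return a.load() != old; });
}

template <class T>
void table_notify_all(const std::atomic<T>& a) {
    Shard& s = shard_for(&a);
    std::lock_guard<std::mutex> lock(s.m);
    // notify_all rather than notify_one: waiters on other atomics may share
    // this shard, and each one rechecks its own predicate after waking.
    s.cv.notify_all();
}
```

The notifier is expected to store the new value before calling table_notify_all. Because the table is keyed only by address, it could in principle live in a shared runtime library and serve several standard libraries at once, which is the sharing question raised above.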

Please discuss!

Sincerely,

Olivier

Seems fine to me.

Dumb question: What (if anything) should be done to support single-threaded platforms? Assert if the value is not immediately available? Optimistically spin in case they are using this on “elsewhere” memory of some sort?

Pedantic: I probably wouldn’t call the Windows implementation futex-based (https://devblogs.microsoft.com/oldnewthing/20170601-00/?p=96265). I wouldn’t call the Linux implementation WaitOnAddress-based either.

> Seems fine to me.

Cool. More feedback welcome.

> Dumb question: What (if anything) should be done to support single-threaded platforms? Assert if the value is not immediately available? Optimistically spin in case they are using this on “elsewhere” memory of some sort?

A single-threaded platform should just spin. Either the program has a structural error or the memory has to be set from “elsewhere” as you say, and only spinning will do.

> Pedantic: I probably wouldn’t call the Windows implementation futex-based (https://devblogs.microsoft.com/oldnewthing/20170601-00/?p=96265). I wouldn’t call the Linux implementation WaitOnAddress-based either.

I don’t entirely agree with this article, but can agree it’s a pedantic debate. :blush:

Just for the record, I gave some feedback to Olivier offline. The main thing we agreed on was that we can implement the contention table inside compiler-rt and provide function declarations in the headers that give libc++ access to the associated functions (e.g. __cxx_atomic_notify_all). We wouldn't add builtins to Clang until we actually have a need for them.

Louis