Atomic floating point operations and libstdc++

Hello,

I’m working on implementing builtins in GCC to directly specify the
behaviour of the C++20 std::atomic<float>::fetch_add (and similar).
(The point being that these operate on floating point types.)

Summary of what I’m looking for:

  1. I would like to find agreement on what resolved __atomic_fetch_add
    function names to use when implementing the operations with libcalls.
  2. I’m hoping to find agreement on how libstdc++ should check these new
    resolved versions of __atomic_fetch_add (and similar) are available.

The goal here is that some GPUs can perform these operations directly,
and the current set of GNU atomic builtins cannot specify the operation
directly. This means that compilers using the current libstdc++ headers
while targeting these GPUs (e.g. with something like the -stdpar
option) cannot produce optimal code.
With these builtins defined, libstdc++ can implement the
std::atomic<float>::fetch_add (and similar) functions using
__atomic_fetch_add in the same way as is done for other types, and hence
compilers can easily lower this to optimal code.
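
For illustration, here is a minimal sketch of the kind of compare-exchange
loop a header has to fall back on when the builtin cannot express a
floating point fetch_add directly. The helper names fetch_add_cas and
fetch_add_cas_example are hypothetical, and this is not the actual
libstdc++ source, only the general shape of such a fallback.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: emulate fetch_add on an atomic float with a
// compare-exchange loop. A compiler sees a load plus a retry loop here,
// not a single atomic-add operation it could map onto one instruction.
inline float fetch_add_cas(std::atomic<float>& obj, float arg,
                           std::memory_order order = std::memory_order_seq_cst)
{
    float old = obj.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak reloads `old`, so the sum is
    // recomputed from the latest observed value on every retry.
    while (!obj.compare_exchange_weak(old, old + arg, order,
                                      std::memory_order_relaxed)) {
    }
    return old;  // fetch_add returns the value held before the addition
}

// Hypothetical usage example.
inline void fetch_add_cas_example()
{
    std::atomic<float> x{1.0f};
    float previous = fetch_add_cas(x, 2.5f);
    assert(previous == 1.0f);
    assert(x.load() == 3.5f);
}
```

With a builtin that accepts floating point operands, the whole loop could
in principle collapse into one __atomic_fetch_add call that the backend
is free to lower however the target allows.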

As part of that work we need to define an ABI to call libatomic or any
other library implementation of the operation (following the example of
all other GNU atomic builtins), and we also need to decide how libstdc++
can check for the availability of these builtins. Since both of these
choices are visible outside the GNU project I would like to double-check
with the LLVM community if the current proposals work for you too.

I’m proposing names following the existing convention in GCC of defining
resolved versions of the builtins to match a libatomic ABI entry point.
The existing examples have the format: __atomic_add_fetch_<suffix>.

Currently the suffixes used are for different sizes of datatypes
{1,2,4,8,16}. This cannot work for floating point types since
different types can have the same size (e.g. bf16 and f16). Hence I’m
proposing the following suffixes:

    Type               GCC internal name    Suffix
    float               N/A                 f
    double              N/A                 <no suffix>
    long double         N/A                 l
    std::bfloat16_t     __bf16              f16b
    std::float16_t      _Float16            f16
    std::float32_t      _Float32            f32
    std::float64_t      _Float64            f64
    std::float128_t     _Float128           f128
    N/A                 _Float32x           f32x
    N/A                 _Float64x           f64x
    N/A                 _Float128x          N/A  <not implementing>

These suffixes follow the existing convention in GCC builtins taking
floating point types like __builtin_acosh. From scanning through the
source code it seems that this convention is matched in LLVM, though
LLVM does not implement all floating point types for such functions
(with double, float, long double, and float128 implemented).

Again from scanning through the source code it seems that LLVM also uses
the existing {1,2,4,8,16} suffixes for library calls to implement atomic
operations (as one would expect since this is a platform ABI).

As far as I can see clang doesn’t seem to accept these as builtins
specified in the source – is that correct?

While libstdc++ will only use the overloaded version of each operation
(i.e. __atomic_fetch_add), it needs some way to determine whether the
current compiler can use that overloaded builtin on the relevant
floating point type.
So far it seems that the cleanest way for libstdc++ to determine this is
to check against the existence of the resolved builtins with the above
suffixes.
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663377.html
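
A rough sketch of what such a check could look like in a header is given
here. It assumes the resolved name for float would be
__atomic_fetch_add_fpf (the "fp" prefix from this proposal plus the "f"
suffix from the table above); that spelling is only a proposal, and the
macro HAVE_ATOMIC_FETCH_ADD_FLOAT and the function fetch_add_float are
hypothetical names made up for the example.

```cpp
#include <cassert>

#ifndef __has_builtin
#  define __has_builtin(x) 0  // compilers that predate __has_builtin
#endif

// ASSUMPTION: __atomic_fetch_add_fpf is the proposed resolved builtin for
// float. No released compiler is known to provide it, so the fallback
// branch below is the one that is normally compiled.
#if __has_builtin(__atomic_fetch_add_fpf)
#  define HAVE_ATOMIC_FETCH_ADD_FLOAT 1
#else
#  define HAVE_ATOMIC_FETCH_ADD_FLOAT 0
#endif

// Hypothetical helper: atomically add `arg` to *ptr, return the old value.
inline float fetch_add_float(float* ptr, float arg)
{
    assert(ptr != nullptr);
#if HAVE_ATOMIC_FETCH_ADD_FLOAT
    // The overloaded builtin is what the library would actually call.
    return __atomic_fetch_add(ptr, arg, __ATOMIC_SEQ_CST);
#else
    // Fallback: compare-exchange loop using the type-generic builtins.
    float old;
    __atomic_load(ptr, &old, __ATOMIC_RELAXED);
    float desired;
    do {
        desired = old + arg;  // recomputed after each failed exchange
    } while (!__atomic_compare_exchange(ptr, &old, &desired, /*weak=*/true,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
    return old;
#endif
}
```

Note that the resolved name is only used as a feature test; the call
itself still goes through the overloaded __atomic_fetch_add.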

That implies that compilers will need to agree on these names for the
resolved versions of the atomic builtins both at the ABI level for
libatomic, and at the API level for libstdc++.

Does anything in the names proposed above sound like it might be
problematic?

Do the LLVM community dislike the proposed exposure of the resolved
floating point atomic builtins to the user (i.e. requiring
__has_builtin(__atomic_fetch_add_fp<suffix>) to work)?

IIUC LLVM does not currently expose any other resolved atomic builtins
to the user.
While such exposure would not be necessary for a compiler to remain
functional, the current suggestion would require it of any compiler that
wants to use the libstdc++ codepath implementing
std::atomic<float>::fetch_add with __atomic_fetch_add.
If some other advertisement mechanism would be preferable to the LLVM
project please do mention it.

Regards,
Matthew

I think someone filed a bug report for these at some point, but these variations aren’t documented anywhere, and nobody is using them as far as I know. If you do want to use them for some reason, please look into the documentation. Or consider just going a different route.

The libatomic names seem fine.

clang has supported __atomic_fetch_add on FP types for a while now (D71726, “Let clang atomic builtins fetch add/sub support floating point types”). If you need to detect support for this, you can use SFINAE.
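
A minimal sketch of that kind of SFINAE detection, assuming the generic builtin can be named inside decltype. The trait name has_generic_fetch_add and the variable float_fetch_add_supported are made up for illustration. The float query is guarded so that it is only instantiated by clang, since a compiler that does not treat the ill-formed builtin call as a substitution failure would reject it outright.

```cpp
#include <type_traits>
#include <utility>

// Primary template: assume the generic builtin is not usable on T.
template <typename T, typename = void>
struct has_generic_fetch_add : std::false_type {};

// Selected only when the call expression is well-formed for T.
template <typename T>
struct has_generic_fetch_add<
    T, decltype(void(__atomic_fetch_add(std::declval<T*>(),
                                        std::declval<T>(),
                                        __ATOMIC_SEQ_CST)))>
    : std::true_type {};

// Integer operands are accepted by the builtin in both GCC and clang.
static_assert(has_generic_fetch_add<int>::value,
              "__atomic_fetch_add should accept int");

#if defined(__clang__)
// Only evaluated for clang; the result depends on the clang version.
constexpr bool float_fetch_add_supported =
    has_generic_fetch_add<float>::value;
#endif
```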

Wow thanks! I did not know that clang supported SFINAE on builtins!

Unfortunately GCC doesn’t support doing that, so using it for libstdc++ seems a little problematic. (Though I will check to see if making GCC handle SFINAE on this is relatively simple).

I’ll also have a closer look into the state of the suffixed versions in clang – I did try it out on compiler explorer and things didn’t seem to work, but I’ll look for this bug report you mention and see if there’s something I missed.

> Do the LLVM community dislike the proposed exposure of the resolved
> floating point atomic builtins to the user (i.e. requiring
> __has_builtin(__atomic_fetch_add_fp<suffix>) to work)?

I don’t like that. The atomic builtins are documented/implemented as type-generic builtins.

The size-suffixed symbols which the generic builtins might (or might not) emit a reference to are an (ABI) implementation detail, which code shouldn’t be using directly. And currently in Clang, they are indeed not exposed as builtins.