The code compiles, but I am encountering a link-time error:
error LNK2019: unresolved external symbol __atomic_load referenced in function _QFzmumps_arrow_try_treat_recv_bufPzmumps_arrow_treat_recv_buf
error LNK2019: unresolved external symbol __atomic_compare_exchange referenced in function _QFzmumps_arrow_try_treat_recv_bufPzmumps_arrow_treat_recv_buf
Which runtime library provides __atomic_load and __atomic_compare_exchange on Windows? Isn’t this supposed to be handled automatically by the compiler, as it is with other compilers (like ifort)?
In theory, on Windows LLVM should only emit library or intrinsic calls that msvcrt/ucrt provide, but where there is no equivalent it still emits symbols using gcc’s ABI (implemented in libgcc/libatomic or clang_rt.builtins).
This should be addressed with https://github.com/llvm/llvm-project/pull/110217 which adds an implicit dependency to clang_rt.builtins. The dependency should have already been added implicitly into flang-generated .obj files.
I checked with dumpbin myself and __atomic_load wasn’t part of clang_rt.builtins by default (only __atomic_load_n).
It is either added with COMPILER_RT_EXCLUDE_ATOMIC_BUILTIN=OFF (as @kiranchandramohan suggested), or provided by clang_rt.atomic (enabled with COMPILER_RT_BUILD_STANDALONE_LIBATOMIC=ON and needs to be added manually to the flang command line).
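For readers unfamiliar with the two spellings, here is a minimal C sketch of the difference at the builtin level, assuming a GCC- or Clang-compatible compiler. The type `pair8` and both function names are hypothetical, chosen only for illustration. The `_n` builtin handles scalar values directly, while the generic builtin works through pointers on any object type; when the compiler cannot inline the generic form (typically for larger or oddly sized objects), it is the one that turns into a call to the `__atomic_load` library symbol from the linker error. The 8-byte case below should be inlined on common 64-bit targets, so it links without any extra library.

```c
#include <stdint.h>

/* Scalar case: the _n builtin takes a pointer and returns the value. */
uint64_t load_scalar(uint64_t *p) {
    return __atomic_load_n(p, __ATOMIC_SEQ_CST);
}

/* Hypothetical 8-byte payload, aligned to 8 so that it can be handled
 * with a single 8-byte atomic access (an assumption about the target). */
typedef struct {
    _Alignas(8) float re;
    float im;
} pair8;

/* Generic case: source and destination are both passed by pointer.
 * For sizes the target cannot handle inline, this form is lowered to
 * a library call instead of an instruction. */
pair8 load_pair(pair8 *p) {
    pair8 out;
    __atomic_load(p, &out, __ATOMIC_SEQ_CST);
    return out;
}
```

Switching `pair8` to two doubles (16 bytes) is a quick way to check whether a given toolchain starts emitting the library call.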
I think that there might be something more going on here. CAS is definitely something that MSVC can emit. We should see what MSVC is generating under similar conditions in C. I don’t expect to see a call to __atomic_compare_exchange, as there is at the very least _InterlockedCompareExchange.
I can potentially see __atomic_load being formed, though. Microsoft has added support for C11 atomics, so ideally we should be treating those as _Atomic instead.
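A small C11 test case along these lines could be fed to MSVC, clang and gcc to compare what each generates. This is only a sketch: the function names `try_claim` and `read_slot` are made up for illustration, and it uses a 4-byte `_Atomic int`, for which I would expect an inline compare-and-swap rather than a library call on any of these compilers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Compare-and-swap on a 4-byte atomic: stores `desired` only if the
 * slot currently holds `expected`, and reports whether it succeeded. */
bool try_claim(_Atomic int *slot, int expected, int desired) {
    return atomic_compare_exchange_strong(slot, &expected, desired);
}

/* Plain sequentially consistent atomic load of the same slot. */
int read_slot(_Atomic int *slot) {
    return atomic_load(slot);
}
```

Replacing `int` with a 16-byte or larger struct is the interesting variation, since that is where the compilers are likely to diverge.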
That is, this should always use the LLVM instruction instead of libatomic (which may not exist on Windows) for the standard sizes of 1, 2, 4 and 8 bytes, including on Windows. For 16 bytes (e.g. complex double), I think it depends on hardware support for atomics. For all other sizes on Windows, we might need to inline-emit code that uses a lock, like @efriedma-quic mentioned happens in Microsoft’s implementation of std::atomic. The linked blog entry also mentions “support routines”; maybe we could call those. With some experiments on godbolt one can see that msvc generates calls to functions such as _Atomic_lock_and_store, declared in vcruntime_c11_atomic_support.h (see windows-msvc-sysroot/include/vcruntime_c11_atomic_support.h in trcrsired/windows-msvc-sysroot on GitHub), i.e. Microsoft’s libatomic.
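To see which size class a given target puts each object in, the GCC/Clang builtin `__atomic_always_lock_free` can be queried at compile time. The sketch below is an illustration under assumptions: the struct types and function names are hypothetical, and the builtin is not available in MSVC itself. A true result means the compiler can always use an instruction for that size; a false result means it may fall back to a library call (or would need a lock).

```c
#include <stdbool.h>

typedef struct { float re, im; } cplx8;    /* 8 bytes, like single-precision complex */
typedef struct { double re, im; } cplx16;  /* 16 bytes, like double-precision complex */
typedef struct { double v[3]; } blob24;    /* 24 bytes, a non-standard size */

/* Second argument 0 means "assume typical alignment for this size". */
#define ALWAYS_LOCK_FREE(T) __atomic_always_lock_free(sizeof(T), 0)

bool int_is_inline(void)    { return ALWAYS_LOCK_FREE(int); }
bool cplx8_is_inline(void)  { return ALWAYS_LOCK_FREE(cplx8); }
/* Target dependent: hinges on hardware and compiler flags. */
bool cplx16_is_inline(void) { return ALWAYS_LOCK_FREE(cplx16); }
/* Not a power-of-two size, so no single atomic instruction applies. */
bool blob24_is_inline(void) { return ALWAYS_LOCK_FREE(blob24); }
```

The 16-byte answer is the one worth checking per target, since it decides whether complex double needs the library path at all.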