I’ve uploaded a few changes to review for implementing Step “A1” in the atomic cleanup plan:
Add __atomic_* lowering to AtomicExpandPass.
Switch over targets to use AtomicExpandPass, and clean up target atomics code.
There’s also this semi-related cleanup:
Clang: Set MaxAtomicInlineWidth properly for i386, i486, and x86-64 CPUs without cmpxchg16b. Also, remove all the manual definitions of …
There are some rather annoying issues in LLVM that I’ve run into so far:
- The required duplication of logic between LLVM and Clang about which platforms have what-sized atomics is unfortunate. This falls into the general problem of clang’s lib/Basic/Targets.cpp being mostly an awful repetition of information already available in the LLVM backends.
We really ought to have Clang start depending on the LLVM backends.
- The inability to access C ABI information in LLVM while creating the libcalls is somewhat of a problem.
The atomic libcalls need the following information:
- What “size_t” and “int” mean. (DL.getIntPtrType() seems okay-ish for the former, since the malloc lowering already makes the same assumption, and getInt32() could serve for the latter…except that MSP430 uses a 16-bit “int”.) See the sketch after this list.
- What sizes of integers actually exist. You need to know which functions of the form “T __atomic_load_N(T*, int)”, for some integral type T, can actually exist in the atomics library. (E.g., if the platform has no __int128, there won’t be an “__int128 __atomic_load_16(__int128*, int)” you can actually call.)
All of that information is available in Clang, but not in LLVM. I think it would be a good idea for LLVM to have such low-level C ABI details.
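To make this concrete, here’s a minimal sketch (not the actual patch; it assumes a recent LLVM with opaque pointers, and the helper name is hypothetical) of declaring the generic “__atomic_load” libcall from inside LLVM. Both integer-type choices are exactly the C ABI guesses described above:

```cpp
// Hypothetical helper, for illustration only.
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Module.h"

using namespace llvm;

static FunctionCallee getAtomicLoadLibcall(Module &M) {
  LLVMContext &Ctx = M.getContext();
  const DataLayout &DL = M.getDataLayout();

  // Guess #1: "size_t" has the same width as a pointer.
  // (The malloc lowering already makes this assumption.)
  Type *SizeTTy = DL.getIntPtrType(Ctx);

  // Guess #2: "int" is 32 bits. Wrong on MSP430, where int is 16 bits.
  Type *IntTy = Type::getInt32Ty(Ctx);

  Type *PtrTy = PointerType::getUnqual(Ctx);

  // Guess #3 (not shown): which sized variants, __atomic_load_1 through
  // __atomic_load_16, can exist at all depends on which integer types
  // the target's C implementation actually provides.

  // void __atomic_load(size_t size, void *ptr, void *ret, int ordering)
  FunctionType *FTy = FunctionType::get(
      Type::getVoidTy(Ctx), {SizeTTy, PtrTy, PtrTy, IntTy},
      /*isVarArg=*/false);
  return M.getOrInsertFunction("__atomic_load", FTy);
}
```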
I also ran into a couple of other issues whilst on this quest:
- The C11 _Atomic ABI is incompatible between GCC and Clang. As is libc++'s std::atomic. (But not libstdc++'s: that uses the same ABI under both compilers.)
Also, in Clang, _Atomic and libc++'s std::atomic use unfortunate, platform-dependent choices of “maximum size to promote”, which don’t allow for easy future extension of the set of supported atomic sizes.
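One hedged way to observe the layout divergence yourself (the exact numbers are configuration-dependent, not claims about any particular platform): compile the snippet below with different compiler/standard-library pairs, e.g. clang++ -stdlib=libc++ vs. g++ with libstdc++, and compare the output. The “maximum size to promote” behavior shows up as an oddly-sized payload getting padded up in one configuration but not another:

```cpp
// Illustrative only: print the layout a given toolchain/stdlib picks
// for a small, oddly-sized atomic. Comparing the output across
// compilers and standard libraries is what exposes the ABI mismatch
// described above.
#include <atomic>
#include <cstdio>

struct ThreeBytes { char c[3]; };

int main() {
  std::atomic<ThreeBytes> a{};
  std::printf("sizeof=%zu alignof=%zu lock_free=%d\n",
              sizeof(a), alignof(std::atomic<ThreeBytes>),
              static_cast<int>(a.is_lock_free()));
}
```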
- LLVM may not have a strong enough memory model to support GCC’s __sync_* builtins properly (this may only affect ARMv8/AArch64 at the moment?). GCC has recently decided that “__sync_fetch_and_add(&val, 4);” needs stronger memory barriers than “__atomic_fetch_add(&val, 4, __ATOMIC_SEQ_CST);”. In particular, the former should not allow the processor to reorder a subsequent store before the store-release instruction at the end of the LL/SC sequence generated for the fetch-and-add.
The C11 (and LLVM) definition of a seq_cst memory operation only makes it ordered with other seq_cst operations, NOT with non-seq_cst atomics. But apparently GCC feels that the definition of its __sync_fetch_and_add builtin requires that it act like a seq_cst FENCE, preventing non-seq_cst memory operations from moving across it. And thus, it now emits an explicit barrier instruction after the store-release.
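For concreteness, here are the two builtins side by side. My understanding is that on AArch64, recent GCC now emits an extra trailing “dmb ish” after the LL/SC loop for the __sync_ form, whereas the __atomic_ form ends at the store-release:

```cpp
long legacy_sync(long *val) {
  // GCC now treats the __sync_ builtin as a full barrier: later memory
  // operations must not be reordered above the end of the RMW sequence.
  return __sync_fetch_and_add(val, 4);
}

long c11_atomic(long *val) {
  // The C11-style builtin is only seq_cst: it is ordered against other
  // seq_cst operations, not against relaxed or non-atomic accesses, so
  // no trailing fence is needed.
  return __atomic_fetch_add(val, 4, __ATOMIC_SEQ_CST);
}
```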
Should we do the same as GCC, and add a sixth kind of ordering, “seq_cst_fence”, to model this?