Given the following examples, I would expect LLVM to be able to remove the redundant atomics since one possible valid code ordering` is that they run with no one else observing the value change
LLVM is really conservative on eliminating atomic loads/stores. You can think of removing any of them allows code to moves across the program point where an atomic used to be, which really shouldn’t happen.
The whole reason this optimization should be valid is that it is valid to move non-atomic code across an atomic so long as that code doesn’t depend on an atomic.
Yes, removing the redundant load and store are valid transforms per the C++ memory model, but LLVM is very sensitive here. We don’t remove them with any memory ordering.
There are cases where you cannot remove a seq_cst operation entirely, but it’s always fine to remove redundant ones if you would otherwise be able to move them next to each other.
To give a little context: This question was spawned by a question on julia discourse, and was originally about store-load forwarding.
The pattern was: We want an init_once that can LICM, I.e. for
tmp = get_lazy(lazy_var);
//something that llvm understands
tmp = get_lazy(lazy_var);
we want the second get_lazy removed.
There must be an initial load_acquire in get_lazy to check whether the thing has been initialized; and if it hasn’t, there must be a store_release to mark that it has been initialized.
We can make it so that the load_acquire and the conditional store_release are inlined (the slow-path initialization code doesn’t need to be inlined / visible to llvm).
The second get_lazy load-acquire and conditional check should be completely optimized away: It knows that it has been initialized, either because the first check already confirmed initialization and there was no intervening store, or because the first call did the initialization and the value can be forwarded from the store-release.
Mechanically, the second load_acquire should be moved upwards, duplicated at the branch, and then be removed at both points (once as a redundant load, and once due to store-load forwarding).
I think this optimization would be invalid if the load was seq_cst – we would not be able to move it across the intervening code. On the other hand, monotonic / relaxed is not strong enough for this construction to be safe.
So the original question was pretty exactly about duplicate load_acquire removal and about store → load_acquire forwarding.