RFC: Add atomic versions of the llvm.memcpy, llvm.memmove and llvm.memset intrinsics

Hi,

LLVM's memory intrinsics are quite useful for performing various optimizations
on frequently used memory operations. Unfortunately, these intrinsics are not applicable to
languages that guarantee atomicity for their memory accesses (Java, for example).

To overcome this limitation, I'm thinking about adding a set of intrinsics
that execute as a series of unordered atomic memory accesses.
To be more specific, here is the definition I have in mind:

  declare void @llvm.memcpy_atomic.p0i8.p0i8.i32(i8* <dest>, i8* <src>, i32 <num_elements>,
                                                 i32 <element_size>, i32 <align>, i1 <isvolatile>)

It closely mimics the original memcpy intrinsic. The only difference is that we now explicitly
specify the element size. Semantically, memcpy_atomic is equivalent to an explicit IR loop
in which each load and store is marked as unordered atomic. This definition should give
the optimizer sufficient freedom while allowing us to transform pre-existing IR loops
into this intrinsic. 'memcpy_atomic' will be lowered into a '__memcpy_atomic' library call
(I'm not really certain about the function name). 'memset_atomic' and 'memmove_atomic'
can both be defined in a similar way.
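For concreteness, here is a rough sketch (hand-written IR with illustrative names and a fixed
4-byte element size) of the kind of loop the intrinsic is meant to be semantically equivalent to;
each element is transferred with a single unordered atomic load and store:

  define void @copy_loop_sketch(i32* %dest, i32* %src, i32 %num_elements) {
  entry:
    %empty = icmp eq i32 %num_elements, 0
    br i1 %empty, label %done, label %loop

  loop:
    ; one unordered atomic load/store pair per element
    %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
    %src.gep = getelementptr inbounds i32, i32* %src, i32 %i
    %dst.gep = getelementptr inbounds i32, i32* %dest, i32 %i
    %val = load atomic i32, i32* %src.gep unordered, align 4
    store atomic i32 %val, i32* %dst.gep unordered, align 4
    %i.next = add nuw i32 %i, 1
    %exit = icmp eq i32 %i.next, %num_elements
    br i1 %exit, label %done, label %loop

  done:
    ret void
  }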

It's tempting to model the atomic behaviour by adding an additional argument to the existing
intrinsics. However, doing so would require teaching every relevant optimization and
every backend how to respect the new argument. That would not only be a considerable
amount of work, it would also be quite error prone.

What do folks think? Does this design make sense? Would it be useful for anyone
else developing for languages with constraints similar to Java's?

— Igor

At the very least, it is an extremely bad name. The operation is not
atomic at all; it is piecewise atomic at most. I don't see how this is any
better than just using an explicit loop, given that any lowering is
highly unlikely to be able to do anything different.

Joerg

> LLVM's memory intrinsics are quite useful for performing various optimizations
> on frequently used memory operations. Unfortunately, these intrinsics are not applicable to
> languages that guarantee atomicity for their memory accesses (Java, for example).
>
> At the very least, it is an extremely bad name. The operation is not
> atomic at all; it is piecewise atomic at most. I don't see how this is any
> better than just using an explicit loop, given that any lowering is
> highly unlikely to be able to do anything different.

The idea is that we want to leverage the LoopIdiomRecognize and MemCpyOptimizer passes. However, we can't do
that today, because all our memory accesses are atomic and a regular memcpy doesn't provide any atomicity guarantees.
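As a sketch of what that would look like: if LoopIdiomRecognize learned to recognize such loops, the unordered
atomic copy loop from the original proposal could be rewritten into a call to the proposed intrinsic. The intrinsic
does not exist in LLVM today; the name, signature, and 4-byte element size below simply follow this RFC:

  %dest.i8 = bitcast i32* %dest to i8*
  %src.i8 = bitcast i32* %src to i8*
  ; args: dest, src, num_elements, element_size = 4, align = 4, isvolatile = false
  call void @llvm.memcpy_atomic.p0i8.p0i8.i32(i8* %dest.i8, i8* %src.i8,
                                              i32 %num_elements, i32 4,
                                              i32 4, i1 false)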

I agree, the name isn’t the best. How about “memcpy_element_atomic” or something along those lines?