RFC: atomic operations on SI+

Hi Tom, Matt,

I'm working on a project that needs a few coherent atomic operations
(HSA mode: load, store, compare-and-swap) for std::atomic_uint in HCC.
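
For context, here is a minimal host-side sketch of the three operations in question, written against standard C++ std::atomic_uint. It is only an illustration of what the source-level code asks for; the helper names (try_increment, demo_atomic_ops) are hypothetical, and the sequentially consistent ordering is an assumption, not something the patch dictates.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: try to move the counter from `expected` to
// `expected + 1`. On failure, `expected` is updated to the value observed.
inline bool try_increment(std::atomic_uint &counter, unsigned &expected) {
    return counter.compare_exchange_strong(expected, expected + 1,
                                           std::memory_order_seq_cst);
}

// Hypothetical demo exercising store, load and compare-and-swap in turn.
inline unsigned demo_atomic_ops() {
    std::atomic_uint counter{0};
    counter.store(5, std::memory_order_seq_cst);              // atomic store
    unsigned seen = counter.load(std::memory_order_seq_cst);  // atomic load
    bool ok = try_increment(counter, seen);                   // compare-and-swap
    assert(ok);  // single-threaded here, so the exchange cannot fail
    return counter.load(std::memory_order_seq_cst);
}
```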

The attached patch implements atomic compare-and-swap for SI+
(untested). I tried to stay within what was available, but there are
a few issues that I was unsure how to address:

1.) It currently uses v2i32 for both input and output. This needlessly
clobbers a register, but I haven't found a way to say 'output = the
first subreg of input' or 'output = input' in pattern constraints.

2.) I think these will need the SLC bit set to work correctly on HSA
targets. I'm not sure what the best way to do this is. I considered:
* adding it as an operand to the newly introduced node
(AMDGPUISD::CMP_SWAP), and setting it to 0/1 in the lowering pass. Can
this be done without changing the order of the inputs? slc is always
the last operand, but the operand count is variable
* introducing HSA variants of the instructions with the SLC bit set
* setting the bit in DAG combine (is there a way to know if I'm
combining an atomic op, or do I need to split the SelectMUBUF functions?)

3.) Depending on 2, atomic load/store can be either (preferably) MUBUF
load/store with glc and slc set, or hacked to be either swp_noret
(store) or cmp_swap 0,0 (load).
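
To make the fallback in 3 concrete, here is a rough C++-level sketch of the same trick (not the ISA encoding): a load expressed as compare-and-swap with compare = 0 and swap = 0, and a store expressed as a swap whose result is thrown away. The function names are hypothetical, and std::atomic stands in for the hardware instructions purely for illustration.

```cpp
#include <atomic>

// Load emulated as compare-and-swap with compare = 0 and swap = 0.
// If the value is 0 it is rewritten as 0 (no visible change); otherwise
// the exchange fails. Either way `expected` ends up holding the value
// that was observed in memory.
inline unsigned load_via_cmp_swap(std::atomic_uint &a) {
    unsigned expected = 0;
    a.compare_exchange_strong(expected, 0u);
    return expected;
}

// Store emulated as an atomic swap with the returned old value discarded,
// which is the role swp_noret would play.
inline void store_via_swap(std::atomic_uint &a, unsigned value) {
    (void)a.exchange(value);
}
```

The load variant is still a read-modify-write on the memory location, which is one reason the plain MUBUF load/store with glc and slc set is the preferred option.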

thanks,

0001-AMDGPU-SI-Implement-atomic-compare-and-swap.patch (12 KB)

Have you seen this patch: http://reviews.llvm.org/D17280

-Tom

Thanks, I subscribed to that one. It answers the first question (it
can't be done), but it does not address the SLC problem. Anyway, I
moved my questions there.

thanks,
Jan