[PATCH] Add atomic_inc and atoimic_add builtins

In the Subject: s/atoimic_add/atomic_add/

From: Tom Stellard <thomas.stellard@amd.com>

---
generic/include/clc/atomic/atomic_add.h | 3 +++
generic/include/clc/atomic/atomic_decl.inc | 10 ++++++++++
generic/include/clc/atomic/atomic_inc.h | 1 +
generic/include/clc/clc.h | 4 ++++
generic/lib/SOURCES | 1 +
generic/lib/atomic/atomic_impl.ll | 11 +++++++++++
r600/lib/SOURCES | 1 +
r600/lib/atomic/atomic.cl | 20 ++++++++++++++++++++
8 files changed, 51 insertions(+)
create mode 100644 generic/include/clc/atomic/atomic_add.h
create mode 100644 generic/include/clc/atomic/atomic_decl.inc
create mode 100644 generic/include/clc/atomic/atomic_inc.h
create mode 100644 generic/lib/atomic/atomic_impl.ll
create mode 100644 r600/lib/atomic/atomic.cl

diff --git a/generic/include/clc/atomic/atomic_add.h b/generic/include/clc/atomic/atomic_add.h
new file mode 100644
index 0000000..66d8978
--- /dev/null
+++ b/generic/include/clc/atomic/atomic_add.h
@@ -0,0 +1,3 @@
+#define __CLC_FUNCTION atomic_add
+#include <clc/atomic/atomic_decl.inc>
+#undef __CLC_FUNCTION
diff --git a/generic/include/clc/atomic/atomic_decl.inc b/generic/include/clc/atomic/atomic_decl.inc
new file mode 100644
index 0000000..03d01aa
--- /dev/null
+++ b/generic/include/clc/atomic/atomic_decl.inc
@@ -0,0 +1,10 @@
+
+#define __CLC_DECLARE_ATOMIC(ADDRSPACE, TYPE) \
+ _CLC_OVERLOAD _CLC_DECL TYPE __CLC_FUNCTION (volatile ADDRSPACE TYPE *, TYPE);
+
+#define __CLC_DECLARE_ATOMIC_ADDRSPACE(TYPE) \
+ __CLC_DECLARE_ATOMIC(global, TYPE); \
+ __CLC_DECLARE_ATOMIC(local, TYPE);
+
+__CLC_DECLARE_ATOMIC_ADDRSPACE(int);
+__CLC_DECLARE_ATOMIC_ADDRSPACE(uint);
diff --git a/generic/include/clc/atomic/atomic_inc.h b/generic/include/clc/atomic/atomic_inc.h
new file mode 100644
index 0000000..2137391
--- /dev/null
+++ b/generic/include/clc/atomic/atomic_inc.h
@@ -0,0 +1 @@
+#define atomic_inc(p) atomic_add(p, 1);
diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
index b906245..679d73a 100644
--- a/generic/include/clc/clc.h
+++ b/generic/include/clc/clc.h
@@ -96,4 +96,8 @@
#include <clc/synchronization/cl_mem_fence_flags.h>
#include <clc/synchronization/barrier.h>

+/* 6.11.11 Atomic Functins */
+#include <clc/atomic/atomic_add.h>
+#include <clc/atomic/atomic_inc.h>
+
#pragma OPENCL EXTENSION all : disable
diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index 9ac08bd..2d84e02 100644
--- a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -1,3 +1,4 @@
+atomic/atomic_impl.ll
convert.cl
geometric/cross.cl
geometric/dot.cl
diff --git a/generic/lib/atomic/atomic_impl.ll b/generic/lib/atomic/atomic_impl.ll
new file mode 100644
index 0000000..c11383b
--- /dev/null
+++ b/generic/lib/atomic/atomic_impl.ll
@@ -0,0 +1,11 @@
+define i32 @__clc_atomic_add_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+ %0 = atomicrmw volatile add i32 addrspace(1)* %ptr, i32 %value seq_cst
+ ret i32 %0

I had this nice long response about how the function is supposed to
return the old value, and then I finally looked up the atomicrmw
instruction and realized that the LLVM reference stated that this
instruction does exactly what we need. Would you mind changing the
name of '%0' to '%old' just to prevent future readers from making the
mistake I almost made?

+}
+
+define i32 @__clc_atomic_add_addr3(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+ %0 = atomicrmw volatile add i32 addrspace(3)* %ptr, i32 %value seq_cst
+ ret i32 %0
+}
diff --git a/r600/lib/SOURCES b/r600/lib/SOURCES
index 38919ab..de30f6e 100644
--- a/r600/lib/SOURCES
+++ b/r600/lib/SOURCES
@@ -1,3 +1,4 @@
+atomic/atomic.cl
workitem/get_num_groups.ll
workitem/get_group_id.ll
workitem/get_local_size.ll
diff --git a/r600/lib/atomic/atomic.cl b/r600/lib/atomic/atomic.cl
new file mode 100644
index 0000000..e80180c
--- /dev/null
+++ b/r600/lib/atomic/atomic.cl
@@ -0,0 +1,20 @@
+#include <clc/clc.h>
+
+#define ATOMIC_FUNC_TYPE(SIGN, TYPE, FUNCTION, CL_ADDRSPACE, LLVM_ADDRSPACE) \
+_CLC_OVERLOAD _CLC_DEF SIGN TYPE FUNCTION (volatile CL_ADDRSPACE SIGN TYPE *p, SIGN TYPE val) { \
+ return (SIGN TYPE)__clc_##FUNCTION##_addr##LLVM_ADDRSPACE((volatile CL_ADDRSPACE signed TYPE*)p, (signed TYPE)val); \

The presence of SIGN and signed in these functions makes me nervous...
Is that actually intended? Are we intentionally casting all unsigned
to signed in order to reduce assembly function counts? If it's
intentional, then fine. Just wanted to make sure that you meant to do
this.

I'm assuming that there's also associated LLVM R600 back-end changes
to go with this patch-set, since my CEDAR and PITCAIRN both give me
the following when I actually try to run the new piglit tests:
LLVM ERROR: Cannot select: 0x1b33de0: i32,ch = AtomicLoadAdd
0x1b33ce0, 0x1b32c60, 0x1b33260<Volatile LDST4[%mem]> [ORD=2] [ID=19]

With the subject spelling correction, %0 -> %old change, and
clarification on the SIGN/signed questions:
Reviewed-by: Aaron Watry <awatry@gmail.com>

Followup: With the patches that you sent to mesa-dev yesterday, I have
successfully tested this on CEDAR, and possibly on PITCAIRN. I can't
quite remember if I tested PITCAIRN yesterday, and my ssh port
forwarding seem to be down for the moment so I can't (re)test at this
time.

--Aaron

Followup: With the patches that you sent to mesa-dev yesterday, I have
successfully tested this on CEDAR, and possibly on PITCAIRN. I can't
quite remember if I tested PITCAIRN yesterday, and my ssh port
forwarding seem to be down for the moment so I can't (re)test at this
time.

PITCAIRN probably won't work without this patch:
http://lists.freedesktop.org/archives/mesa-dev/2013-August/043686.html
Did you apply this patch when you tested?

-Tom

In that case, I probably haven't managed to test SI yet... :slight_smile:

I did the testing yesterday, which is before that patch hit the list.
So if SI won't work without it, then I probably only tested evergreen
(it's been a rough few weeks). I won't be in a situation where I can
test my SI card before Tuesday, so from my perspective feel free to
commit this libclc change based on the successful testing on
evergreen. I'm also assuming that you have an SI available to test
this on yourself.

--Aaron

Alright, I finally got a chance to test with that additional LDS size
calculation patch.

With current mesa, current llvm, current clang, and the following
patches applied, the piglit tests for these builtins now work
correctly:

Mesa Patch:
LDS Size calculation patch

LLVM:
R600/SI: Don't emit S_WQM_B64 instruction for compute shaders
R600: Fix incorrect LDS size calculation
R600: Expand SELECT nodes rather than custom lowering them
R600: Add support for local memory atomic add

So, I've now managed to test on both Cedar and Pitcairn without any issues.

Reviewed-By: Aaron Watry <awatry@gmail.com>

--Aaron