[PATCH 1/2] atomic: Add generic bitcode implementation of atomic_max

Not used yet...

Signed-off-by: Aaron Watry <awatry@gmail.com>

Passes the piglit tests on radeonsi (pitcairn) for local address space, but
fails for evergreen due to instruction selection errors in LLVM.

Signed-off-by: Aaron Watry <awatry@gmail.com>

Not used yet...

Signed-off-by: Aaron Watry <awatry@gmail.com>
---
generic/include/clc/atomic/atomic_max.h | 3 +++
generic/include/clc/clc.h | 1 +
generic/lib/atomic/atomic_impl.ll | 12 ++++++++++++
3 files changed, 16 insertions(+)
create mode 100644 generic/include/clc/atomic/atomic_max.h

diff --git a/generic/include/clc/atomic/atomic_max.h b/generic/include/clc/atomic/atomic_max.h
new file mode 100644
index 0000000..30dc180
--- /dev/null
+++ b/generic/include/clc/atomic/atomic_max.h
@@ -0,0 +1,3 @@
+#define __CLC_FUNCTION atomic_max
+#include <clc/atomic/atomic_decl.inc>
+#undef __CLC_FUNCTION
diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
index b8c1cb9..b492c54 100644
--- a/generic/include/clc/clc.h
+++ b/generic/include/clc/clc.h
@@ -143,6 +143,7 @@
#include <clc/atomic/atomic_add.h>
#include <clc/atomic/atomic_dec.h>
#include <clc/atomic/atomic_inc.h>
+#include <clc/atomic/atomic_max.h>
#include <clc/atomic/atomic_sub.h>

/* cl_khr_global_int32_base_atomics Extension Functions */
diff --git a/generic/lib/atomic/atomic_impl.ll b/generic/lib/atomic/atomic_impl.ll
index 9df5b9f..4e228e8 100644
--- a/generic/lib/atomic/atomic_impl.ll
+++ b/generic/lib/atomic/atomic_impl.ll
@@ -10,6 +10,18 @@ entry:
  ret i32 %0
}

+define i32 @__clc_atomic_max_addr1(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline {
+entry:
+ %0 = atomicrmw volatile max i32 addrspace(1)* %ptr, i32 %value seq_cst
+ ret i32 %0
+}

Why is it necessary to mark these with volatile (though I never remember whether it’s the atom_* or atomic_* functions which is supposed to be volatile)

From CL v1.2, Section 6.12.11:

int atomic_max (volatile __global int *p, int val)
unsigned int atomic_max (volatile __global unsigned int *p, unsigned int val)
int atomic_max (volatile __local int *p, int val)
unsigned int atomic_max (volatile __local unsigned int *p, unsigned int val)

I'm assuming that the pointer declaration as volatile should translate
directly to the statement in the bitcode. Let me know if you think I
mis-understood that. If that's the case, then we also need to fix the
atomic_add and atomic_sub functions.

And yes, there's a note that the atom_* functions from CL 1.0 are
still supported. Those functions do NOT define the pointers as
volatile.

--Aaron

--Aaron

From CL v1.2, Section 6.12.11:
int atomic_max (volatile __global int *p, int val)
unsigned int atomic_max (volatile __global unsigned int *p, unsigned int val)
int atomic_max (volatile __local int *p, int val)
unsigned int atomic_max (volatile __local unsigned int *p, unsigned int val)

I'm assuming that the pointer declaration as volatile should translate
directly to the statement in the bitcode. Let me know if you think I
mis-understood that. If that's the case, then we also need to fix the
atomic_add and atomic_sub functions.

Yes, that’s how I would interpret it.

And yes, there's a note that the atom_* functions from CL 1.0 are
still supported. Those functions do NOT define the pointers as
volatile.

Yes, the sets of atomic functions are separate. It looks like currently atom_add is defined to be the same as atomic_add

From CL v1.2, Section 6.12.11:
int atomic_max (volatile __global int *p, int val)
unsigned int atomic_max (volatile __global unsigned int *p, unsigned int val)
int atomic_max (volatile __local int *p, int val)
unsigned int atomic_max (volatile __local unsigned int *p, unsigned int val)

I'm assuming that the pointer declaration as volatile should translate
directly to the statement in the bitcode. Let me know if you think I
mis-understood that. If that's the case, then we also need to fix the
atomic_add and atomic_sub functions.

Yes, that’s how I would interpret it.

And yes, there's a note that the atom_* functions from CL 1.0 are
still supported. Those functions do NOT define the pointers as
volatile.

Yes, the sets of atomic functions are separate. It looks like currently atom_add is defined to be the same as atomic_add

I'd tend to agree with that statement.

Do you want any changes to this patch, or are we good?

I'm currently trying to get atomic_max done because cppamp-driver-ng
requires it, but eventually I'll probably get around to adding the
other missing atomic_* functions and then the wrappers from CL1.0
atom_* functions.

--Aaron

From CL v1.2, Section 6.12.11:
int atomic_max (volatile __global int *p, int val)
unsigned int atomic_max (volatile __global unsigned int *p, unsigned int val)
int atomic_max (volatile __local int *p, int val)
unsigned int atomic_max (volatile __local unsigned int *p, unsigned int val)

I’m assuming that the pointer declaration as volatile should translate
directly to the statement in the bitcode. Let me know if you think I
mis-understood that. If that’s the case, then we also need to fix the
atomic_add and atomic_sub functions.

Yes, that’s how I would interpret it.

And yes, there’s a note that the atom_* functions from CL 1.0 are
still supported. Those functions do NOT define the pointers as
volatile.

Yes, the sets of atomic functions are separate. It looks like currently atom_add is defined to be the same as atomic_add

I’d tend to agree with that statement.

Do you want any changes to this patch, or are we good?

It’s fine, although it would make sense to add both the atomic_ and atom_ versions at the same time

From CL v1.2, Section 6.12.11:
int atomic_max (volatile __global int *p, int val)
unsigned int atomic_max (volatile __global unsigned int *p, unsigned int
val)
int atomic_max (volatile __local int *p, int val)
unsigned int atomic_max (volatile __local unsigned int *p, unsigned int val)

I'm assuming that the pointer declaration as volatile should translate
directly to the statement in the bitcode. Let me know if you think I
mis-understood that. If that's the case, then we also need to fix the
atomic_add and atomic_sub functions.

Yes, that’s how I would interpret it.

And yes, there's a note that the atom_* functions from CL 1.0 are
still supported. Those functions do NOT define the pointers as
volatile.

Yes, the sets of atomic functions are separate. It looks like currently
atom_add is defined to be the same as atomic_add

I'd tend to agree with that statement.

Do you want any changes to this patch, or are we good?

It’s fine, although it would make sense to add both the atomic_ and atom_
versions at the same time

I'll do that... Also, I'll be sending a v2 because while I was
attempting to get this working on evergreen, I noticed that this
treated all atomic_max as signed comparisons... And the existing
piglit tests don't test that correctly... *facepalm*

I'll be updating the piglit tests, sending out a v2 of this, and I
just sent a patch to the llvm-commits list to add the
LDS_MAX_[U]INT[_RET] instructions for evergreen so that this will work
on more than just SI. I've made sure to CC you and Tom on that one.

If someone has the ability to test/fix atomic_max for cayman (if
fixing is needed), that would probably also be a good thing.

--Aaron