[PATCH] atomics: redefine atom_inc/atom_dec using atom_add/atom_sub

This is exactly what the atomic_[inc|dec] functions do, and it fixes
kernel compilation failures using the OpenCL CTS.

Signed-off-by: Aaron Watry <awatry@gmail.com>

This is exactly what the atomic_[inc|dec] functions do, and it fixes
kernel compilation failures using the OpenCL CTS.

what are the compilation failures? I thought piglit had these ops
covered.

Signed-off-by: Aaron Watry <awatry@gmail.com>
---
generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h | 3 +--
generic/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h | 3 +--
generic/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h | 3 +--
generic/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h | 3 +--
generic/lib/SOURCES | 4 ----
generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl | 9 ---------
generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl | 9 ---------
generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl | 9 ---------
generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl | 9 ---------
9 files changed, 4 insertions(+), 48 deletions(-)
delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl
delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl
delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl
delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl

diff --git a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
index bbc872c..a520fe4 100644
--- a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
+++ b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
@@ -1,2 +1 @@
-_CLC_OVERLOAD _CLC_DECL int atom_dec(global int *p);
-_CLC_OVERLOAD _CLC_DECL unsigned int atom_dec(global unsigned int *p);
+#define atom_dec(p) atom_sub(p, 1)

Does this not cause redefinition warning (since local version uses the
same define?

Jan

Piglit has atomic_* covered, but not the CL 1.0 atom_* functions.

The issue I'm running into with the CTS is that atomic_* works fine,
but I'm getting an ambiguous function reference for
atom_inc/atom_dec/atom_or/atom_and.

Example:

#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
__kernel void test_atomic_fn(volatile __global uint *destMemory,
__global uint *oldValues)
{
    int tid = get_global_id(0);
    oldValues[tid] = atom_dec( &destMemory[0] );
}

Log:
Build not successful for device "AMD PITCAIRN (DRM 2.49.0 /
4.11.0-041100rc8-generic, LLVM 5.0.0)", status: CL_BUILD_ERROR
Build log for device "AMD PITCAIRN (DRM 2.49.0 /
4.11.0-041100rc8-generic, LLVM 5.0.0)" is:

> > This is exactly what the atomic_[inc|dec] functions do, and it fixes
> > kernel compilation failures using the OpenCL CTS.
>
> what are the compilation failures? I thought piglit had these ops
> covered.

Piglit has atomic_* covered, but not the CL 1.0 atom_* functions.

ah right. guess we should add those as well (another time...).

The issue I'm running into with the CTS is that atomic_* works fine,
but I'm getting an ambiguous function reference for
atom_inc/atom_dec/atom_or/atom_and.

Example:

#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
__kernel void test_atomic_fn(volatile __global uint *destMemory,
__global uint *oldValues)
{
    int tid = get_global_id(0);
    oldValues[tid] = atom_dec( &destMemory[0] );
}

Log:
Build not successful for device "AMD PITCAIRN (DRM 2.49.0 /
4.11.0-041100rc8-generic, LLVM 5.0.0)", status: CL_BUILD_ERROR
Build log for device "AMD PITCAIRN (DRM 2.49.0 /
4.11.0-041100rc8-generic, LLVM 5.0.0)" is:
------------
input.cl:6:22: error: call to 'atom_dec' is ambiguous
/usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h:1:29:
note: candidate function
/usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h:2:38:
note: candidate function
------------

this looks like clang treats both:
"volatile __global uint* -> __global unsigned int*"
and
"volatile __global uint* -> __globale int *"
as equal cost.

Can you confirm that removing 'volatile' from destMemory hides the
issue?

>
> >
> > Signed-off-by: Aaron Watry <awatry@gmail.com>
> > ---
> > generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h | 3 +--
> > generic/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h | 3 +--
> > generic/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h | 3 +--
> > generic/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h | 3 +--
> > generic/lib/SOURCES | 4 ----
> > generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl | 9 ---------
> > generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl | 9 ---------
> > generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl | 9 ---------
> > generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl | 9 ---------
> > 9 files changed, 4 insertions(+), 48 deletions(-)
> > delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl
> > delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl
> > delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl
> > delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl
> >
> > diff --git a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > index bbc872c..a520fe4 100644
> > --- a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > +++ b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > @@ -1,2 +1 @@
> > -_CLC_OVERLOAD _CLC_DECL int atom_dec(global int *p);
> > -_CLC_OVERLOAD _CLC_DECL unsigned int atom_dec(global unsigned int *p);
> > +#define atom_dec(p) atom_sub(p, 1)
>
> Does this not cause redefinition warning (since local version uses the
> same define?

Yes, it looks like it does. When building libclc and running the CTS
tests, everything looks ok, but if I compile the atom_dec kernel
manually via clang, I get:
--------------------------------
atom_dec_uint.cl:5:32: warning: passing 'volatile __global uint *'
(aka 'volatile __global unsigned int *') to parameter of type
'__global int *' discards qualifiers
      [-Wincompatible-pointer-types-discards-qualifiers]
    oldValues[tid] = atom_dec( &destMemory[0] );
                               ^~~~~~~~~~~~~~
/usr/local/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h:9:30:
note: expanded from macro 'atom_dec'
#define atom_dec(p) atom_sub(p, 1)
                             ^
/usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_sub.h:10:50:
note: passing argument to parameter 'p' here
_CLC_OVERLOAD _CLC_DECL int atom_sub(global int *p, int val);
                                                 ^
1 warning generated.
--------------------------------

So yeah, I guess this is the wrong approach.

I did just discover while poking around at the kernel that if I remove
the "volatile" keyword from destMemory, then the kernel builds fine.
The difference in the atomic_inc and the CL 1.0 atom_inc function
declarations is basically that volatile keyword. If I re-add the
volatile keyword to the kernel, and then also add it to the function
declaration in atom_dec.h, the kernel also builds. Looking at POCL
and Beignet they both just do a single blanket define in a top-level
header file along the lines of:
#define atom_dec atomic_dec

That implies that both of those runtimes declare the global/local
pointers as volatile even for the CL1.0 variants, which doesn't
necessarily match the spec (although given how the CL 1.1/1.2 spec are
written, they basically just say "the CL 1.0 atom_* functions are
still supported), but it is what the CTS is testing. I don't
currently have access to the definitions that nv/amd use in their
closed-source binaries (and I haven't managed to find what ROCm uses,
but I'm thinking that the implication is that when Khronos
renamed/aliased atom_* to atomic_* in CL 1.1, they were implicitly
retroactively adding the 'volatile' keyword to the function argument.

I guess we could just remove the existing defines under
cl_khr_[global|local]_int32_base_atomics and put a single set of
defines in either clc.h, or in a new header under something like
generic/include/clc/atomic/atom_functions.h that just defines the
renames of the existing functions. Alternatively, we could just go
back and add the volatile keyword to the pointer arguments in the
atom_* headers/implementations.

Thoughts?

I checked that we also use similar define for atomic_inc/dec.
9040bf38 addressed similar issue when calling atom_add.
I think adding volatile to atom_* functions would just hide the issue
(since test suites use volatile pointers).
see [0]:
"A conversion from a pointer of type ``T*`` to a pointer of type ``U*``, where
``T`` and ``U`` are incompatible, is allowed, but is ranked below all other
types of conversions. Please note: ``U`` lacking qualifiers that are present
on ``T`` is sufficient for ``T`` and ``U`` to be incompatible."

I think a proper fix needs to be done on clang side.

Jan

[0] ⚙ D24113 Allow implicit conversions between incompatible pointer types in overload resolution in C.

> > This is exactly what the atomic_[inc|dec] functions do, and it fixes
> > kernel compilation failures using the OpenCL CTS.
>
> what are the compilation failures? I thought piglit had these ops
> covered.

Piglit has atomic_* covered, but not the CL 1.0 atom_* functions.

ah right. guess we should add those as well (another time...).

Yeah. In theory, we can add those to piglit. Since the CTS was
open-sourced, I'm mostly using its failures as a worklist for now just
to prevent duplication of work.

The issue I'm running into with the CTS is that atomic_* works fine,
but I'm getting an ambiguous function reference for
atom_inc/atom_dec/atom_or/atom_and.

Example:

#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
__kernel void test_atomic_fn(volatile __global uint *destMemory,
__global uint *oldValues)
{
    int tid = get_global_id(0);
    oldValues[tid] = atom_dec( &destMemory[0] );
}

Log:
Build not successful for device "AMD PITCAIRN (DRM 2.49.0 /
4.11.0-041100rc8-generic, LLVM 5.0.0)", status: CL_BUILD_ERROR
Build log for device "AMD PITCAIRN (DRM 2.49.0 /
4.11.0-041100rc8-generic, LLVM 5.0.0)" is:
------------
input.cl:6:22: error: call to 'atom_dec' is ambiguous
/usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h:1:29:
note: candidate function
/usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h:2:38:
note: candidate function
------------

this looks like clang treats both:
"volatile __global uint* -> __global unsigned int*"
and
"volatile __global uint* -> __globale int *"
as equal cost.

Can you confirm that removing 'volatile' from destMemory hides the
issue?

Yes, I can confirm that if I rebuild libclc using the latest upstream
revision (and without any of my changes)
and then remove the volatile keyword from the atom_* test kernels
being compiled that the
tests compile and run to completion successfully.

>
> >
> > Signed-off-by: Aaron Watry <awatry@gmail.com>
> > ---
> > generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h | 3 +--
> > generic/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h | 3 +--
> > generic/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h | 3 +--
> > generic/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h | 3 +--
> > generic/lib/SOURCES | 4 ----
> > generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl | 9 ---------
> > generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl | 9 ---------
> > generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl | 9 ---------
> > generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl | 9 ---------
> > 9 files changed, 4 insertions(+), 48 deletions(-)
> > delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl
> > delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl
> > delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl
> > delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl
> >
> > diff --git a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > index bbc872c..a520fe4 100644
> > --- a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > +++ b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > @@ -1,2 +1 @@
> > -_CLC_OVERLOAD _CLC_DECL int atom_dec(global int *p);
> > -_CLC_OVERLOAD _CLC_DECL unsigned int atom_dec(global unsigned int *p);
> > +#define atom_dec(p) atom_sub(p, 1)
>
> Does this not cause redefinition warning (since local version uses the
> same define?

Yes, it looks like it does. When building libclc and running the CTS
tests, everything looks ok, but if I compile the atom_dec kernel
manually via clang, I get:
--------------------------------
atom_dec_uint.cl:5:32: warning: passing 'volatile __global uint *'
(aka 'volatile __global unsigned int *') to parameter of type
'__global int *' discards qualifiers
      [-Wincompatible-pointer-types-discards-qualifiers]
    oldValues[tid] = atom_dec( &destMemory[0] );
                               ^~~~~~~~~~~~~~
/usr/local/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h:9:30:
note: expanded from macro 'atom_dec'
#define atom_dec(p) atom_sub(p, 1)
                             ^
/usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_sub.h:10:50:
note: passing argument to parameter 'p' here
_CLC_OVERLOAD _CLC_DECL int atom_sub(global int *p, int val);
                                                 ^
1 warning generated.
--------------------------------

So yeah, I guess this is the wrong approach.

I did just discover while poking around at the kernel that if I remove
the "volatile" keyword from destMemory, then the kernel builds fine.
The difference in the atomic_inc and the CL 1.0 atom_inc function
declarations is basically that volatile keyword. If I re-add the
volatile keyword to the kernel, and then also add it to the function
declaration in atom_dec.h, the kernel also builds. Looking at POCL
and Beignet they both just do a single blanket define in a top-level
header file along the lines of:
#define atom_dec atomic_dec

That implies that both of those runtimes declare the global/local
pointers as volatile even for the CL1.0 variants, which doesn't
necessarily match the spec (although given how the CL 1.1/1.2 spec are
written, they basically just say "the CL 1.0 atom_* functions are
still supported), but it is what the CTS is testing. I don't
currently have access to the definitions that nv/amd use in their
closed-source binaries (and I haven't managed to find what ROCm uses,
but I'm thinking that the implication is that when Khronos
renamed/aliased atom_* to atomic_* in CL 1.1, they were implicitly
retroactively adding the 'volatile' keyword to the function argument.

I guess we could just remove the existing defines under
cl_khr_[global|local]_int32_base_atomics and put a single set of
defines in either clc.h, or in a new header under something like
generic/include/clc/atomic/atom_functions.h that just defines the
renames of the existing functions. Alternatively, we could just go
back and add the volatile keyword to the pointer arguments in the
atom_* headers/implementations.

Thoughts?

I checked that we also use similar define for atomic_inc/dec.
9040bf38 addressed similar issue when calling atom_add.
I think adding volatile to atom_* functions would just hide the issue
(since test suites use volatile pointers).
see [0]:
"A conversion from a pointer of type ``T*`` to a pointer of type ``U*``, where
  ``T`` and ``U`` are incompatible, is allowed, but is ranked below all other
  types of conversions. Please note: ``U`` lacking qualifiers that are present
  on ``T`` is sufficient for ``T`` and ``U`` to be incompatible."

I think a proper fix needs to be done on clang side.

Yes, the commit message for that llvm revision you linked sounds
exactly like what is going on here.

Ugh, looks like I need to read up on llvm development guidelines again
(a codebase that I try to limit my exposure to for my own sanity),
unless you want to look into that part. :wink:

--Aaron

> > > > This is exactly what the atomic_[inc|dec] functions do, and it fixes
> > > > kernel compilation failures using the OpenCL CTS.
> > >
> > > what are the compilation failures? I thought piglit had these ops
> > > covered.
> >
> > Piglit has atomic_* covered, but not the CL 1.0 atom_* functions.
>
> ah right. guess we should add those as well (another time...).

Yeah. In theory, we can add those to piglit. Since the CTS was
open-sourced, I'm mostly using its failures as a worklist for now just
to prevent duplication of work.

>
> >
> > The issue I'm running into with the CTS is that atomic_* works fine,
> > but I'm getting an ambiguous function reference for
> > atom_inc/atom_dec/atom_or/atom_and.
> >
> > Example:
> >
> > #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
> > __kernel void test_atomic_fn(volatile __global uint *destMemory,
> > __global uint *oldValues)
> > {
> > int tid = get_global_id(0);
> > oldValues[tid] = atom_dec( &destMemory[0] );
> > }
> >
> > Log:
> > Build not successful for device "AMD PITCAIRN (DRM 2.49.0 /
> > 4.11.0-041100rc8-generic, LLVM 5.0.0)", status: CL_BUILD_ERROR
> > Build log for device "AMD PITCAIRN (DRM 2.49.0 /
> > 4.11.0-041100rc8-generic, LLVM 5.0.0)" is:
> > ------------
> > input.cl:6:22: error: call to 'atom_dec' is ambiguous
> > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h:1:29:
> > note: candidate function
> > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h:2:38:
> > note: candidate function
> > ------------
>
> this looks like clang treats both:
> "volatile __global uint* -> __global unsigned int*"
> and
> "volatile __global uint* -> __globale int *"
> as equal cost.
>
> Can you confirm that removing 'volatile' from destMemory hides the
> issue?

Yes, I can confirm that if I rebuild libclc using the latest upstream
revision (and without any of my changes)
and then remove the volatile keyword from the atom_* test kernels
being compiled that the
tests compile and run to completion successfully.

>
> >
> > >
> > > >
> > > > Signed-off-by: Aaron Watry <awatry@gmail.com>
> > > > ---
> > > > generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h | 3 +--
> > > > generic/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h | 3 +--
> > > > generic/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h | 3 +--
> > > > generic/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h | 3 +--
> > > > generic/lib/SOURCES | 4 ----
> > > > generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl | 9 ---------
> > > > generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl | 9 ---------
> > > > generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl | 9 ---------
> > > > generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl | 9 ---------
> > > > 9 files changed, 4 insertions(+), 48 deletions(-)
> > > > delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl
> > > > delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl
> > > > delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl
> > > > delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl
> > > >
> > > > diff --git a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > > > index bbc872c..a520fe4 100644
> > > > --- a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > > > +++ b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > > > @@ -1,2 +1 @@
> > > > -_CLC_OVERLOAD _CLC_DECL int atom_dec(global int *p);
> > > > -_CLC_OVERLOAD _CLC_DECL unsigned int atom_dec(global unsigned int *p);
> > > > +#define atom_dec(p) atom_sub(p, 1)
> > >
> > > Does this not cause redefinition warning (since local version uses the
> > > same define?
> >
> > Yes, it looks like it does. When building libclc and running the CTS
> > tests, everything looks ok, but if I compile the atom_dec kernel
> > manually via clang, I get:
> > --------------------------------
> > atom_dec_uint.cl:5:32: warning: passing 'volatile __global uint *'
> > (aka 'volatile __global unsigned int *') to parameter of type
> > '__global int *' discards qualifiers
> > [-Wincompatible-pointer-types-discards-qualifiers]
> > oldValues[tid] = atom_dec( &destMemory[0] );
> > ^~~~~~~~~~~~~~
> > /usr/local/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h:9:30:
> > note: expanded from macro 'atom_dec'
> > #define atom_dec(p) atom_sub(p, 1)
> > ^
> > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_sub.h:10:50:
> > note: passing argument to parameter 'p' here
> > _CLC_OVERLOAD _CLC_DECL int atom_sub(global int *p, int val);
> > ^
> > 1 warning generated.
> > --------------------------------
> >
> > So yeah, I guess this is the wrong approach.
> >
> > I did just discover while poking around at the kernel that if I remove
> > the "volatile" keyword from destMemory, then the kernel builds fine.
> > The difference in the atomic_inc and the CL 1.0 atom_inc function
> > declarations is basically that volatile keyword. If I re-add the
> > volatile keyword to the kernel, and then also add it to the function
> > declaration in atom_dec.h, the kernel also builds. Looking at POCL
> > and Beignet they both just do a single blanket define in a top-level
> > header file along the lines of:
> > #define atom_dec atomic_dec
> >
> > That implies that both of those runtimes declare the global/local
> > pointers as volatile even for the CL1.0 variants, which doesn't
> > necessarily match the spec (although given how the CL 1.1/1.2 spec are
> > written, they basically just say "the CL 1.0 atom_* functions are
> > still supported), but it is what the CTS is testing. I don't
> > currently have access to the definitions that nv/amd use in their
> > closed-source binaries (and I haven't managed to find what ROCm uses,
> > but I'm thinking that the implication is that when Khronos
> > renamed/aliased atom_* to atomic_* in CL 1.1, they were implicitly
> > retroactively adding the 'volatile' keyword to the function argument.
> >
> > I guess we could just remove the existing defines under
> > cl_khr_[global|local]_int32_base_atomics and put a single set of
> > defines in either clc.h, or in a new header under something like
> > generic/include/clc/atomic/atom_functions.h that just defines the
> > renames of the existing functions. Alternatively, we could just go
> > back and add the volatile keyword to the pointer arguments in the
> > atom_* headers/implementations.
> >
> > Thoughts?
>
> I checked that we also use similar define for atomic_inc/dec.
> 9040bf38 addressed similar issue when calling atom_add.
> I think adding volatile to atom_* functions would just hide the issue
> (since test suites use volatile pointers).
> see [0]:
> "A conversion from a pointer of type ``T*`` to a pointer of type ``U*``, where
> ``T`` and ``U`` are incompatible, is allowed, but is ranked below all other
> types of conversions. Please note: ``U`` lacking qualifiers that are present
> on ``T`` is sufficient for ``T`` and ``U`` to be incompatible."
>
> I think a proper fix needs to be done on clang side.

Yes, the commit message for that llvm revision you linked sounds
exactly like what is going on here.

Ugh, looks like I need to read up on llvm development guidelines again
(a codebase that I try to limit my exposure to for my own sanity),
unless you want to look into that part. :wink:

Clang is not really my thing, and this issue is a rather low priority
for me. completing atomics support or adding register spilling for EG
are higher on my list.

You can try contacting Yaxun, he appeared to run into the same issue
and contributes to clang regularly.

regards,
Jan

> > > > This is exactly what the atomic_[inc|dec] functions do, and it fixes
> > > > kernel compilation failures using the OpenCL CTS.
> > >
> > > what are the compilation failures? I thought piglit had these ops
> > > covered.
> >
> > Piglit has atomic_* covered, but not the CL 1.0 atom_* functions.
>
> ah right. guess we should add those as well (another time...).

Yeah. In theory, we can add those to piglit. Since the CTS was
open-sourced, I'm mostly using its failures as a worklist for now just
to prevent duplication of work.

>
> >
> > The issue I'm running into with the CTS is that atomic_* works fine,
> > but I'm getting an ambiguous function reference for
> > atom_inc/atom_dec/atom_or/atom_and.
> >
> > Example:
> >
> > #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
> > __kernel void test_atomic_fn(volatile __global uint *destMemory,
> > __global uint *oldValues)
> > {
> > int tid = get_global_id(0);
> > oldValues[tid] = atom_dec( &destMemory[0] );
> > }
> >
> > Log:
> > Build not successful for device "AMD PITCAIRN (DRM 2.49.0 /
> > 4.11.0-041100rc8-generic, LLVM 5.0.0)", status: CL_BUILD_ERROR
> > Build log for device "AMD PITCAIRN (DRM 2.49.0 /
> > 4.11.0-041100rc8-generic, LLVM 5.0.0)" is:
> > ------------
> > input.cl:6:22: error: call to 'atom_dec' is ambiguous
> > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h:1:29:
> > note: candidate function
> > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h:2:38:
> > note: candidate function
> > ------------
>
> this looks like clang treats both:
> "volatile __global uint* -> __global unsigned int*"
> and
> "volatile __global uint* -> __globale int *"
> as equal cost.
>
> Can you confirm that removing 'volatile' from destMemory hides the
> issue?

Yes, I can confirm that if I rebuild libclc using the latest upstream
revision (and without any of my changes)
and then remove the volatile keyword from the atom_* test kernels
being compiled that the
tests compile and run to completion successfully.

>
> >
> > >
> > > >
> > > > Signed-off-by: Aaron Watry <awatry@gmail.com>
> > > > ---
> > > > generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h | 3 +--
> > > > generic/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h | 3 +--
> > > > generic/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h | 3 +--
> > > > generic/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h | 3 +--
> > > > generic/lib/SOURCES | 4 ----
> > > > generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl | 9 ---------
> > > > generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl | 9 ---------
> > > > generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl | 9 ---------
> > > > generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl | 9 ---------
> > > > 9 files changed, 4 insertions(+), 48 deletions(-)
> > > > delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl
> > > > delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl
> > > > delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl
> > > > delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl
> > > >
> > > > diff --git a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > > > index bbc872c..a520fe4 100644
> > > > --- a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > > > +++ b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > > > @@ -1,2 +1 @@
> > > > -_CLC_OVERLOAD _CLC_DECL int atom_dec(global int *p);
> > > > -_CLC_OVERLOAD _CLC_DECL unsigned int atom_dec(global unsigned int *p);
> > > > +#define atom_dec(p) atom_sub(p, 1)
> > >
> > > Does this not cause redefinition warning (since local version uses the
> > > same define?
> >
> > Yes, it looks like it does. When building libclc and running the CTS
> > tests, everything looks ok, but if I compile the atom_dec kernel
> > manually via clang, I get:
> > --------------------------------
> > atom_dec_uint.cl:5:32: warning: passing 'volatile __global uint *'
> > (aka 'volatile __global unsigned int *') to parameter of type
> > '__global int *' discards qualifiers
> > [-Wincompatible-pointer-types-discards-qualifiers]
> > oldValues[tid] = atom_dec( &destMemory[0] );
> > ^~~~~~~~~~~~~~
> > /usr/local/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h:9:30:
> > note: expanded from macro 'atom_dec'
> > #define atom_dec(p) atom_sub(p, 1)
> > ^
> > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_sub.h:10:50:
> > note: passing argument to parameter 'p' here
> > _CLC_OVERLOAD _CLC_DECL int atom_sub(global int *p, int val);
> > ^
> > 1 warning generated.
> > --------------------------------
> >
> > So yeah, I guess this is the wrong approach.
> >
> > I did just discover while poking around at the kernel that if I remove
> > the "volatile" keyword from destMemory, then the kernel builds fine.
> > The difference in the atomic_inc and the CL 1.0 atom_inc function
> > declarations is basically that volatile keyword. If I re-add the
> > volatile keyword to the kernel, and then also add it to the function
> > declaration in atom_dec.h, the kernel also builds. Looking at POCL
> > and Beignet they both just do a single blanket define in a top-level
> > header file along the lines of:
> > #define atom_dec atomic_dec
> >
> > That implies that both of those runtimes declare the global/local
> > pointers as volatile even for the CL1.0 variants, which doesn't
> > necessarily match the spec (although given how the CL 1.1/1.2 spec are
> > written, they basically just say "the CL 1.0 atom_* functions are
> > still supported), but it is what the CTS is testing. I don't
> > currently have access to the definitions that nv/amd use in their
> > closed-source binaries (and I haven't managed to find what ROCm uses,
> > but I'm thinking that the implication is that when Khronos
> > renamed/aliased atom_* to atomic_* in CL 1.1, they were implicitly
> > retroactively adding the 'volatile' keyword to the function argument.
> >
> > I guess we could just remove the existing defines under
> > cl_khr_[global|local]_int32_base_atomics and put a single set of
> > defines in either clc.h, or in a new header under something like
> > generic/include/clc/atomic/atom_functions.h that just defines the
> > renames of the existing functions. Alternatively, we could just go
> > back and add the volatile keyword to the pointer arguments in the
> > atom_* headers/implementations.
> >
> > Thoughts?
>
> I checked that we also use similar define for atomic_inc/dec.
> 9040bf38 addressed similar issue when calling atom_add.
> I think adding volatile to atom_* functions would just hide the issue
> (since test suites use volatile pointers).
> see [0]:
> "A conversion from a pointer of type ``T*`` to a pointer of type ``U*``, where
> ``T`` and ``U`` are incompatible, is allowed, but is ranked below all other
> types of conversions. Please note: ``U`` lacking qualifiers that are present
> on ``T`` is sufficient for ``T`` and ``U`` to be incompatible."
>
> I think a proper fix needs to be done on clang side.

Yes, the commit message for that llvm revision you linked sounds
exactly like what is going on here.

Ugh, looks like I need to read up on llvm development guidelines again
(a codebase that I try to limit my exposure to for my own sanity),
unless you want to look into that part. :wink:

Clang is not really my thing, and this issue is a rather low priority
for me. completing atomics support or adding register spilling for EG
are higher on my list.

You can try contacting Yaxun, he appeared to run into the same issue
and contributes to clang regularly.

I'll take a look, and might reach out for help, or I'll try to dig
into it and see if I can come up with a solution myself.

For now, like you, this is just one of the minor issues that I've been
running into.

A bigger issue is the regression caused by clang r303370, which I
believe is what causes ALL of the CTS local atomic tests to fail,
along with a few other test suites that expect to be able to
set a local buffer as a kernel argument.

There's also issues with some of the math/* tests due to accuracy
issues. I'm not sure if that's down to optimization flags causing bad
results, or if there's issues in the underlying algorithms.

For reference, the math tests with ULP-tolerance errors are: exp10,
floor, fmax, fmin, fmod, hypot, remainder, tan.
The following pass wimpy mode with float, but fail on doubles: floor, log10
Others: pown fails with an "unsupported call to function", lgamma_r
segfaults during kernel compilation, and frexp errors out for doubles
with an instruction-selection error.

I'm hoping that there's an easy single root-cause for most of the
accuracy issues, especially since floor just calls out to the
llvm.floor.[f32|f64|vNf32|vNf64] intrinsic, which you'd assume would
be ok. fmax, fmin, and pow also all seem to have issues with accuracy
when the inputs include nan/-nan, which might be an easy fix.

So yeah, there's plenty of work to do, even in just the math*
functions, outside of dealing with pointer qualifiers breaking things.

--Aaron

> > > > > > This is exactly what the atomic_[inc|dec] functions do, and it fixes
> > > > > > kernel compilation failures using the OpenCL CTS.
> > > > >
> > > > > what are the compilation failures? I thought piglit had these ops
> > > > > covered.
> > > >
> > > > Piglit has atomic_* covered, but not the CL 1.0 atom_* functions.
> > >
> > > ah right. guess we should add those as well (another time...).
> >
> > Yeah. In theory, we can add those to piglit. Since the CTS was
> > open-sourced, I'm mostly using its failures as a worklist for now just
> > to prevent duplication of work.
> >
> > >
> > > >
> > > > The issue I'm running into with the CTS is that atomic_* works fine,
> > > > but I'm getting an ambiguous function reference for
> > > > atom_inc/atom_dec/atom_or/atom_and.
> > > >
> > > > Example:
> > > >
> > > > #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
> > > > __kernel void test_atomic_fn(volatile __global uint *destMemory,
> > > > __global uint *oldValues)
> > > > {
> > > > int tid = get_global_id(0);
> > > > oldValues[tid] = atom_dec( &destMemory[0] );
> > > > }
> > > >
> > > > Log:
> > > > Build not successful for device "AMD PITCAIRN (DRM 2.49.0 /
> > > > 4.11.0-041100rc8-generic, LLVM 5.0.0)", status: CL_BUILD_ERROR
> > > > Build log for device "AMD PITCAIRN (DRM 2.49.0 /
> > > > 4.11.0-041100rc8-generic, LLVM 5.0.0)" is:
> > > > ------------
> > > > input.cl:6:22: error: call to 'atom_dec' is ambiguous
> > > > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h:1:29:
> > > > note: candidate function
> > > > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h:2:38:
> > > > note: candidate function
> > > > ------------
> > >
> > > this looks like clang treats both:
> > > "volatile __global uint* -> __global unsigned int*"
> > > and
> > > "volatile __global uint* -> __globale int *"
> > > as equal cost.
> > >
> > > Can you confirm that removing 'volatile' from destMemory hides the
> > > issue?
> >
> > Yes, I can confirm that if I rebuild libclc using the latest upstream
> > revision (and without any of my changes)
> > and then remove the volatile keyword from the atom_* test kernels
> > being compiled that the
> > tests compile and run to completion successfully.
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > Signed-off-by: Aaron Watry <awatry@gmail.com>
> > > > > > ---
> > > > > > generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h | 3 +--
> > > > > > generic/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h | 3 +--
> > > > > > generic/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h | 3 +--
> > > > > > generic/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h | 3 +--
> > > > > > generic/lib/SOURCES | 4 ----
> > > > > > generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl | 9 ---------
> > > > > > generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl | 9 ---------
> > > > > > generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl | 9 ---------
> > > > > > generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl | 9 ---------
> > > > > > 9 files changed, 4 insertions(+), 48 deletions(-)
> > > > > > delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl
> > > > > > delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl
> > > > > > delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl
> > > > > > delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl
> > > > > >
> > > > > > diff --git a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > > > > > index bbc872c..a520fe4 100644
> > > > > > --- a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > > > > > +++ b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > > > > > @@ -1,2 +1 @@
> > > > > > -_CLC_OVERLOAD _CLC_DECL int atom_dec(global int *p);
> > > > > > -_CLC_OVERLOAD _CLC_DECL unsigned int atom_dec(global unsigned int *p);
> > > > > > +#define atom_dec(p) atom_sub(p, 1)
> > > > >
> > > > > Does this not cause redefinition warning (since local version uses the
> > > > > same define?
> > > >
> > > > Yes, it looks like it does. When building libclc and running the CTS
> > > > tests, everything looks ok, but if I compile the atom_dec kernel
> > > > manually via clang, I get:
> > > > --------------------------------
> > > > atom_dec_uint.cl:5:32: warning: passing 'volatile __global uint *'
> > > > (aka 'volatile __global unsigned int *') to parameter of type
> > > > '__global int *' discards qualifiers
> > > > [-Wincompatible-pointer-types-discards-qualifiers]
> > > > oldValues[tid] = atom_dec( &destMemory[0] );
> > > > ^~~~~~~~~~~~~~
> > > > /usr/local/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h:9:30:
> > > > note: expanded from macro 'atom_dec'
> > > > #define atom_dec(p) atom_sub(p, 1)
> > > > ^
> > > > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_sub.h:10:50:
> > > > note: passing argument to parameter 'p' here
> > > > _CLC_OVERLOAD _CLC_DECL int atom_sub(global int *p, int val);
> > > > ^
> > > > 1 warning generated.
> > > > --------------------------------
> > > >
> > > > So yeah, I guess this is the wrong approach.
> > > >
> > > > I did just discover while poking around at the kernel that if I remove
> > > > the "volatile" keyword from destMemory, then the kernel builds fine.
> > > > The difference in the atomic_inc and the CL 1.0 atom_inc function
> > > > declarations is basically that volatile keyword. If I re-add the
> > > > volatile keyword to the kernel, and then also add it to the function
> > > > declaration in atom_dec.h, the kernel also builds. Looking at POCL
> > > > and Beignet they both just do a single blanket define in a top-level
> > > > header file along the lines of:
> > > > #define atom_dec atomic_dec
> > > >
> > > > That implies that both of those runtimes declare the global/local
> > > > pointers as volatile even for the CL1.0 variants, which doesn't
> > > > necessarily match the spec (although given how the CL 1.1/1.2 spec are
> > > > written, they basically just say "the CL 1.0 atom_* functions are
> > > > still supported), but it is what the CTS is testing. I don't
> > > > currently have access to the definitions that nv/amd use in their
> > > > closed-source binaries (and I haven't managed to find what ROCm uses,
> > > > but I'm thinking that the implication is that when Khronos
> > > > renamed/aliased atom_* to atomic_* in CL 1.1, they were implicitly
> > > > retroactively adding the 'volatile' keyword to the function argument.
> > > >
> > > > I guess we could just remove the existing defines under
> > > > cl_khr_[global|local]_int32_base_atomics and put a single set of
> > > > defines in either clc.h, or in a new header under something like
> > > > generic/include/clc/atomic/atom_functions.h that just defines the
> > > > renames of the existing functions. Alternatively, we could just go
> > > > back and add the volatile keyword to the pointer arguments in the
> > > > atom_* headers/implementations.
> > > >
> > > > Thoughts?
> > >
> > > I checked that we also use similar define for atomic_inc/dec.
> > > 9040bf38 addressed similar issue when calling atom_add.
> > > I think adding volatile to atom_* functions would just hide the issue
> > > (since test suites use volatile pointers).
> > > see [0]:
> > > "A conversion from a pointer of type ``T*`` to a pointer of type ``U*``, where
> > > ``T`` and ``U`` are incompatible, is allowed, but is ranked below all other
> > > types of conversions. Please note: ``U`` lacking qualifiers that are present
> > > on ``T`` is sufficient for ``T`` and ``U`` to be incompatible."
> > >
> > > I think a proper fix needs to be done on clang side.
> >
> > Yes, the commit message for that llvm revision you linked sounds
> > exactly like what is going on here.
> >
> > Ugh, looks like I need to read up on llvm development guidelines again
> > (a codebase that I try to limit my exposure to for my own sanity),
> > unless you want to look into that part. :wink:
>
> Clang is not really my thing, and this issue is a rather low priority
> for me. completing atomics support or adding register spilling for EG
> are higher on my list.
>
> You can try contacting Yaxun, he appeared to run into the same issue
> and contributes to clang regularly.

I'll take a look, and might reach out for help, or I'll try to dig
into it and see if I can come up with a solution myself.

For now, like you, this is just one of the minor issues that I've been
running into.

A bigger issue is the regression caused by clang r303370, which I
believe is what causes ALL of the CTS local atomic tests to fail,
along with a few other test suites that expect to be able to
set a local buffer as a kernel argument.

Yaxun is working on this (see the other libclc-dev thread).

There's also issues with some of the math/* tests due to accuracy
issues. I'm not sure if that's down to optimization flags causing bad
results, or if there's issues in the underlying algorithms.

For reference, the math tests with ULP-tolerance errors are: exp10,
floor, fmax, fmin, fmod, hypot, remainder, tan.
The following pass wimpy mode with float, but fail on doubles: floor, log10
Others: pown fails with an "unsupported call to function", lgamma_r
segfaults during kernel compilation, and frexp errors out for doubles
with an instruction-selection error.

I'm surprised that remainder works at all since it's not implemented in
libclc, are you using local patches?

I'm hoping that there's an easy single root-cause for most of the
accuracy issues, especially since floor just calls out to the
llvm.floor.[f32|f64|vNf32|vNf64] intrinsic, which you'd assume would
be ok. fmax, fmin, and pow also all seem to have issues with accuracy
when the inputs include nan/-nan, which might be an easy fix.

I think this might be more tricky, since there is little information
about ULP precision of hw instructions. some operations might need to
be reimplemented in sw (for certain generations of hw). I'm not sure if
libclc or the llvm backend lowering pass is a better place for this.

fmin/fmax are a bit special since they should not change to value if
one side is NaN. I vaguely remember a discussion that the instructions
might still flush denormals in that case.

So yeah, there's plenty of work to do, even in just the math*
functions, outside of dealing with pointer qualifiers breaking things.

It sounds like you work on GCN hw. do you still have/run clover on the
cedar board?

Jan

> > > > > > This is exactly what the atomic_[inc|dec] functions do, and it fixes
> > > > > > kernel compilation failures using the OpenCL CTS.
> > > > >
> > > > > what are the compilation failures? I thought piglit had these ops
> > > > > covered.
> > > >
> > > > Piglit has atomic_* covered, but not the CL 1.0 atom_* functions.
> > >
> > > ah right. guess we should add those as well (another time...).
> >
> > Yeah. In theory, we can add those to piglit. Since the CTS was
> > open-sourced, I'm mostly using its failures as a worklist for now just
> > to prevent duplication of work.
> >
> > >
> > > >
> > > > The issue I'm running into with the CTS is that atomic_* works fine,
> > > > but I'm getting an ambiguous function reference for
> > > > atom_inc/atom_dec/atom_or/atom_and.
> > > >
> > > > Example:
> > > >
> > > > #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
> > > > __kernel void test_atomic_fn(volatile __global uint *destMemory,
> > > > __global uint *oldValues)
> > > > {
> > > > int tid = get_global_id(0);
> > > > oldValues[tid] = atom_dec( &destMemory[0] );
> > > > }
> > > >
> > > > Log:
> > > > Build not successful for device "AMD PITCAIRN (DRM 2.49.0 /
> > > > 4.11.0-041100rc8-generic, LLVM 5.0.0)", status: CL_BUILD_ERROR
> > > > Build log for device "AMD PITCAIRN (DRM 2.49.0 /
> > > > 4.11.0-041100rc8-generic, LLVM 5.0.0)" is:
> > > > ------------
> > > > input.cl:6:22: error: call to 'atom_dec' is ambiguous
> > > > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h:1:29:
> > > > note: candidate function
> > > > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h:2:38:
> > > > note: candidate function
> > > > ------------
> > >
> > > this looks like clang treats both:
> > > "volatile __global uint* -> __global unsigned int*"
> > > and
> > > "volatile __global uint* -> __globale int *"
> > > as equal cost.
> > >
> > > Can you confirm that removing 'volatile' from destMemory hides the
> > > issue?
> >
> > Yes, I can confirm that if I rebuild libclc using the latest upstream
> > revision (and without any of my changes)
> > and then remove the volatile keyword from the atom_* test kernels
> > being compiled that the
> > tests compile and run to completion successfully.
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > Signed-off-by: Aaron Watry <awatry@gmail.com>
> > > > > > ---
> > > > > > generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h | 3 +--
> > > > > > generic/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h | 3 +--
> > > > > > generic/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h | 3 +--
> > > > > > generic/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h | 3 +--
> > > > > > generic/lib/SOURCES | 4 ----
> > > > > > generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl | 9 ---------
> > > > > > generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl | 9 ---------
> > > > > > generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl | 9 ---------
> > > > > > generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl | 9 ---------
> > > > > > 9 files changed, 4 insertions(+), 48 deletions(-)
> > > > > > delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_dec.cl
> > > > > > delete mode 100644 generic/lib/cl_khr_global_int32_base_atomics/atom_inc.cl
> > > > > > delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_dec.cl
> > > > > > delete mode 100644 generic/lib/cl_khr_local_int32_base_atomics/atom_inc.cl
> > > > > >
> > > > > > diff --git a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > > > > > index bbc872c..a520fe4 100644
> > > > > > --- a/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > > > > > +++ b/generic/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
> > > > > > @@ -1,2 +1 @@
> > > > > > -_CLC_OVERLOAD _CLC_DECL int atom_dec(global int *p);
> > > > > > -_CLC_OVERLOAD _CLC_DECL unsigned int atom_dec(global unsigned int *p);
> > > > > > +#define atom_dec(p) atom_sub(p, 1)
> > > > >
> > > > > Does this not cause redefinition warning (since local version uses the
> > > > > same define?
> > > >
> > > > Yes, it looks like it does. When building libclc and running the CTS
> > > > tests, everything looks ok, but if I compile the atom_dec kernel
> > > > manually via clang, I get:
> > > > --------------------------------
> > > > atom_dec_uint.cl:5:32: warning: passing 'volatile __global uint *'
> > > > (aka 'volatile __global unsigned int *') to parameter of type
> > > > '__global int *' discards qualifiers
> > > > [-Wincompatible-pointer-types-discards-qualifiers]
> > > > oldValues[tid] = atom_dec( &destMemory[0] );
> > > > ^~~~~~~~~~~~~~
> > > > /usr/local/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h:9:30:
> > > > note: expanded from macro 'atom_dec'
> > > > #define atom_dec(p) atom_sub(p, 1)
> > > > ^
> > > > /usr/local/include/clc/cl_khr_global_int32_base_atomics/atom_sub.h:10:50:
> > > > note: passing argument to parameter 'p' here
> > > > _CLC_OVERLOAD _CLC_DECL int atom_sub(global int *p, int val);
> > > > ^
> > > > 1 warning generated.
> > > > --------------------------------
> > > >
> > > > So yeah, I guess this is the wrong approach.
> > > >
> > > > I did just discover while poking around at the kernel that if I remove
> > > > the "volatile" keyword from destMemory, then the kernel builds fine.
> > > > The difference in the atomic_inc and the CL 1.0 atom_inc function
> > > > declarations is basically that volatile keyword. If I re-add the
> > > > volatile keyword to the kernel, and then also add it to the function
> > > > declaration in atom_dec.h, the kernel also builds. Looking at POCL
> > > > and Beignet they both just do a single blanket define in a top-level
> > > > header file along the lines of:
> > > > #define atom_dec atomic_dec
> > > >
> > > > That implies that both of those runtimes declare the global/local
> > > > pointers as volatile even for the CL1.0 variants, which doesn't
> > > > necessarily match the spec (although given how the CL 1.1/1.2 spec are
> > > > written, they basically just say "the CL 1.0 atom_* functions are
> > > > still supported), but it is what the CTS is testing. I don't
> > > > currently have access to the definitions that nv/amd use in their
> > > > closed-source binaries (and I haven't managed to find what ROCm uses,
> > > > but I'm thinking that the implication is that when Khronos
> > > > renamed/aliased atom_* to atomic_* in CL 1.1, they were implicitly
> > > > retroactively adding the 'volatile' keyword to the function argument.
> > > >
> > > > I guess we could just remove the existing defines under
> > > > cl_khr_[global|local]_int32_base_atomics and put a single set of
> > > > defines in either clc.h, or in a new header under something like
> > > > generic/include/clc/atomic/atom_functions.h that just defines the
> > > > renames of the existing functions. Alternatively, we could just go
> > > > back and add the volatile keyword to the pointer arguments in the
> > > > atom_* headers/implementations.
> > > >
> > > > Thoughts?
> > >
> > > I checked that we also use similar define for atomic_inc/dec.
> > > 9040bf38 addressed similar issue when calling atom_add.
> > > I think adding volatile to atom_* functions would just hide the issue
> > > (since test suites use volatile pointers).
> > > see [0]:
> > > "A conversion from a pointer of type ``T*`` to a pointer of type ``U*``, where
> > > ``T`` and ``U`` are incompatible, is allowed, but is ranked below all other
> > > types of conversions. Please note: ``U`` lacking qualifiers that are present
> > > on ``T`` is sufficient for ``T`` and ``U`` to be incompatible."
> > >
> > > I think a proper fix needs to be done on clang side.
> >
> > Yes, the commit message for that llvm revision you linked sounds
> > exactly like what is going on here.
> >
> > Ugh, looks like I need to read up on llvm development guidelines again
> > (a codebase that I try to limit my exposure to for my own sanity),
> > unless you want to look into that part. :wink:
>
> Clang is not really my thing, and this issue is a rather low priority
> for me. completing atomics support or adding register spilling for EG
> are higher on my list.
>
> You can try contacting Yaxun, he appeared to run into the same issue
> and contributes to clang regularly.

I'll take a look, and might reach out for help, or I'll try to dig
into it and see if I can come up with a solution myself.

For now, like you, this is just one of the minor issues that I've been
running into.

A bigger issue is the regression caused by clang r303370, which I
believe is what causes ALL of the CTS local atomic tests to fail,
along with a few other test suites that expect to be able to
set a local buffer as a kernel argument.

Yaxun is working on this (see the other libclc-dev thread).

Yeah, I saw that. If I have some time this weekend (or on the 4th), I'll try to
see if there's something that we can do in clover, if it's not resolved by then.

There's also issues with some of the math/* tests due to accuracy
issues. I'm not sure if that's down to optimization flags causing bad
results, or if there's issues in the underlying algorithms.

For reference, the math tests with ULP-tolerance errors are: exp10,
floor, fmax, fmin, fmod, hypot, remainder, tan.
The following pass wimpy mode with float, but fail on doubles: floor, log10
Others: pown fails with an "unsupported call to function", lgamma_r
segfaults during kernel compilation, and frexp errors out for doubles
with an instruction-selection error.

I'm surprised that remainder works at all since it's not implemented in
libclc, are you using local patches?

Yeah, I've got a local patch for that one that was submitted to libclc
back in January, but it never got committed due to accuracy issues.

http://lists.llvm.org/pipermail/libclc-dev/2017-January/002341.html

I'm hoping that there's an easy single root-cause for most of the
accuracy issues, especially since floor just calls out to the
llvm.floor.[f32|f64|vNf32|vNf64] intrinsic, which you'd assume would
be ok. fmax, fmin, and pow also all seem to have issues with accuracy
when the inputs include nan/-nan, which might be an easy fix.

I think this might be more tricky, since there is little information
about ULP precision of hw instructions. some operations might need to
be reimplemented in sw (for certain generations of hw). I'm not sure if
libclc or the llvm backend lowering pass is a better place for this.

fmin/fmax are a bit special since they should not change to value if
one side is NaN. I vaguely remember a discussion that the instructions
might still flush denormals in that case.

Possible. I haven't looked into the failures yet.

So yeah, there's plenty of work to do, even in just the math*
functions, outside of dealing with pointer qualifiers breaking things.

It sounds like you work on GCN hw. do you still have/run clover on the
cedar board?

The CEDAR is gone (old work machine that was retired).

The cards that I currently have available and installed are:
Radeon 7850 (PITCAIRN, GCN 1.0)
Radeon 6850 (BARTS, Northern Islands)
Radeon 6530D (SUMO, Llano 3-core APU)
Intel HD 4600 (Haswell IGP w/ Beignet, i7-4810MQ)

I do have a Radeon 5400 PCI (non-express) CEDAR card installed in an
old Alpha Personal WorkStation, but that machine doesn't have a
workable OS at the moment.

I've also got a Kepler-level GeForce 760 sitting on the desk, but I
haven't found a home for it yet.

--Aaron