[PATCH v3 1/2] AMDGPU: Use clang intrinsics for workitem builtins

v2: split into 2 patches
    use clang builtins for other intrinsics as well

v3: Fix warnings
    Switch r600 to use implicitarg.ptr

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

ping

Also fix get_global_id to take the global offset into account.
I don't know how to add this for ptx, so ptx keeps the old get_global_id
implementation.

v2: split to a separate patch

v3: Switch R600 to use implicitarg.ptr

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
---
amdgcn/lib/SOURCES | 1 +
amdgcn/lib/workitem/get_global_offset.cl | 11 +++++++++++
generic/include/clc/clc.h | 1 +
generic/include/clc/workitem/get_global_offset.h | 2 ++
generic/lib/workitem/get_global_id.cl | 2 +-
ptx-nvidiacl/lib/SOURCES | 1 +
ptx-nvidiacl/lib/workitem/get_global_id.cl | 5 +++++
r600/lib/SOURCES | 1 +
r600/lib/workitem/get_global_offset.cl | 11 +++++++++++
9 files changed, 34 insertions(+), 1 deletion(-)
create mode 100644 amdgcn/lib/workitem/get_global_offset.cl
create mode 100644 generic/include/clc/workitem/get_global_offset.h
create mode 100644 ptx-nvidiacl/lib/workitem/get_global_id.cl
create mode 100644 r600/lib/workitem/get_global_offset.cl

diff --git a/amdgcn/lib/SOURCES b/amdgcn/lib/SOURCES
index 4178d70..33038f2 100644
--- a/amdgcn/lib/SOURCES
+++ b/amdgcn/lib/SOURCES
@@ -1,4 +1,5 @@
synchronization/barrier_impl.ll
+workitem/get_global_offset.cl
workitem/get_group_id.cl
workitem/get_local_id.cl
workitem/get_work_dim.cl
diff --git a/amdgcn/lib/workitem/get_global_offset.cl b/amdgcn/lib/workitem/get_global_offset.cl
new file mode 100644
index 0000000..32aaa4c
--- /dev/null
+++ b/amdgcn/lib/workitem/get_global_offset.cl
@@ -0,0 +1,11 @@
+#include <clc/clc.h>
+
+_CLC_DEF uint get_global_offset(uint dim)
+{
+    __attribute__((address_space(2))) uint * ptr =
+        (__attribute__((address_space(2))) uint *)
+        __builtin_amdgcn_implicitarg_ptr();

Why did you use __attribute__((address_space(2))) instead of 'constant'?

If I use __constant uint *, clang complains that the cast changes the
address space (I guess it does not know at this point that AS 2 is the
constant AS). It's also more consistent with R600, which uses AS 7.

Jan

Ok, that makes sense. LGTM.

-Tom
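
The generic/lib/workitem/get_global_id.cl hunk is not quoted in this thread, so the following is only a sketch of what "take the global offset into account" means according to the OpenCL definition of get_global_id. It is written in plain C rather than OpenCL C so that it can be compiled on a host. The struct work_item type and the model_* function names are hypothetical and do not come from libclc; the real builtins read these values from the hardware and from the implicit kernel arguments, whose layout is not modeled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical launch state of a single work-item (illustrative only,
 * not a libclc type). */
struct work_item {
    uint32_t group_id[3];
    uint32_t local_size[3];
    uint32_t local_id[3];
    uint32_t global_offset[3];
};

/* OpenCL specifies that get_global_offset returns 0 for an
 * out-of-range dimension index. */
static size_t model_global_offset(const struct work_item *wi, uint32_t dim)
{
    return dim < 3 ? wi->global_offset[dim] : 0;
}

/* Offset-unaware variant: group_id * local_size + local_id. */
static size_t model_global_id_old(const struct work_item *wi, uint32_t dim)
{
    if (dim >= 3)
        return 0;
    return (size_t)wi->group_id[dim] * wi->local_size[dim] + wi->local_id[dim];
}

/* Offset-aware variant: the same sum plus the global offset. */
static size_t model_global_id(const struct work_item *wi, uint32_t dim)
{
    if (dim >= 3)
        return 0;
    return model_global_id_old(wi, dim) + model_global_offset(wi, dim);
}
```

With group id 2, local size 64, local id 5 and a global offset of 1000 in dimension 0, the offset-unaware variant yields 133 and the offset-aware variant yields 1133. The two agree whenever the kernel is enqueued with a zero offset, which is why the difference only shows up for launches that pass a non-zero global_work_offset.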