[PATCH 1/2] shared: Implement aligned vector loads (vloada_half)

Passes newly posted piglits on Turks.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

Float version pass newly posted piglit tests on Turks.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

I haven’t forgotten about these two… I’m just trying to figure out some ambiguity in both the 1.2/2.0/2.2 spec related to whether a scalar version (vec-size 1 with no numeric suffix) is needed. The CTS tests for vloada_half being supported, while the spec’s language changes a bit between versions, and never gets to what I’d call a consistent state.

–Aaron

I haven't forgotten about these two... I'm just trying to figure out some
ambiguity in both the 1.2/2.0/2.2 spec related to whether a scalar version
(vec-size 1 with no numeric suffix) is needed. The CTS tests for
vloada_half being supported, while the spec's language changes a bit
between versions, and never gets to what I'd call a consistent state.

yeah, it was weird that ctx expects a scalar version. since non-aligned
vload_half/vstore_half expect the pointer to be 16bit aligned, scalar
vloada_half/vstorea_half would be identical to non-aligned version.
The specs seem to always mention the aligned variants with 'n' suffix,
so I just considered it a CTS bug.

I don't mind going out of specs and adding a scalar version if you
think it's useful beyond appeasing the CTS.

Jan

> I haven't forgotten about these two... I'm just trying to figure out some
> ambiguity in both the 1.2/2.0/2.2 spec related to whether a scalar version
> (vec-size 1 with no numeric suffix) is needed. The CTS tests for
> vloada_half being supported, while the spec's language changes a bit
> between versions, and never gets to what I'd call a consistent state.

yeah, it was weird that ctx expects a scalar version. since non-aligned
vload_half/vstore_half expect the pointer to be 16bit aligned, scalar
vloada_half/vstorea_half would be identical to non-aligned version.
The specs seem to always mention the aligned variants with 'n' suffix,
so I just considered it a CTS bug.

I don't mind going out of specs and adding a scalar version if you
think it's useful beyond appeasing the CTS.

Hi,

I'm not sure what the final consensus is here. Do you prefer I added
scalar versions of vloada_half/vstorea_half even though they are
identical to non-aligned versions?

I could not find any support for it in the specs, only the CTS expects
it.

Jan

I haven’t forgotten about these two… I’m just trying to figure out some
ambiguity in both the 1.2/2.0/2.2 spec related to whether a scalar version
(vec-size 1 with no numeric suffix) is needed. The CTS tests for
vloada_half being supported, while the spec’s language changes a bit
between versions, and never gets to what I’d call a consistent state.

yeah, it was weird that ctx expects a scalar version. since non-aligned
vload_half/vstore_half expect the pointer to be 16bit aligned, scalar
vloada_half/vstorea_half would be identical to non-aligned version.
The specs seem to always mention the aligned variants with ‘n’ suffix,
so I just considered it a CTS bug.

I don’t mind going out of specs and adding a scalar version if you
think it’s useful beyond appeasing the CTS.

Hi,

I’m not sure what the final consensus is here. Do you prefer I added
scalar versions of vloada_half/vstorea_half even though they are
identical to non-aligned versions?

I could not find any support for it in the specs, only the CTS expects
it.

I’d say at this point that all conformant implementations already probably support it unless they’ve all gotten a waiver due to it being a possible spec bug.

Since it’s basically just an alias for vload_half/vstore_half, I wouldn’t be against adding it in for now. Maybe eventually we can get some clarification or a test fix put in.

Aaron

> > > I haven't forgotten about these two... I'm just trying to figure out
>
> some
> > > ambiguity in both the 1.2/2.0/2.2 spec related to whether a scalar
>
> version
> > > (vec-size 1 with no numeric suffix) is needed. The CTS tests for
> > > vloada_half being supported, while the spec's language changes a bit
> > > between versions, and never gets to what I'd call a consistent state.
> >
> > yeah, it was weird that ctx expects a scalar version. since non-aligned
> > vload_half/vstore_half expect the pointer to be 16bit aligned, scalar
> > vloada_half/vstorea_half would be identical to non-aligned version.
> > The specs seem to always mention the aligned variants with 'n' suffix,
> > so I just considered it a CTS bug.
> >
> > I don't mind going out of specs and adding a scalar version if you
> > think it's useful beyond appeasing the CTS.
>
> Hi,
>
> I'm not sure what the final consensus is here. Do you prefer I added
> scalar versions of vloada_half/vstorea_half even though they are
> identical to non-aligned versions?
>
> I could not find any support for it in the specs, only the CTS expects
> it.
>

I'd say at this point that all conformant implementations already probably
support it unless they've all gotten a waiver due to it being a possible
spec bug.

that should be easy to test. I can check beignet and nvidia CUDA. I'm
travelling until Thursday, so I probably won't have time until then.

Since it's basically just an alias for vload_half/vstore_half, I wouldn't
be against adding it in for now. Maybe eventually we can get some
clarification or a test fix put in.

Sure. I'll see what I find out about other and send a resping if
necessary.

Jan

Passes newly posted piglits on turks and carrizo
v2: add scalar vloada_half

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

Float version passes newly posted piglit tests on turks, float and double pass on carrizo.
v2: scalar vstorea_half

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

Passes newly posted piglits on turks and carrizo
v2: add scalar vloada_half

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

generic/include/clc/shared/vload.h | 40 +++++++++++++++++++++±---------------
generic/lib/shared/vload.cl | 10 +++++++±-
generic/lib/shared/vload_half.inc | 26 ++++++++++++++++±-------
3 files changed, 49 insertions(+), 27 deletions(-)

diff --git a/generic/include/clc/shared/vload.h b/generic/include/clc/shared/vload.h
index 8c262dd…07a8700 100644
— a/generic/include/clc/shared/vload.h
+++ b/generic/include/clc/shared/vload.h
@@ -12,22 +12,24 @@
_CLC_VECTOR_VLOAD_DECL(SUFFIX, MEM_TYPE, PRIM_TYPE, __private)
_CLC_VECTOR_VLOAD_DECL(SUFFIX, MEM_TYPE, PRIM_TYPE, __local)
_CLC_VECTOR_VLOAD_DECL(SUFFIX, MEM_TYPE, PRIM_TYPE, __constant) \

  • _CLC_VECTOR_VLOAD_DECL(SUFFIX, MEM_TYPE, PRIM_TYPE, __global) \
  • _CLC_VECTOR_VLOAD_DECL(SUFFIX, MEM_TYPE, PRIM_TYPE, __global)

#define _CLC_VECTOR_VLOAD_PRIM1(PRIM_TYPE) \

  • _CLC_VECTOR_VLOAD_PRIM3(, PRIM_TYPE, PRIM_TYPE) \

-#define _CLC_VECTOR_VLOAD_PRIM() \

  • _CLC_VECTOR_VLOAD_PRIM1(char) \
  • _CLC_VECTOR_VLOAD_PRIM1(uchar) \
  • _CLC_VECTOR_VLOAD_PRIM1(short) \
  • _CLC_VECTOR_VLOAD_PRIM1(ushort) \
  • _CLC_VECTOR_VLOAD_PRIM1(int) \
  • _CLC_VECTOR_VLOAD_PRIM1(uint) \
  • _CLC_VECTOR_VLOAD_PRIM1(long) \
  • _CLC_VECTOR_VLOAD_PRIM1(ulong) \
  • _CLC_VECTOR_VLOAD_PRIM1(float) \
  • _CLC_VECTOR_VLOAD_PRIM3(_half, half, float)
  • _CLC_VECTOR_VLOAD_PRIM3(, PRIM_TYPE, PRIM_TYPE)

+// Declare vector load prototypes
+_CLC_VECTOR_VLOAD_PRIM1(char)
+_CLC_VECTOR_VLOAD_PRIM1(uchar)
+_CLC_VECTOR_VLOAD_PRIM1(short)
+_CLC_VECTOR_VLOAD_PRIM1(ushort)
+_CLC_VECTOR_VLOAD_PRIM1(int)
+_CLC_VECTOR_VLOAD_PRIM1(uint)
+_CLC_VECTOR_VLOAD_PRIM1(long)
+_CLC_VECTOR_VLOAD_PRIM1(ulong)
+_CLC_VECTOR_VLOAD_PRIM1(float)
+_CLC_VECTOR_VLOAD_PRIM3(_half, half, float)
+// Use suffix to declare aligned vloada_halfN
+_CLC_VECTOR_VLOAD_PRIM3(a_half, half, float)

#ifdef cl_khr_fp64
#pragma OPENCL EXTENSION cl_khr_fp64: enable
@@ -38,15 +40,19 @@
_CLC_VECTOR_VLOAD_PRIM1(half)
#endif

-_CLC_VECTOR_VLOAD_PRIM()
-// Plain vload_half also needs to be declared
+// Scalar vload_half also needs to be declared
_CLC_VLOAD_DECL(_half, half, float, , __constant)
_CLC_VLOAD_DECL(_half, half, float, , __global)
_CLC_VLOAD_DECL(_half, half, float, , __local)
_CLC_VLOAD_DECL(_half, half, float, , __private)

+// Scalar vloada_half is ont part of the specs but CTS expects it

“ont”

Looks fine to me, otherwise.

–Aaron

Float version passes newly posted piglit tests on turks, float and double pass on carrizo.
v2: scalar vstorea_half

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

generic/include/clc/shared/vstore.h | 41 ++++++++++++++++++++++++±-----------
generic/lib/shared/vstore.cl | 30 +++++++++++++±------------
generic/lib/shared/vstore_half.inc | 21 ++++++++++++±-----
3 files changed, 59 insertions(+), 33 deletions(-)

diff --git a/generic/include/clc/shared/vstore.h b/generic/include/clc/shared/vstore.h
index 0e3f694…17a0d29 100644
— a/generic/include/clc/shared/vstore.h
+++ b/generic/include/clc/shared/vstore.h
@@ -16,37 +16,52 @@
#define _CLC_VECTOR_VSTORE_PRIM1(PRIM_TYPE)
_CLC_VECTOR_VSTORE_PRIM3(,PRIM_TYPE, PRIM_TYPE) \

-#define _CLC_VECTOR_VSTORE_PRIM() \

  • _CLC_VECTOR_VSTORE_PRIM1(char) \
  • _CLC_VECTOR_VSTORE_PRIM1(uchar) \
  • _CLC_VECTOR_VSTORE_PRIM1(short) \
  • _CLC_VECTOR_VSTORE_PRIM1(ushort) \
  • _CLC_VECTOR_VSTORE_PRIM1(int) \
  • _CLC_VECTOR_VSTORE_PRIM1(uint) \
  • _CLC_VECTOR_VSTORE_PRIM1(long) \
  • _CLC_VECTOR_VSTORE_PRIM1(ulong) \
  • _CLC_VECTOR_VSTORE_PRIM1(float) \
  • _CLC_VECTOR_VSTORE_PRIM3(_half, half, float)
    +_CLC_VECTOR_VSTORE_PRIM1(char)
    +_CLC_VECTOR_VSTORE_PRIM1(uchar)
    +_CLC_VECTOR_VSTORE_PRIM1(short)
    +_CLC_VECTOR_VSTORE_PRIM1(ushort)
    +_CLC_VECTOR_VSTORE_PRIM1(int)
    +_CLC_VECTOR_VSTORE_PRIM1(uint)
    +_CLC_VECTOR_VSTORE_PRIM1(long)
    +_CLC_VECTOR_VSTORE_PRIM1(ulong)
    +_CLC_VECTOR_VSTORE_PRIM1(float)
    +_CLC_VECTOR_VSTORE_PRIM3(_half, half, float)
    +// Use suffix to declare aligned vstorea_halfN
    +_CLC_VECTOR_VSTORE_PRIM3(a_half, half, float)

#ifdef cl_khr_fp64
_CLC_VECTOR_VSTORE_PRIM1(double)
_CLC_VECTOR_VSTORE_PRIM3(_half, half, double)

  • // Use suffix to declare aligned vstorea_halfN
  • _CLC_VECTOR_VSTORE_PRIM3(a_half, half, double)
  • // Scalar vstore_half also needs to be declared
    _CLC_VSTORE_DECL(_half, half, double, , __private)
    _CLC_VSTORE_DECL(_half, half, double, , __local)
    _CLC_VSTORE_DECL(_half, half, double, , __global)
  • // Scalar vstorea_half is ont part of the specs but CTS expects it

“ont” again

  • _CLC_VSTORE_DECL(a_half, half, double, , __private)
  • _CLC_VSTORE_DECL(a_half, half, double, , __local)
  • _CLC_VSTORE_DECL(a_half, half, double, , __global)
    #endif

#ifdef cl_khr_fp16
_CLC_VECTOR_VSTORE_PRIM1(half)
#endif

-_CLC_VECTOR_VSTORE_PRIM()
+// Scalar vstore_half also needs to be declared
_CLC_VSTORE_DECL(_half, half, float, , __private)
_CLC_VSTORE_DECL(_half, half, float, , __local)
_CLC_VSTORE_DECL(_half, half, float, , __global)

+// Scalar vstorea_half is ont part of the specs but CTS expects it

“ont” here as well

otherwise looks fine to me.

–Aaron