[PATCH 1/3] Add halfN types and enable fp16 when generating builtin declarations

Uses the same mechanism to enable fp16 as we use for fp64 when
processing clc.h

Signed-off-by: Aaron Watry <awatry@gmail.com>

This was added in CL 1.1

Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in

v2: Add half-precision support to shuffle when available.
    Move to misc/ and add section 6.12.12 to clc.h

Signed-off-by: Aaron Watry <awatry@gmail.com>
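
For readers who have not used the builtin, here is a rough scalar sketch in
plain C of what shuffle() computes, going by the CL spec wording: each mask
element is reduced to its low log2(n) bits (n being the number of elements
in x) and used as an index into x, and the result has as many elements as
the mask. The name shuffle_int_ref and the array-based signature are made up
for illustration; this is not the libclc implementation from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper (not part of libclc): emulates
 * intN shuffle(intM x, uintN mask) on plain arrays.
 * in_len is the element count of x and must be 2, 4, 8 or 16. */
static void shuffle_int_ref(const int32_t *x, size_t in_len,
                            const uint32_t *mask, size_t out_len,
                            int32_t *out)
{
    /* The index masking below relies on in_len being a power of two. */
    assert(in_len != 0 && (in_len & (in_len - 1)) == 0);

    for (size_t i = 0; i < out_len; ++i) {
        /* Only the low log2(in_len) bits of each mask element count. */
        out[i] = x[mask[i] & (in_len - 1)];
    }
}
```

Note that the output length follows the mask, not the input, which is why the
header needs one overload per (input size, mask size) pair.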

This was added in CL 1.1

Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in_dual_input

v2: Add half support to shuffle2
    Move shuffle2 to misc/

Signed-off-by: Aaron Watry <awatry@gmail.com>
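
The dual-input variant can be sketched the same way. As far as I read the
spec, shuffle2() treats x and y as one concatenated vector of 2n elements and
keeps one more mask bit than shuffle() does. Again, shuffle2_int_ref and its
array-based signature are hypothetical and only meant to illustrate the
semantics, not the code in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper (not part of libclc): emulates
 * intN shuffle2(intM x, intM y, uintN mask) on plain arrays.
 * in_len is the element count of x (and of y): 2, 4, 8 or 16. */
static void shuffle2_int_ref(const int32_t *x, const int32_t *y,
                             size_t in_len,
                             const uint32_t *mask, size_t out_len,
                             int32_t *out)
{
    /* The index masking below relies on in_len being a power of two. */
    assert(in_len != 0 && (in_len & (in_len - 1)) == 0);

    for (size_t i = 0; i < out_len; ++i) {
        /* Low log2(in_len) + 1 bits select from the x:y concatenation. */
        size_t idx = mask[i] & (2 * in_len - 1);
        out[i] = (idx < in_len) ? x[idx] : y[idx - in_len];
    }
}
```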

This was added in CL 1.1

Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in

v2: Add half-precision support to shuffle when available.
    Move to misc/ and add section 6.12.12 to clc.h

Signed-off-by: Aaron Watry <awatry@gmail.com>
Cc: Jan Vesely <jan.vesely@rutgers.edu>
---
generic/include/clc/clc.h | 3 +
generic/include/clc/misc/shuffle.h | 47 +++++++++++
generic/lib/SOURCES | 1 +
generic/lib/misc/shuffle.cl | 157 +++++++++++++++++++++++++++++++++++++
4 files changed, 208 insertions(+)
create mode 100644 generic/include/clc/misc/shuffle.h
create mode 100644 generic/lib/misc/shuffle.cl

diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
index 059cb7f..9be5ff2 100644
--- a/generic/include/clc/clc.h
+++ b/generic/include/clc/clc.h
@@ -237,6 +237,9 @@

/* 6.11.13 Image Read and Write Functions */

+/* 6.12.12 Miscellaneous Vector Functions */
+#include <clc/misc/shuffle.h>
+
#include <clc/image/image_defines.h>
#include <clc/image/image.h>

diff --git a/generic/include/clc/misc/shuffle.h b/generic/include/clc/misc/shuffle.h
new file mode 100644
index 0000000..393a19b
--- /dev/null
+++ b/generic/include/clc/misc/shuffle.h
@@ -0,0 +1,47 @@
+//===-- generic/include/clc/misc/shuffle.h ------------------------------===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is dual licensed under both the University of Illinois Open Source
+// License and the MIT license. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+
+#define _CLC_SHUFFLE_DECL(TYPE, MASKTYPE, RETTYPE) \
+ _CLC_OVERLOAD _CLC_DECL RETTYPE shuffle(TYPE x, MASKTYPE mask);
+
+//Return type is same base type as the input type, with the same vector size as the mask.
+//Elements in the mask must be the same size (number of bits) as the input value.
+//E.g. char8 ret = shuffle(char2 x, uchar8 mask);
+
+#define _CLC_VECTOR_SHUFFLE_MASKSIZE(INBASE, INTYPE, MASKTYPE) \
+ _CLC_SHUFFLE_DECL(INTYPE, MASKTYPE##2, INBASE##2) \
+ _CLC_SHUFFLE_DECL(INTYPE, MASKTYPE##4, INBASE##4) \
+ _CLC_SHUFFLE_DECL(INTYPE, MASKTYPE##8, INBASE##8) \
+ _CLC_SHUFFLE_DECL(INTYPE, MASKTYPE##16, INBASE##16) \
+
+#define _CLC_VECTOR_SHUFFLE_INSIZE(TYPE, MASKTYPE) \
+ _CLC_VECTOR_SHUFFLE_MASKSIZE(TYPE, TYPE##2, MASKTYPE) \
+ _CLC_VECTOR_SHUFFLE_MASKSIZE(TYPE, TYPE##4, MASKTYPE) \
+ _CLC_VECTOR_SHUFFLE_MASKSIZE(TYPE, TYPE##8, MASKTYPE) \
+ _CLC_VECTOR_SHUFFLE_MASKSIZE(TYPE, TYPE##16, MASKTYPE) \
+
+_CLC_VECTOR_SHUFFLE_INSIZE(char, uchar)
+_CLC_VECTOR_SHUFFLE_INSIZE(short, ushort)
+_CLC_VECTOR_SHUFFLE_INSIZE(int, uint)
+_CLC_VECTOR_SHUFFLE_INSIZE(long, ulong)
+_CLC_VECTOR_SHUFFLE_INSIZE(uchar, uchar)
+_CLC_VECTOR_SHUFFLE_INSIZE(ushort, ushort)
+_CLC_VECTOR_SHUFFLE_INSIZE(uint, uint)
+_CLC_VECTOR_SHUFFLE_INSIZE(ulong, ulong)
+_CLC_VECTOR_SHUFFLE_INSIZE(float, uint)
+#ifdef cl_khr_fp64
+_CLC_VECTOR_SHUFFLE_INSIZE(double, ulong)
+#endif
+#ifdef cl_khr_fp16
+_CLC_VECTOR_SHUFFLE_INSIZE(half, ushort)
+#endif
+
+#undef _CLC_SHUFFLE_DECL
+#undef _CLC_VECTOR_SHUFFLE_MASKSIZE
+#undef _CLC_VECTOR_SHUFFLE_INSIZE
diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index 9e0157b..fe0df5a 100644
--- a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -139,6 +139,7 @@ relational/isnormal.cl
relational/isnotequal.cl
relational/isordered.cl
relational/isunordered.cl
+relational/shuffle.cl

This should be misc/shuffle.cl.

Fixed locally. Same with the shuffle2 patch.

Sorry for the noise.

--Aaron

With the SOURCES lines fixed, for the series:
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>

The half tests needed a bit more work (removing the fp16 piglit check,
adding vload/vstore(half)), but they pass on my Carrizo/Iceland.
I'll post the vload/vstore patches soon.

thanks,
Jan