[PATCH 01/15] Fix implementation of normalize builtin

The new implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

This is a generic implementation which just calls rsqrt.
Targets should override this if they want a faster implementation.

v2:
  - Alphabetize SOURCES

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Remove f suffix from constant in double implementations.
  - Consolidate implementations using the .cl/.inc approach.

v3:
  - Use __CLC_FPSIZE instead of __CLC_FP{32,64}

The --enable-runtime-subnormal option makes it possible for runtime
implementations to disable subnormal handling at runtime.

When this flag is enabled, decisions about how to handle subnormals
in the library will be controlled by an external variable called
__CLC_SUBNORMAL_DISABLE.

Function implementations should use these new helpers for querying subnormal
support:
__clc_fp16_subnormals_supported();
__clc_fp32_subnormals_supported();
__clc_fp64_subnormals_supported();
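
As a rough illustration of how a library function might consult such a helper, here is a host-side C sketch. Everything in it is an assumption made for illustration: sketch_subnormal_disable stands in for __CLC_SUBNORMAL_DISABLE (which in the real library comes from the linked module, not from a local variable), sketch_fp32_subnormals_supported() stands in for __clc_fp32_subnormals_supported(), and flush_if_unsupported() is a made-up caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the external __CLC_SUBNORMAL_DISABLE variable. */
static bool sketch_subnormal_disable = true;

/* Hypothetical stand-in for __clc_fp32_subnormals_supported(). */
static bool sketch_fp32_subnormals_supported(void) {
    return !sketch_subnormal_disable;
}

static uint32_t float_bits(float f) {
    uint32_t u;
    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Made-up caller: flush a subnormal input to signed zero when unsupported. */
static float flush_if_unsupported(float x) {
    uint32_t u = float_bits(x);
    bool is_subnormal = (u & 0x7f800000u) == 0 && (u & 0x007fffffu) != 0;
    if (is_subnormal && !sketch_fp32_subnormals_supported())
        return bits_float(u & 0x80000000u); /* keep only the sign bit */
    return x;
}
```

The point of routing the decision through one helper is that the same compiled function body can serve both configurations, with the choice made by whichever variable definition ends up in the module.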

In order for the library to link correctly with this feature,
users will be required to either:

1. Insert this variable into the module (if using the LLVM/Clang C++/C APIs).

2. Pass either subnormal_disable.bc or subnormal_use_default.bc to the
linker. These files are distributed with libclc and installed to
$(installdir). e.g.:

llvm-link -o kernel-out.bc kernel.bc builtins-nosubnormal.bc subnormal_disable.bc

or

llvm-link -o kernel-out.bc kernel.bc builtins-nosubnormal.bc subnormal_use_default.bc

If you do not supply the --enable-runtime-subnormal flag, the library
behaves the same as it did before this commit.

In addition to these changes, the patch adds helper functions that
should be used when implementing library functions that need
special handling for denormals:

__clc_fp16_subnormals_supported();
__clc_fp32_subnormals_supported();
__clc_fp64_subnormals_supported();

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

Signed-off-by: Aaron Watry <awatry@gmail.com>

Tom Stellard:
  - Add denormal handling.
  - Share vectorization code with r600 implementation.

Not all targets support the intrinsic, so it's better to have a
generic implementation which does not use it.

This exp2 implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

Using exp2(x * M_LOG2E_F) does not give us accurate enough results for
OpenCL. If you look at the new exp implementation you'll see that
it does multiply the input by M_LOG2E_F, but it still uses the original
input in part of the calculation.

This exp implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

This patch breaks ldexp on r600 (it works OK with only 11/15 applied).
The manual mentions only LDEXP_64 (for r600, EG, and NI).

From: Aaron Watry <awatry@gmail.com>

Signed-off-by: Aaron Watry <awatry@gmail.com>

LGTM

Tom Stellard:
  - Add denormal handling.
  - Share vectorization code with r600 implementation.
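
For reviewers who want to poke at the flush-to-zero path on a host machine, the following is a plain C restatement of the branch taken when fp32 subnormals are not supported. It is a sketch for experimentation, not part of the patch: ldexp_ftz_sketch() and clamp_int() are hypothetical names, memcpy replaces as_int()/as_float(), and a 64-bit sum followed by a clamp replaces add_sat() plus clamp().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static int clamp_int(long long v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : (int)v);
}

/* Host-side restatement of the no-subnormal branch of the float ldexp. */
static float ldexp_ftz_sketch(float x, int n) {
    uint32_t i;
    assert(sizeof i == sizeof x);
    memcpy(&i, &x, sizeof i);

    int e = (int)((i >> 23) & 0xff);  /* biased exponent */
    uint32_t m = i & 0x007fffffu;     /* mantissa bits */
    uint32_t s = i & 0x80000000u;     /* sign bit */

    /* Saturating add, then clamp to the valid biased-exponent range. */
    int v = clamp_int((long long)e + n, 0, 0xff);

    /* Zero/subnormal input, underflow and overflow all drop the mantissa. */
    uint32_t mr = (e == 0 || v == 0 || v == 0xff) ? 0u : m;
    int er = v;
    if (e == 0xff) { mr = m; er = e; } /* inf and NaN pass through unchanged */
    if (e == 0) er = 0;                /* zero/subnormal input gives signed zero */

    uint32_t r = s | ((uint32_t)er << 23) | mr;
    float out;
    memcpy(&out, &r, sizeof out);
    return out;
}
```

Note that any result whose biased exponent would be 0 is returned as a signed zero, so values that land in the subnormal range are flushed rather than rounded.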
---
generic/lib/SOURCES | 1 +
generic/lib/clcmacro.h | 4 ++
generic/lib/math/ldexp.cl | 138 +++++++++++++++++++++++++++++++++++++++++++++
generic/lib/math/ldexp.inc | 29 ++++++++++
r600/lib/math/ldexp.cl | 2 +-
r600/lib/math/ldexp.inc | 29 ----------
6 files changed, 173 insertions(+), 30 deletions(-)
create mode 100644 generic/lib/math/ldexp.cl
create mode 100644 generic/lib/math/ldexp.inc
delete mode 100644 r600/lib/math/ldexp.inc

diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index e047540..0e8c7d9 100644
--- a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -87,6 +87,7 @@ math/fract.cl
math/half_rsqrt.cl
math/half_sqrt.cl
math/hypot.cl
+math/ldexp.cl
math/log10.cl
math/log1p.cl
math/mad.cl
diff --git a/generic/lib/clcmacro.h b/generic/lib/clcmacro.h
index 346adf2..9ef337b 100644
--- a/generic/lib/clcmacro.h
+++ b/generic/lib/clcmacro.h
@@ -115,6 +115,10 @@ _CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG1_TYPE x, ARG2_TYPE y) { \
} \
_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, RET_TYPE, FUNCTION, ARG1_TYPE, ARG2_TYPE)

+#define _CLC_DEFINE_BINARY_BUILTIN_WITH_SCALAR_SECOND_ARG(RET_TYPE, FUNCTION, BUILTIN, ARG1_TYPE, ARG2_TYPE) \
+_CLC_DEFINE_BINARY_BUILTIN(RET_TYPE, FUNCTION, BUILTIN, ARG1_TYPE, ARG2_TYPE) \
+_CLC_BINARY_VECTORIZE_SCALAR_SECOND_ARG(_CLC_OVERLOAD _CLC_DEF, RET_TYPE, FUNCTION, ARG1_TYPE, ARG2_TYPE)
+
#define _CLC_DEFINE_UNARY_BUILTIN(RET_TYPE, FUNCTION, BUILTIN, ARG1_TYPE) \
_CLC_DEF _CLC_OVERLOAD RET_TYPE FUNCTION(ARG1_TYPE x) { \
   return BUILTIN(x); \
diff --git a/generic/lib/math/ldexp.cl b/generic/lib/math/ldexp.cl
new file mode 100644
index 0000000..1802bf3
--- /dev/null
+++ b/generic/lib/math/ldexp.cl
@@ -0,0 +1,138 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <clc/clc.h>
+#include "config.h"
+#include "../clcmacro.h"
+#include "math.h"
+
+_CLC_DEF _CLC_OVERLOAD float ldexp(float x, int n) {
+
+ if (!__clc_fp32_subnormals_supported()) {
+
+ // This treats subnormals as zeros
+ int i = as_int(x);
+ int e = (i >> 23) & 0xff;
+ int m = i & 0x007fffff;
+ int s = i & 0x80000000;
+ int v = add_sat(e, n);
+ v = clamp(v, 0, 0xff);
+ int mr = e == 0 | v == 0 | v == 0xff ? 0 : m;
+ int c = e == 0xff;
+ mr = c ? m : mr;
+ int er = c ? e : v;
+ er = e ? er : e;
+ return as_float( s | (er << 23) | mr );
+ }
+
+ /* supports denormal values */
+ const int multiplier = 24;
+ float val_f;
+ uint val_ui;
+ uint sign;
+ int exponent;
+ val_ui = as_uint(x);
+ sign = val_ui & 0x80000000;
+ val_ui = val_ui & 0x7fffffff;/* remove the sign bit */
+ int val_x = val_ui;
+
+ exponent = val_ui >> 23; /* get the exponent */
+ int dexp = exponent;
+
+ /* denormal support */
+ int fbh = 127 - (as_uint((float)(as_float(val_ui | 0x3f800000) - 1.0f)) >> 23);
+ int dexponent = 25 - fbh;
+ uint dval_ui = (( (val_ui << fbh) & 0x007fffff) | (dexponent << 23));
+ int ex = dexponent + n - multiplier;
+ dexponent = ex;
+ uint val = sign | (ex << 23) | (dval_ui & 0x007fffff);
+ int ex1 = dexponent + multiplier;
+ ex1 = -ex1 +25;
+ dval_ui = (((dval_ui & 0x007fffff )| 0x800000) >> ex1);
+ dval_ui = dexponent > 0 ? val :dval_ui;
+ dval_ui = dexponent > 254 ? 0x7f800000 :dval_ui; /*overflow*/
+ dval_ui = dexponent < -multiplier ? 0 : dval_ui; /*underflow*/
+ dval_ui = dval_ui | sign;
+ val_f = as_float(dval_ui);
+
+ exponent += n;
+
+ val = sign | (exponent << 23) | (val_ui & 0x007fffff);
+ ex1 = exponent + multiplier;
+ ex1 = -ex1 +25;
+ val_ui = (((val_ui & 0x007fffff )| 0x800000) >> ex1);
+ val_ui = exponent > 0 ? val :val_ui;
+ val_ui = exponent > 254 ? 0x7f800000 :val_ui; /*overflow*/
+ val_ui = exponent < -multiplier ? 0 : val_ui; /*underflow*/
+ val_ui = val_ui | sign;
+
+ val_ui = dexp == 0? dval_ui : val_ui;
+ val_f = as_float(val_ui);
+
+ val_f = isnan(x) | isinf(x) | val_x == 0 ? x : val_f;
+ return val_f;
+}
+
+// This defines all the ldexp(floatN, intN) variants.
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, ldexp, float, int)
+
+#ifdef cl_khr_fp64
+
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+_CLC_DEF _CLC_OVERLOAD double ldexp(double x, int n) {
+ long l = as_ulong(x);
+ int e = (l >> 52) & 0x7ff;
+ long s = l & 0x8000000000000000;
+
+ ulong ux = as_ulong(x * 0x1.0p+53);
+ int de = ((int)(ux >> 52) & 0x7ff) - 53;
+ int c = e == 0;
+ e = c ? de: e;
+
+ ux = c ? ux : l;
+
+ int v = e + n;
+ v = clamp(v, -0x7ff, 0x7ff);
+
+ ux &= ~EXPBITS_DP64;
+
+ double mr = as_double(ux | ((ulong)(v+53) << 52));
+ mr = mr * 0x1.0p-53;
+
+ mr = v > 0 ? as_double(ux | ((ulong)v << 52)) : mr;
+
+ mr = v == 0x7ff ? as_double(s | PINFBITPATT_DP64) : mr;
+ mr = v < -53 ? as_double(s) : mr;
+
+ mr = ((n == 0) | isinf(x) | (x == 0) ) ? x : mr;
+ return mr;
+}
+
+// This defines all the ldexp(doubleN, intN) variants.
+_CLC_BINARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, double, ldexp, double, int)
+
+#endif
+
+// This defines all the ldexp(GENTYPE, int) variants
+#define __CLC_BODY <ldexp.inc>
+#include <clc/math/gentype.inc>
diff --git a/generic/lib/math/ldexp.inc b/generic/lib/math/ldexp.inc
new file mode 100644
index 0000000..6e28fbb
--- /dev/null
+++ b/generic/lib/math/ldexp.inc
@@ -0,0 +1,29 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef __CLC_SCALAR
+
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE ldexp(__CLC_GENTYPE x, int n) {
+ return ldexp(x, (__CLC_INTN)n);
+}
+
+#endif
diff --git a/r600/lib/math/ldexp.cl b/r600/lib/math/ldexp.cl
index 6742cb9..0461a53 100644
--- a/r600/lib/math/ldexp.cl
+++ b/r600/lib/math/ldexp.cl
@@ -34,5 +34,5 @@ _CLC_DEFINE_BINARY_BUILTIN(float, ldexp, __builtin_amdgpu_ldexpf, float, int);
#endif

// This defines all the ldexp(GENTYPE, int);
-#define __CLC_BODY <ldexp.inc>
+#define __CLC_BODY <../../../generic/lib/math/ldexp.inc>
#include <clc/math/gentype.inc>
diff --git a/r600/lib/math/ldexp.inc b/r600/lib/math/ldexp.inc
deleted file mode 100644
index 6e28fbb..0000000
--- a/r600/lib/math/ldexp.inc
+++ /dev/null
@@ -1,29 +0,0 @@
-/*
- * Copyright (c) 2014 Advanced Micro Devices, Inc.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-
-#ifndef __CLC_SCALAR
-
-_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE ldexp(__CLC_GENTYPE x, int n) {
- return ldexp(x, (__CLC_INTN)n);
-}
-
-#endif

Passes piglit on my turks.
LGTM

LGTM

I assume the original implementation fails the new piglit test on SI.
The new test still passes on r600, so I guess I should look for a similar
breaking case.

LGTM

Not all targets support the intrinsic, so it's better to have a
generic implementation which does not use it.

LGTM.

Which targets are those? I found exp_ieee in manuals (r600-NI) and
v_exp_f32 in SI.
Also related to 5/15 (and possibly other functions), is there a way to
specify implementations per chip class, or does it need some extra
infrastructure work?

jan