Implement more math builtins

Hi,

This series implements several more math builtins. cos/sin
patches require an implementation of fract to work correctly
for doubles.

I have patches to both implement fract and fix fmin, which fract
depends on, but I still get failures in the fract and fmin tests
due to what I think are bugs in the R600/SI compiler.

Even with those bugs, the sin/cos tests still pass with these
broken implementations.

atan2 and atanpi also depend on an ldexp implementation. For testing,
I just implemented ldexp using the __builtin_amdgpu_ldexp builtin
and these both passed the tests I have.

-Tom

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

This will help avoid naming conflicts with functions defined in
kernels linking with libclc.

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

LGTM.

Jeroen

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
---
generic/include/clc/clc.h | 1 +
generic/include/clc/math/acospi.h | 24 +++++
generic/include/clc/math/acospi.inc | 23 +++++
generic/lib/SOURCES | 1 +
generic/lib/math/acospi.cl | 172 ++++++++++++++++++++++++++++++++++++
5 files changed, 221 insertions(+)
create mode 100644 generic/include/clc/math/acospi.h
create mode 100644 generic/include/clc/math/acospi.inc
create mode 100644 generic/lib/math/acospi.cl

diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
index 269d30d..8ac7fbc 100644
--- a/generic/include/clc/clc.h
+++ b/generic/include/clc/clc.h
@@ -34,6 +34,7 @@
/* 6.11.2 Math Functions */
#include <clc/math/acos.h>
#include <clc/math/acosh.h>
+#include <clc/math/acospi.h>
#include <clc/math/asin.h>
#include <clc/math/atan.h>
#include <clc/math/atan2.h>
diff --git a/generic/include/clc/math/acospi.h b/generic/include/clc/math/acospi.h
new file mode 100644
index 0000000..987657c
--- /dev/null
+++ b/generic/include/clc/math/acospi.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (c) 2014, 2015 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#define __CLC_BODY <clc/math/acospi.inc>
+#include <clc/math/gentype.inc>
diff --git a/generic/include/clc/math/acospi.inc b/generic/include/clc/math/acospi.inc
new file mode 100644
index 0000000..265e6a8
--- /dev/null
+++ b/generic/include/clc/math/acospi.inc
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2014, 2015 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+_CLC_OVERLOAD _CLC_DECL __CLC_GENTYPE acospi(__CLC_GENTYPE x);
diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index 29c40d8..c6d11e4 100644
--- a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -63,6 +63,7 @@ integer/sub_sat_impl.ll
integer/upsample.cl
math/acos.cl
math/acosh.cl
+math/acospi.cl
math/asin.cl
math/atan.cl
math/atan2.cl
diff --git a/generic/lib/math/acospi.cl b/generic/lib/math/acospi.cl
new file mode 100644
index 0000000..c91fc41
--- /dev/null
+++ b/generic/lib/math/acospi.cl
@@ -0,0 +1,172 @@
+/*
+ * Copyright (c) 2014,2015 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <clc/clc.h>
+
+#include "math.h"
+#include "../clcmacro.h"
+
+_CLC_OVERLOAD _CLC_DEF float acospi(float x) {
+ // Computes arccos(x).
+ // The argument is first reduced by noting that arccos(x)
+ // is invalid for abs(x) > 1. For denormal and small
+ // arguments arccos(x) = pi/2 to machine accuracy.
+ // Remaining argument ranges are handled as follows.
+ // For abs(x) <= 0.5 use
+ // arccos(x) = pi/2 - arcsin(x)
+ // = pi/2 - (x + x^3*R(x^2))
+ // where R(x^2) is a rational minimax approximation to
+ // (arcsin(x) - x)/x^3.
+ // For abs(x) > 0.5 exploit the identity:
+ // arccos(x) = pi - 2*arcsin(sqrt(1-x)/2)
+ // together with the above rational approximation, and
+ // reconstruct the terms carefully.
+
+
+ // Some constants and split constants.
+ const float pi = 3.1415926535897933e+00f;
+ const float piby2_head = 1.5707963267948965580e+00f; /* 0x3ff921fb54442d18 */
+ const float piby2_tail = 6.12323399573676603587e-17f; /* 0x3c91a62633145c07 */
+
+ uint ux = as_uint(x);
+ uint aux = ux & ~SIGNBIT_SP32;
+ int xneg = ux != aux;
+ int xexp = (int)(aux >> EXPSHIFTBITS_SP32) - EXPBIAS_SP32;
+
+ float y = as_float(aux);
+
+ // transform if |x| >= 0.5
+ int transform = xexp >= -1;
+
+ float y2 = y * y;
+ float yt = 0.5f * (1.0f - y);
+ float r = transform ? yt : y2;
+
+ // Use a rational approximation for [0.0, 0.5]
+ float a = mad(r, mad(r, mad(r, -0.00396137437848476485201154797087F, -0.0133819288943925804214011424456F),
+ -0.0565298683201845211985026327361F),
+ 0.184161606965100694821398249421F);

git complains about space before tab in indent on the above 2 lines ^^,
and the indentation does look funny

+ float b = mad(r, -0.836411276854206731913362287293F, 1.10496961524520294485512696706F);
+ float u = r * MATH_DIVIDE(a, b);
+
+ float s = MATH_SQRT(r);
+ y = s;
+ float s1 = as_float(as_uint(s) & 0xffff0000);
+ float c = MATH_DIVIDE(r - s1 * s1, s + s1);
+ // float rettn = 1.0f - MATH_DIVIDE(2.0f * (s + (y * u - piby2_tail)), pi);
+ float rettn = 1.0f - MATH_DIVIDE(2.0f * (s + mad(y, u, -piby2_tail)), pi);
+ // float rettp = MATH_DIVIDE(2.0F * s1 + (2.0F * c + 2.0F * y * u), pi);
+ float rettp = MATH_DIVIDE(2.0f*(s1 + mad(y, u, c)), pi);
+ float rett = xneg ? rettn : rettp;
+ // float ret = MATH_DIVIDE(piby2_head - (x - (piby2_tail - x * u)), pi);
+ float ret = MATH_DIVIDE(piby2_head - (x - mad(x, -u, piby2_tail)), pi);
+
+ ret = transform ? rett : ret;
+ ret = aux > 0x3f800000U ? as_float(QNANBITPATT_SP32) : ret;
+ ret = ux == 0x3f800000U ? 0.0f : ret;
+ ret = ux == 0xbf800000U ? 1.0f : ret;
+ ret = xexp < -26 ? 0.5f : ret;
+ return ret;
+}
+
+_CLC_UNARY_VECTORIZE(_CLC_OVERLOAD _CLC_DEF, float, acospi, float)
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+
+_CLC_OVERLOAD _CLC_DEF double acospi(double x) {
+ // Computes arccos(x).
+ // The argument is first reduced by noting that arccos(x)
+ // is invalid for abs(x) > 1. For denormal and small
+ // arguments arccos(x) = pi/2 to machine accuracy.
+ // Remaining argument ranges are handled as follows.
+ // For abs(x) <= 0.5 use
+ // arccos(x) = pi/2 - arcsin(x)
+ // = pi/2 - (x + x^3*R(x^2))
+ // where R(x^2) is a rational minimax approximation to
+ // (arcsin(x) - x)/x^3.
+ // For abs(x) > 0.5 exploit the identity:
+ // arccos(x) = pi - 2*arcsin(sqrt(1-x)/2)
+ // together with the above rational approximation, and
+ // reconstruct the terms carefully.
+
+ const double pi = 0x1.921fb54442d18p+1;
+ const double piby2_tail = 6.12323399573676603587e-17; /* 0x3c91a62633145c07 */
+
+ double y = fabs(x);
+ int xneg = as_int2(x).hi < 0;
+ int xexp = (as_int2(y).hi >> 20) - EXPBIAS_DP64;
+
+ // abs(x) >= 0.5
+ int transform = xexp >= -1;
+
+ // Transform y into the range [0,0.5)
+ double r1 = 0.5 * (1.0 - y);
+ double s = sqrt(r1);
+ double r = y * y;
+ r = transform ? r1 : r;
+ y = transform ? s : y;
+
+ // Use a rational approximation for [0.0, 0.5]
+ double un = fma(r,
+ fma(r,
+ fma(r,
+ fma(r,
+ fma(r, 0.0000482901920344786991880522822991,
+ 0.00109242697235074662306043804220),
+ -0.0549989809235685841612020091328),
+ 0.275558175256937652532686256258),
+ -0.445017216867635649900123110649),
+ 0.227485835556935010735943483075);
+
+ double ud = fma(r,
+ fma(r,
+ fma(r,
+ fma(r, 0.105869422087204370341222318533,

git complains about trailing whitespace on this line ^^

Hi,

I applied patches 1-8 (with 9 already in the tree, and also the ldexp
patch to avoid warnings), and tested (piglit) on my Turks:
acosh LLVM ERROR: Cannot select: REGISTER_LOAD
acospi PASS
asinh PASS
asinpi PASS
atanh same failure as acosh
atanpi PASS
atan2pi PASS

I have posted piglits for the 3 functions that were missing the passing
patches LGTM. The failures are caused by log1p, so I guess the rest is
OK as well, but couldn't really test them.
I cannot test doubles.

jan

PS: ldexp fails too, but i guess that is expected without fp64

Hi,

I applied patches 1-8 (with 9 already in the tree, and also the ldexp
patch to avoid warnings), and tested (piglit) on my Turks:
acosh LLVM ERROR: Cannot select: REGISTER_LOAD
acospi PASS
asinh PASS
asinpi PASS
atanh same failure as acosh
atanpi PASS
atan2pi PASS

Hi,

I've committed acospi, asinh, asinpi, and atanpi. The failures for
acosh and atanh look like r600 specific bugs. These both pass on SI+,
so if no one objects I will commit these as well.

-Tom

> Hi,
>
> I applied patches 1-8 (with 9 already in the tree, and also the ldexp
> patch to avoid warnings), and tested (piglit) on my Turks:
> acosh LLVM ERROR: Cannot select: REGISTER_LOAD
> acospi PASS
> asinh PASS
> asinpi PASS
> atanh same failure as acosh
> atanpi PASS
> atan2pi PASS
>

Hi,

I've committed acospi, asinh, asinpi, and atanpi.

atan2pi is also OK on r600

The failures for
acosh and atanh look like r600 specific bugs. These both pass on SI+,
so if no one objects I will commit these as well.

I have verified that the failures are caused by USE_TABLE in log1p
function, which is a known limitation for r600.
Strangely enough, using 1.0f instead of table values made piglit pass
(acosh, atanh, log1p) on r600, I guess the tests should be improved.

jan