[PATCH 1/8] Fix implementation of length builtin

The new implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

This is a generic implementation which just calls sqrt. Targets should
override this if they want a faster implementation.

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

The new implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

This is a generic implementation which just calls rsqrt.
Targets should override this if they want a faster implementation.

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

Thanks for these Tom.

I plan on reviewing them asap, and feel silly that I had written distance, fast distance and fast length, but hadn’t managed to mail them in yet.

For reference, I’m working on erfc, erf, and ldexp… the only one I haven’t finished up yet is erf (trying to figure out how to deal with lack of subnormal support on devices that don’t support doubles).

–Aaron

Thanks for these Tom.

I plan on reviewing them asap, and feel silly that I had written distance,
fast distance and fast length, but hadn't managed to mail them in yet.

I was a little worried about something like this. I started working and got on
a roll so I figured I would just finish out the geom functions.

For reference, I'm working on erfc, erf, and ldexp... the only one I
haven't finished up yet is erf (trying to figure out how to deal with lack
of subnormal support on devices that don't support doubles).

Supporting subnormals for float types is not required by OpenCL 1.1/1.2.
Does this means we can ignore them in the library?

-Tom

I think it would be a good idea to deal with it while you’re already working on the function for later

-Matt

I can attempt to handle that now… It’ll just take a bit of research on my part. The AMD implementation of float erf assumes that you have support for doubles when the device doesn’t support subnormals. I’m not sure that this is a safe assumption to make, so we’ll probably want/need to find some other way to deal with that situation.

I can at least surround the existing implementation with an #if defined(cl_khr_fp64) && !defined(SUBNORMALS_SUPPORTED)), and we’ll just have to have an else if for !SUBNORMALS && !DOUBLES.

–Aaron

The new implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
---
generic/lib/geometric/length.cl | 120
++++++++++++++++++++++++++++++++++++++-
generic/lib/geometric/length.inc | 3 -
2 files changed, 117 insertions(+), 6 deletions(-)
delete mode 100644 generic/lib/geometric/length.inc

diff --git a/generic/lib/geometric/length.cl b/generic/lib/geometric/
length.cl
index ef087c7..3037372 100644
--- a/generic/lib/geometric/length.cl
+++ b/generic/lib/geometric/length.cl
@@ -1,8 +1,122 @@
+/*
+ * Copyright (c) 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining
a copy
+ * of this software and associated documentation files (the "Software"),
to deal
+ * in the Software without restriction, including without limitation the
rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be
included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN
+ * THE SOFTWARE.
+ */
+
#include <clc/clc.h>

+_CLC_OVERLOAD _CLC_DEF float length(float p) {
+ return fabs(p);
+}
+
+_CLC_OVERLOAD _CLC_DEF float length(float2 p) {
+ float l2 = dot(p, p);
+
+ if (l2 < FLT_MIN) {
+ p *= 0x1.0p+86F;
+ return sqrt(dot(p, p)) * 0x1.0p-86F;
+ } else if (l2 == INFINITY) {
+ p *= 0x1.0p-65F;
+ return sqrt(dot(p, p)) * 0x1.0p+65F;
+ }
+
+ return sqrt(l2);
+}

I'm assuming that the FLT_MIN/INFINITY cases are correct here. It's
definitely a good thing to correct the scalar float implementation.

The only suggestion that I have here is to consider combining the
float2/float3/float4 into a single macro that is invoked 3 times (and then
the same for the double version).

It's not necessary, as that can be cleaned up later if needed.

More comments, later...

+
+_CLC_OVERLOAD _CLC_DEF float length(float3 p) {
+ float l2 = dot(p, p);
+
+ if (l2 < FLT_MIN) {
+ p *= 0x1.0p+86F;
+ return sqrt(dot(p, p)) * 0x1.0p-86F;
+ } else if (l2 == INFINITY) {
+ p *= 0x1.0p-66F;
+ return sqrt(dot(p, p)) * 0x1.0p+66F;

float2 uses p[+-]65F, but float3/float4 use p[+-]66F. Is this correct?

+ }
+
+ return sqrt(l2);
+}
+
+_CLC_OVERLOAD _CLC_DEF float length(float4 p) {
+ float l2 = dot(p, p);
+
+ if (l2 < FLT_MIN) {
+ p *= 0x1.0p+86F;
+ return sqrt(dot(p, p)) * 0x1.0p-86F;
+ } else if (l2 == INFINITY) {
+ p *= 0x1.0p-66f;
+ return sqrt(dot(p, p)) * 0x1.0p+66F;
+ }
+ return sqrt(l2);
+}
+
#ifdef cl_khr_fp64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
-#endif

-#define __CLC_BODY <length.inc>
-#include <clc/geometric/floatn.inc>
+_CLC_OVERLOAD _CLC_DEF double length(double p){
+ return fabs(p);
+}
+
+_CLC_OVERLOAD _CLC_DEF double length(double2 p) {
+ double l2 = dot(p, p);
+
+ if (l2 < DBL_MIN) {
+ p *= 0x1.0p+563;
+ return sqrt(dot(p, p)) * 0x1.0p-563;
+ } else if (l2 == INFINITY) {
+ p *= 0x1.0p-513;
+ return sqrt(dot(p, p)) * 0x1.0p+513;
+ }
+
+ return sqrt(l2);
+}
+
+_CLC_OVERLOAD _CLC_DEF double length(double3 p) {
+ double l2 = dot(p, p);
+
+ if (l2 < DBL_MIN) {
+ p *= 0x1.0p+563;
+ return sqrt(dot(p, p)) * 0x1.0p-563;
+ } else if (l2 == INFINITY) {
+ p *= 0x1.0p-514;
+ return sqrt(dot(p, p)) * 0x1.0p+514;

The double2 version used -513/+513, but double3/double4 use +514/-514. Is
one of these wrong?

The style question at the top, I don't care too much about (combining the
vector versions into a macro), but I would like to see if we can get an
answer about the correctness of the various exponents being used for the
float/double vector versions. It just doesn't seem like they can all be
right from a first glance.

--Aaron

This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
---
generic/include/clc/clc.h | 1 +
generic/include/clc/geometric/distance.h | 22 ++++++++++++++++++++++
generic/include/clc/geometric/distance.inc | 23 +++++++++++++++++++++++
generic/lib/SOURCES | 1 +
generic/lib/geometric/distance.cl | 30
++++++++++++++++++++++++++++++
generic/lib/geometric/distance.inc | 25 +++++++++++++++++++++++++
6 files changed, 102 insertions(+)
create mode 100644 generic/include/clc/geometric/distance.inc
create mode 100644 generic/lib/geometric/distance.cl
create mode 100644 generic/lib/geometric/distance.inc

diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
index cee1614..2fb818a 100644
--- a/generic/include/clc/clc.h
+++ b/generic/include/clc/clc.h
@@ -114,6 +114,7 @@

/* 6.11.5 Geometric Functions */
#include <clc/geometric/cross.h>
+#include <clc/geometric/distance.h>
#include <clc/geometric/dot.h>
#include <clc/geometric/length.h>
#include <clc/geometric/normalize.h>
diff --git a/generic/include/clc/geometric/distance.h
b/generic/include/clc/geometric/distance.h
index 3e91332..c7aa6a1 100644
--- a/generic/include/clc/geometric/distance.h
+++ b/generic/include/clc/geometric/distance.h
@@ -1,2 +1,24 @@
+/*
+ * Copyright (c) 2014,2015 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining
a copy
+ * of this software and associated documentation files (the "Software"),
to deal
+ * in the Software without restriction, including without limitation the
rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be
included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN
+ * THE SOFTWARE.
+ */
+

This patch looks good to me.

I don't see much point in retroactively adding the AMD copyright statement
to distance.h since you're not modifying the file itself at this time (and
Peter was the initial author), but while it seems weird, you are also
included in the commit history for the file (adding __CLC_ prefix to the
macro names).... so I guess it's not completely out of line.

But that being said, the functionality and code looks exactly like what I
had written myself independently (other than chosen variable names).

--Aaron

Thanks for these Tom.

I plan on reviewing them asap, and feel silly that I had written distance,
fast distance and fast length, but hadn't managed to mail them in yet.

For reference, I'm working on erfc, erf, and ldexp... the only one I
haven't finished up yet is erf (trying to figure out how to deal with lack
of subnormal support on devices that don't support doubles).

Hi Aaron,

I just sent a patch to the mailing lists that adds functions for
querying whether or not subnormals are supported. Do you think you
you could send out your ldexp implementation? I think it would be a good test
case for these new functions, and it's one I was planning to implement
next.

-Tom

Isn't the ldexp implementation for SI just a call to the intrinsic?

> I think it would be a good test
> case for these new functions, and it's one I was planning to implement
> next.
Isn't the ldexp implementation for SI just a call to the intrinsic?

Possibly, but I was looking at the generic implementation.

-Tom

> This implementation was ported from the AMD builtin library
> and has been tested with piglit, OpenCV, and the ocl conformance tests.
> ---
> generic/include/clc/clc.h | 1 +
> generic/include/clc/geometric/distance.h | 22 ++++++++++++++++++++++
> generic/include/clc/geometric/distance.inc | 23 +++++++++++++++++++++++
> generic/lib/SOURCES | 1 +
> generic/lib/geometric/distance.cl | 30
> ++++++++++++++++++++++++++++++
> generic/lib/geometric/distance.inc | 25 +++++++++++++++++++++++++
> 6 files changed, 102 insertions(+)
> create mode 100644 generic/include/clc/geometric/distance.inc
> create mode 100644 generic/lib/geometric/distance.cl
> create mode 100644 generic/lib/geometric/distance.inc
>
> diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
> index cee1614..2fb818a 100644
> --- a/generic/include/clc/clc.h
> +++ b/generic/include/clc/clc.h
> @@ -114,6 +114,7 @@
>
> /* 6.11.5 Geometric Functions */
> #include <clc/geometric/cross.h>
> +#include <clc/geometric/distance.h>
> #include <clc/geometric/dot.h>
> #include <clc/geometric/length.h>
> #include <clc/geometric/normalize.h>
> diff --git a/generic/include/clc/geometric/distance.h
> b/generic/include/clc/geometric/distance.h
> index 3e91332..c7aa6a1 100644
> --- a/generic/include/clc/geometric/distance.h
> +++ b/generic/include/clc/geometric/distance.h
> @@ -1,2 +1,24 @@
> +/*
> + * Copyright (c) 2014,2015 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining
> a copy
> + * of this software and associated documentation files (the "Software"),
> to deal
> + * in the Software without restriction, including without limitation the
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or
> sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be
> included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
> SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> IN
> + * THE SOFTWARE.
> + */
> +
>

This patch looks good to me.

I don't see much point in retroactively adding the AMD copyright statement
to distance.h since you're not modifying the file itself at this time (and
Peter was the initial author), but while it seems weird, you are also
included in the commit history for the file (adding __CLC_ prefix to the
macro names).... so I guess it's not completely out of line.

Thanks for catching this. It was an oversight on my part, I will drop
this part of the commit.

-Tom

LGTM

LGTM