[PATCH] math: Add fmod implementation

Passes piglit tests on evergreen (sent to piglit list).

Signed-off-by: Aaron Watry <awatry@gmail.com>

I think this can use the LLVM frem instruction instead, and would be better expanded in the backend. I have most of a patch that expands ISD::FREM for SI that I forgot about somewhere

Passes piglit tests on evergreen (sent to piglit list).

Signed-off-by: Aaron Watry <awatry@gmail.com>
---
  generic/include/clc/clc.h | 1 +
  generic/include/clc/math/fmod.h | 7 +++++++
  generic/lib/SOURCES | 1 +
  generic/lib/math/fmod.cl | 15 +++++++++++++++
  4 files changed, 24 insertions(+)
  create mode 100644 generic/include/clc/math/fmod.h
  create mode 100644 generic/lib/math/fmod.cl

diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
index b8c1cb9..94557a1 100644
--- a/generic/include/clc/clc.h
+++ b/generic/include/clc/clc.h
@@ -47,6 +47,7 @@
  #include <clc/math/fma.h>
  #include <clc/math/fmax.h>
  #include <clc/math/fmin.h>
+#include <clc/math/fmod.h>
  #include <clc/math/hypot.h>
  #include <clc/math/log.h>
  #include <clc/math/log2.h>
diff --git a/generic/include/clc/math/fmod.h
b/generic/include/clc/math/fmod.h
new file mode 100644
index 0000000..737679f
--- /dev/null
+++ b/generic/include/clc/math/fmod.h
@@ -0,0 +1,7 @@
+#define __CLC_BODY <clc/math/binary_decl.inc>
+#define __CLC_FUNCTION fmod
+
+#include <clc/math/gentype.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index e4ba1d1..45e12aa 100644
--- a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -39,6 +39,7 @@ math/exp.cl
  math/exp10.cl
  math/fmax.cl
  math/fmin.cl
+math/fmod.cl
  math/hypot.cl
  math/mad.cl
  math/mix.cl
diff --git a/generic/lib/math/fmod.cl b/generic/lib/math/fmod.cl
new file mode 100644
index 0000000..091035b
--- /dev/null
+++ b/generic/lib/math/fmod.cl
@@ -0,0 +1,15 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define FUNCTION fmod
+#define FUNCTION_IMPL(x, y) ( (x) - (y) * trunc((x) / (y)))
+
+#define __CLC_BODY <binary_impl.inc>
+#include <clc/math/gentype.inc>
+
+#undef __CLC_BODY
+#undef FUNCTION
+#undef FUNCTION_IMPL
\ No newline at end of file

I think this can use the LLVM frem instruction instead, and would be better
expanded in the backend. I have most of a patch that expands ISD::FREM for
SI that I forgot about somewhere

Hi Matt,

There's both fmod and remainder functions in the CL built-in library,
and as near as I can tell, they just differ in how to treat the result
of x/y:

From the CL 1.2 spec (6.12.12):

gentype fmod (gentype x, gentype y) => Modulus. Returns x – y * trunc (x/y).

gentype remainder (gentype x, gentype y) => Compute the value r such
that r = x - n*y, where n
is the integer nearest the exact value of x/y. If there
are two integers closest to x/y, n shall be the even
one. If r is zero, it is given the same sign as x.

Do you happen to know which behavior the frem instruction gives us?
Truncate or Round half to nearest even? I'm guessing that one of
these will be able to use the frem instruction, and the other won't,
but I haven't checked which is which yet.

--Aaron

I’m not sure. I was operating under the assumption that frem matches food’s behavior, but I haven’t tested it particularly carefully. x86 lowers frem into calls to fmod, and I assume the OpenCL version behaves the same as libm’s

>>>
>>> Passes piglit tests on evergreen (sent to piglit list).
>>>
>>> Signed-off-by: Aaron Watry <awatry@gmail.com>
>>> ---
>>> generic/include/clc/clc.h | 1 +
>>> generic/include/clc/math/fmod.h | 7 +++++++
>>> generic/lib/SOURCES | 1 +
>>> generic/lib/math/fmod.cl | 15 +++++++++++++++
>>> 4 files changed, 24 insertions(+)
>>> create mode 100644 generic/include/clc/math/fmod.h
>>> create mode 100644 generic/lib/math/fmod.cl
>>>
>>> diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
>>> index b8c1cb9..94557a1 100644
>>> --- a/generic/include/clc/clc.h
>>> +++ b/generic/include/clc/clc.h
>>> @@ -47,6 +47,7 @@
>>> #include <clc/math/fma.h>
>>> #include <clc/math/fmax.h>
>>> #include <clc/math/fmin.h>
>>> +#include <clc/math/fmod.h>
>>> #include <clc/math/hypot.h>
>>> #include <clc/math/log.h>
>>> #include <clc/math/log2.h>
>>> diff --git a/generic/include/clc/math/fmod.h
>>> b/generic/include/clc/math/fmod.h
>>> new file mode 100644
>>> index 0000000..737679f
>>> --- /dev/null
>>> +++ b/generic/include/clc/math/fmod.h
>>> @@ -0,0 +1,7 @@
>>> +#define __CLC_BODY <clc/math/binary_decl.inc>
>>> +#define __CLC_FUNCTION fmod
>>> +
>>> +#include <clc/math/gentype.inc>
>>> +
>>> +#undef __CLC_BODY
>>> +#undef __CLC_FUNCTION
>>> diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
>>> index e4ba1d1..45e12aa 100644
>>> --- a/generic/lib/SOURCES
>>> +++ b/generic/lib/SOURCES
>>> @@ -39,6 +39,7 @@ math/exp.cl
>>> math/exp10.cl
>>> math/fmax.cl
>>> math/fmin.cl
>>> +math/fmod.cl
>>> math/hypot.cl
>>> math/mad.cl
>>> math/mix.cl
>>> diff --git a/generic/lib/math/fmod.cl b/generic/lib/math/fmod.cl
>>> new file mode 100644
>>> index 0000000..091035b
>>> --- /dev/null
>>> +++ b/generic/lib/math/fmod.cl
>>> @@ -0,0 +1,15 @@
>>> +#include <clc/clc.h>
>>> +
>>> +#ifdef cl_khr_fp64
>>> +#pragma OPENCL EXTENSION cl_khr_fp64 : enable
>>> +#endif
>>> +
>>> +#define FUNCTION fmod
>>> +#define FUNCTION_IMPL(x, y) ( (x) - (y) * trunc((x) / (y)))
>>> +
>>> +#define __CLC_BODY <binary_impl.inc>
>>> +#include <clc/math/gentype.inc>
>>> +
>>> +#undef __CLC_BODY
>>> +#undef FUNCTION
>>> +#undef FUNCTION_IMPL
>>> \ No newline at end of file
>>
>>
>> I think this can use the LLVM frem instruction instead, and would be better
>> expanded in the backend. I have most of a patch that expands ISD::FREM for
>> SI that I forgot about somewhere
>>
>
> Hi Matt,
>
> There's both fmod and remainder functions in the CL built-in library,
> and as near as I can tell, they just differ in how to treat the result
> of x/y:
>
> From the CL 1.2 spec (6.12.12):
> gentype fmod (gentype x, gentype y) => Modulus. Returns x – y * trunc (x/y).
>
> gentype remainder (gentype x, gentype y) => Compute the value r such
> that r = x - n*y, where n
> is the integer nearest the exact value of x/y. If there
> are two integers closest to x/y, n shall be the even
> one. If r is zero, it is given the same sign as x.
>
> Do you happen to know which behavior the frem instruction gives us?
> Truncate or Round half to nearest even? I'm guessing that one of
> these will be able to use the frem instruction, and the other won't,
> but I haven't checked which is which yet.

There is both __builtin_fmod(f), and __builtin_remainder(f), but I
haven't found any documentation on them, or code outside of
Basic/Builtins.def

If they are based on math.h then both fmod and remainder seem to match
OCL definitions.

we have round to nearest even instructions, not sure if using __builtin
or adding amdgpu.rndne intrinsic is the better way to go.

jan

>>>
>>> Passes piglit tests on evergreen (sent to piglit list).
>>>
>>> Signed-off-by: Aaron Watry <awatry@gmail.com>
>>> ---
>>> generic/include/clc/clc.h | 1 +
>>> generic/include/clc/math/fmod.h | 7 +++++++
>>> generic/lib/SOURCES | 1 +
>>> generic/lib/math/fmod.cl | 15 +++++++++++++++
>>> 4 files changed, 24 insertions(+)
>>> create mode 100644 generic/include/clc/math/fmod.h
>>> create mode 100644 generic/lib/math/fmod.cl
>>>
>>> diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
>>> index b8c1cb9..94557a1 100644
>>> --- a/generic/include/clc/clc.h
>>> +++ b/generic/include/clc/clc.h
>>> @@ -47,6 +47,7 @@
>>> #include <clc/math/fma.h>
>>> #include <clc/math/fmax.h>
>>> #include <clc/math/fmin.h>
>>> +#include <clc/math/fmod.h>
>>> #include <clc/math/hypot.h>
>>> #include <clc/math/log.h>
>>> #include <clc/math/log2.h>
>>> diff --git a/generic/include/clc/math/fmod.h
>>> b/generic/include/clc/math/fmod.h
>>> new file mode 100644
>>> index 0000000..737679f
>>> --- /dev/null
>>> +++ b/generic/include/clc/math/fmod.h
>>> @@ -0,0 +1,7 @@
>>> +#define __CLC_BODY <clc/math/binary_decl.inc>
>>> +#define __CLC_FUNCTION fmod
>>> +
>>> +#include <clc/math/gentype.inc>
>>> +
>>> +#undef __CLC_BODY
>>> +#undef __CLC_FUNCTION
>>> diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
>>> index e4ba1d1..45e12aa 100644
>>> --- a/generic/lib/SOURCES
>>> +++ b/generic/lib/SOURCES
>>> @@ -39,6 +39,7 @@ math/exp.cl
>>> math/exp10.cl
>>> math/fmax.cl
>>> math/fmin.cl
>>> +math/fmod.cl
>>> math/hypot.cl
>>> math/mad.cl
>>> math/mix.cl
>>> diff --git a/generic/lib/math/fmod.cl b/generic/lib/math/fmod.cl
>>> new file mode 100644
>>> index 0000000..091035b
>>> --- /dev/null
>>> +++ b/generic/lib/math/fmod.cl
>>> @@ -0,0 +1,15 @@
>>> +#include <clc/clc.h>
>>> +
>>> +#ifdef cl_khr_fp64
>>> +#pragma OPENCL EXTENSION cl_khr_fp64 : enable
>>> +#endif
>>> +
>>> +#define FUNCTION fmod
>>> +#define FUNCTION_IMPL(x, y) ( (x) - (y) * trunc((x) / (y)))
>>> +
>>> +#define __CLC_BODY <binary_impl.inc>
>>> +#include <clc/math/gentype.inc>
>>> +
>>> +#undef __CLC_BODY
>>> +#undef FUNCTION
>>> +#undef FUNCTION_IMPL
>>> \ No newline at end of file
>>
>>
>> I think this can use the LLVM frem instruction instead, and would be better
>> expanded in the backend. I have most of a patch that expands ISD::FREM for
>> SI that I forgot about somewhere
>>
>
> Hi Matt,
>
> There's both fmod and remainder functions in the CL built-in library,
> and as near as I can tell, they just differ in how to treat the result
> of x/y:
>
> From the CL 1.2 spec (6.12.12):
> gentype fmod (gentype x, gentype y) => Modulus. Returns x – y * trunc (x/y).
>
> gentype remainder (gentype x, gentype y) => Compute the value r such
> that r = x - n*y, where n
> is the integer nearest the exact value of x/y. If there
> are two integers closest to x/y, n shall be the even
> one. If r is zero, it is given the same sign as x.
>
> Do you happen to know which behavior the frem instruction gives us?
> Truncate or Round half to nearest even? I'm guessing that one of
> these will be able to use the frem instruction, and the other won't,
> but I haven't checked which is which yet.

There is both __builtin_fmod(f), and __builtin_remainder(f), but I
haven't found any documentation on them, or code outside of
Basic/Builtins.def

That's because these are libm functions(same thing with
__builtin_[sin|cos|tan|etc] and many of the other trig functions), and
there is no libm implementation that exists for R600... so calling
__builtin_modf just leads to invalid function calls and a segfault...

It's all well and good to have a built-in function in clang for most
of the math functions, but if the function isn't really built-in and
is dependent upon an external architecture-specific library, then we
either need to:

1) find a way to port the libm functions to CL C (which is potentially
difficult... most of the float-precision functions assume that the
device can at least support doubles at lower performance)

2) create an R600 implementation of libm

3) re-write the functions ourselves.

I've been trying to stick to CLC implementations of the functions
where possible to keep the implementation as architecture neutral as I
can (and to keep the libclc library as self-contained as possible).

I can refactor this to attempt to use __builtin_fmod on architectures
where this function is expected to be available with either a CLC or
bitcode override for R600, or if the frem instruction matches the
required behavior, I'll just use that for all architectures (if I
can't use it here, then maybe we can use it for remainder)... It's
just a bit of extra work.

Sorry for the rant, but this section of the built-in library has been
causing much more trouble than I really cared to take on (my trig is
pretty weak and I'm not the hugest fan of floating point
operations)... but at the same time, it's the primary roadblock
standing in the way of getting cppamp-driver-ng working on the OSS
radeon drivers (which is my current non-work focus).

If anyone cares, here's the list of functions that are still needed to
get cppamp-driver-ng working on radeonsi/r600 for which I haven't sent
patches to the list:
acosh, asinh, atanh, cbrt, cosh, cospi, erf, erfc, expm1, fdim, frexp,
ilogb, ldexp, log10, log1p (tom sent this yesterday), logb, modf,
remainder, sinh, sinpi, tanh, and tgamma

After that, we just need to enable the clang storage class specifiers
extension to get support for the static keyword in clover and there's
a change that things will work... but that's a lot of math functions
left if I can't just use trig identities to get an implementation in
place before we optimize it.

--Aaron

Passes piglit tests on evergreen (sent to piglit list).

Signed-off-by: Aaron Watry <awatry@gmail.com>

generic/include/clc/clc.h | 1 +
generic/include/clc/math/fmod.h | 7 +++++++
generic/lib/SOURCES | 1 +
generic/lib/math/fmod.cl | 15 +++++++++++++++
4 files changed, 24 insertions(+)
create mode 100644 generic/include/clc/math/fmod.h
create mode 100644 generic/lib/math/fmod.cl

diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h
index b8c1cb9…94557a1 100644
— a/generic/include/clc/clc.h
+++ b/generic/include/clc/clc.h
@@ -47,6 +47,7 @@
#include <clc/math/fma.h>
#include <clc/math/fmax.h>
#include <clc/math/fmin.h>
+#include <clc/math/fmod.h>
#include <clc/math/hypot.h>
#include <clc/math/log.h>
#include <clc/math/log2.h>
diff --git a/generic/include/clc/math/fmod.h
b/generic/include/clc/math/fmod.h
new file mode 100644
index 0000000…737679f
— /dev/null
+++ b/generic/include/clc/math/fmod.h
@@ -0,0 +1,7 @@
+#define __CLC_BODY <clc/math/binary_decl.inc>
+#define __CLC_FUNCTION fmod
+
+#include <clc/math/gentype.inc>
+
+#undef __CLC_BODY
+#undef __CLC_FUNCTION
diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES
index e4ba1d1…45e12aa 100644
— a/generic/lib/SOURCES
+++ b/generic/lib/SOURCES
@@ -39,6 +39,7 @@ math/exp.cl
math/exp10.cl
math/fmax.cl
math/fmin.cl
+math/fmod.cl
math/hypot.cl
math/mad.cl
math/mix.cl
diff --git a/generic/lib/math/fmod.cl b/generic/lib/math/fmod.cl
new file mode 100644
index 0000000…091035b
— /dev/null
+++ b/generic/lib/math/fmod.cl
@@ -0,0 +1,15 @@
+#include <clc/clc.h>
+
+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+#endif
+
+#define FUNCTION fmod
+#define FUNCTION_IMPL(x, y) ( (x) - (y) * trunc((x) / (y)))
+
+#define __CLC_BODY <binary_impl.inc>
+#include <clc/math/gentype.inc>
+
+#undef __CLC_BODY
+#undef FUNCTION
+#undef FUNCTION_IMPL
\ No newline at end of file

I think this can use the LLVM frem instruction instead, and would be better
expanded in the backend. I have most of a patch that expands ISD::FREM for
SI that I forgot about somewhere

Hi Matt,

There’s both fmod and remainder functions in the CL built-in library,
and as near as I can tell, they just differ in how to treat the result
of x/y:

From the CL 1.2 spec (6.12.12):
gentype fmod (gentype x, gentype y) => Modulus. Returns x – y * trunc (x/y).

gentype remainder (gentype x, gentype y) => Compute the value r such
that r = x - n*y, where n
is the integer nearest the exact value of x/y. If there
are two integers closest to x/y, n shall be the even
one. If r is zero, it is given the same sign as x.

Do you happen to know which behavior the frem instruction gives us?
Truncate or Round half to nearest even? I’m guessing that one of
these will be able to use the frem instruction, and the other won’t,
but I haven’t checked which is which yet.

There is both __builtin_fmod(f), and __builtin_remainder(f), but I
haven’t found any documentation on them, or code outside of
Basic/Builtins.def

That’s because these are libm functions(same thing with
_builtin[sin|cos|tan|etc] and many of the other trig functions), and
there is no libm implementation that exists for R600… so calling
__builtin_modf just leads to invalid function calls and a segfault…

It’s all well and good to have a built-in function in clang for most
of the math functions, but if the function isn’t really built-in and
is dependent upon an external architecture-specific library, then we
either need to:

  1. find a way to port the libm functions to CL C (which is potentially
    difficult… most of the float-precision functions assume that the
    device can at least support doubles at lower performance)

  2. create an R600 implementation of libm

  3. re-write the functions ourselves.

I’ve been trying to stick to CLC implementations of the functions
where possible to keep the implementation as architecture neutral as I
can (and to keep the libclc library as self-contained as possible).

I can refactor this to attempt to use __builtin_fmod on architectures
where this function is expected to be available with either a CLC or
bitcode override for R600, or if the frem instruction matches the
required behavior, I’ll just use that for all architectures (if I
can’t use it here, then maybe we can use it for remainder)… It’s
just a bit of extra work.

I’ll try to post my patch implementing frem for R600 later today, although I don’t have access to hardware right now to test it on

I’ve pushed the R600 implementation of frem, so you should be able to see if it works now. The double version should fail any tests since double fdiv is implemented incorrectly until some operand legalization bugs are fixed