Passes piglit tests on evergreen (sent to piglit list).

Signed-off-by: Aaron Watry <awatry@gmail.com>

Passes piglit tests on evergreen (sent to piglit list).

Signed-off-by: Aaron Watry <awatry@gmail.com>

I think this can use the LLVM frem instruction instead, and would be better expanded in the backend. I have most of a patch that expands ISD::FREM for SI that I forgot about somewhere

Passes piglit tests on evergreen (sent to piglit list).

Signed-off-by: Aaron Watry <awatry@gmail.com>

---

generic/include/clc/clc.h | 1 +

generic/include/clc/math/fmod.h | 7 +++++++

generic/lib/SOURCES | 1 +

generic/lib/math/fmod.cl | 15 +++++++++++++++

4 files changed, 24 insertions(+)

create mode 100644 generic/include/clc/math/fmod.h

create mode 100644 generic/lib/math/fmod.cldiff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h

index b8c1cb9..94557a1 100644

--- a/generic/include/clc/clc.h

+++ b/generic/include/clc/clc.h

@@ -47,6 +47,7 @@

#include <clc/math/fma.h>

#include <clc/math/fmax.h>

#include <clc/math/fmin.h>

+#include <clc/math/fmod.h>

#include <clc/math/hypot.h>

#include <clc/math/log.h>

#include <clc/math/log2.h>

diff --git a/generic/include/clc/math/fmod.h

b/generic/include/clc/math/fmod.h

new file mode 100644

index 0000000..737679f

--- /dev/null

+++ b/generic/include/clc/math/fmod.h

@@ -0,0 +1,7 @@

+#define __CLC_BODY <clc/math/binary_decl.inc>

+#define __CLC_FUNCTION fmod

+

+#include <clc/math/gentype.inc>

+

+#undef __CLC_BODY

+#undef __CLC_FUNCTION

diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES

index e4ba1d1..45e12aa 100644

--- a/generic/lib/SOURCES

+++ b/generic/lib/SOURCES

@@ -39,6 +39,7 @@ math/exp.cl

math/exp10.cl

math/fmax.cl

math/fmin.cl

+math/fmod.cl

math/hypot.cl

math/mad.cl

math/mix.cl

diff --git a/generic/lib/math/fmod.cl b/generic/lib/math/fmod.cl

new file mode 100644

index 0000000..091035b

--- /dev/null

+++ b/generic/lib/math/fmod.cl

@@ -0,0 +1,15 @@

+#include <clc/clc.h>

+

+#ifdef cl_khr_fp64

+#pragma OPENCL EXTENSION cl_khr_fp64 : enable

+#endif

+

+#define FUNCTION fmod

+#define FUNCTION_IMPL(x, y) ( (x) - (y) * trunc((x) / (y)))

+

+#define __CLC_BODY <binary_impl.inc>

+#include <clc/math/gentype.inc>

+

+#undef __CLC_BODY

+#undef FUNCTION

+#undef FUNCTION_IMPL

\ No newline at end of file

expanded in the backend. I have most of a patch that expands ISD::FREM for

SI that I forgot about somewhere

Hi Matt,

There's both fmod and remainder functions in the CL built-in library,

and as near as I can tell, they just differ in how to treat the result

of x/y:

From the CL 1.2 spec (6.12.12):

gentype fmod (gentype x, gentype y) => Modulus. Returns x – y * trunc (x/y).

gentype remainder (gentype x, gentype y) => Compute the value r such

that r = x - n*y, where n

is the integer nearest the exact value of x/y. If there

are two integers closest to x/y, n shall be the even

one. If r is zero, it is given the same sign as x.

Do you happen to know which behavior the frem instruction gives us?

Truncate or Round half to nearest even? I'm guessing that one of

these will be able to use the frem instruction, and the other won't,

but I haven't checked which is which yet.

--Aaron

I’m not sure. I was operating under the assumption that frem matches food’s behavior, but I haven’t tested it particularly carefully. x86 lowers frem into calls to fmod, and I assume the OpenCL version behaves the same as libm’s

>>>

>>> Passes piglit tests on evergreen (sent to piglit list).

>>>

>>> Signed-off-by: Aaron Watry <awatry@gmail.com>

>>> ---

>>> generic/include/clc/clc.h | 1 +

>>> generic/include/clc/math/fmod.h | 7 +++++++

>>> generic/lib/SOURCES | 1 +

>>> generic/lib/math/fmod.cl | 15 +++++++++++++++

>>> 4 files changed, 24 insertions(+)

>>> create mode 100644 generic/include/clc/math/fmod.h

>>> create mode 100644 generic/lib/math/fmod.cl

>>>

>>> diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h

>>> index b8c1cb9..94557a1 100644

>>> --- a/generic/include/clc/clc.h

>>> +++ b/generic/include/clc/clc.h

>>> @@ -47,6 +47,7 @@

>>> #include <clc/math/fma.h>

>>> #include <clc/math/fmax.h>

>>> #include <clc/math/fmin.h>

>>> +#include <clc/math/fmod.h>

>>> #include <clc/math/hypot.h>

>>> #include <clc/math/log.h>

>>> #include <clc/math/log2.h>

>>> diff --git a/generic/include/clc/math/fmod.h

>>> b/generic/include/clc/math/fmod.h

>>> new file mode 100644

>>> index 0000000..737679f

>>> --- /dev/null

>>> +++ b/generic/include/clc/math/fmod.h

>>> @@ -0,0 +1,7 @@

>>> +#define __CLC_BODY <clc/math/binary_decl.inc>

>>> +#define __CLC_FUNCTION fmod

>>> +

>>> +#include <clc/math/gentype.inc>

>>> +

>>> +#undef __CLC_BODY

>>> +#undef __CLC_FUNCTION

>>> diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES

>>> index e4ba1d1..45e12aa 100644

>>> --- a/generic/lib/SOURCES

>>> +++ b/generic/lib/SOURCES

>>> @@ -39,6 +39,7 @@ math/exp.cl

>>> math/exp10.cl

>>> math/fmax.cl

>>> math/fmin.cl

>>> +math/fmod.cl

>>> math/hypot.cl

>>> math/mad.cl

>>> math/mix.cl

>>> diff --git a/generic/lib/math/fmod.cl b/generic/lib/math/fmod.cl

>>> new file mode 100644

>>> index 0000000..091035b

>>> --- /dev/null

>>> +++ b/generic/lib/math/fmod.cl

>>> @@ -0,0 +1,15 @@

>>> +#include <clc/clc.h>

>>> +

>>> +#ifdef cl_khr_fp64

>>> +#pragma OPENCL EXTENSION cl_khr_fp64 : enable

>>> +#endif

>>> +

>>> +#define FUNCTION fmod

>>> +#define FUNCTION_IMPL(x, y) ( (x) - (y) * trunc((x) / (y)))

>>> +

>>> +#define __CLC_BODY <binary_impl.inc>

>>> +#include <clc/math/gentype.inc>

>>> +

>>> +#undef __CLC_BODY

>>> +#undef FUNCTION

>>> +#undef FUNCTION_IMPL

>>> \ No newline at end of file

>>

>>

>> I think this can use the LLVM frem instruction instead, and would be better

>> expanded in the backend. I have most of a patch that expands ISD::FREM for

>> SI that I forgot about somewhere

>>

>

> Hi Matt,

>

> There's both fmod and remainder functions in the CL built-in library,

> and as near as I can tell, they just differ in how to treat the result

> of x/y:

>

> From the CL 1.2 spec (6.12.12):

> gentype fmod (gentype x, gentype y) => Modulus. Returns x – y * trunc (x/y).

>

> gentype remainder (gentype x, gentype y) => Compute the value r such

> that r = x - n*y, where n

> is the integer nearest the exact value of x/y. If there

> are two integers closest to x/y, n shall be the even

> one. If r is zero, it is given the same sign as x.

>

> Do you happen to know which behavior the frem instruction gives us?

> Truncate or Round half to nearest even? I'm guessing that one of

> these will be able to use the frem instruction, and the other won't,

> but I haven't checked which is which yet.

There is both __builtin_fmod(f), and __builtin_remainder(f), but I

haven't found any documentation on them, or code outside of

Basic/Builtins.def

If they are based on math.h then both fmod and remainder seem to match

OCL definitions.

we have round to nearest even instructions, not sure if using __builtin

or adding amdgpu.rndne intrinsic is the better way to go.

jan

>>>

>>> Passes piglit tests on evergreen (sent to piglit list).

>>>

>>> Signed-off-by: Aaron Watry <awatry@gmail.com>

>>> ---

>>> generic/include/clc/clc.h | 1 +

>>> generic/include/clc/math/fmod.h | 7 +++++++

>>> generic/lib/SOURCES | 1 +

>>> generic/lib/math/fmod.cl | 15 +++++++++++++++

>>> 4 files changed, 24 insertions(+)

>>> create mode 100644 generic/include/clc/math/fmod.h

>>> create mode 100644 generic/lib/math/fmod.cl

>>>

>>> diff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h

>>> index b8c1cb9..94557a1 100644

>>> --- a/generic/include/clc/clc.h

>>> +++ b/generic/include/clc/clc.h

>>> @@ -47,6 +47,7 @@

>>> #include <clc/math/fma.h>

>>> #include <clc/math/fmax.h>

>>> #include <clc/math/fmin.h>

>>> +#include <clc/math/fmod.h>

>>> #include <clc/math/hypot.h>

>>> #include <clc/math/log.h>

>>> #include <clc/math/log2.h>

>>> diff --git a/generic/include/clc/math/fmod.h

>>> b/generic/include/clc/math/fmod.h

>>> new file mode 100644

>>> index 0000000..737679f

>>> --- /dev/null

>>> +++ b/generic/include/clc/math/fmod.h

>>> @@ -0,0 +1,7 @@

>>> +#define __CLC_BODY <clc/math/binary_decl.inc>

>>> +#define __CLC_FUNCTION fmod

>>> +

>>> +#include <clc/math/gentype.inc>

>>> +

>>> +#undef __CLC_BODY

>>> +#undef __CLC_FUNCTION

>>> diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES

>>> index e4ba1d1..45e12aa 100644

>>> --- a/generic/lib/SOURCES

>>> +++ b/generic/lib/SOURCES

>>> @@ -39,6 +39,7 @@ math/exp.cl

>>> math/exp10.cl

>>> math/fmax.cl

>>> math/fmin.cl

>>> +math/fmod.cl

>>> math/hypot.cl

>>> math/mad.cl

>>> math/mix.cl

>>> diff --git a/generic/lib/math/fmod.cl b/generic/lib/math/fmod.cl

>>> new file mode 100644

>>> index 0000000..091035b

>>> --- /dev/null

>>> +++ b/generic/lib/math/fmod.cl

>>> @@ -0,0 +1,15 @@

>>> +#include <clc/clc.h>

>>> +

>>> +#ifdef cl_khr_fp64

>>> +#pragma OPENCL EXTENSION cl_khr_fp64 : enable

>>> +#endif

>>> +

>>> +#define FUNCTION fmod

>>> +#define FUNCTION_IMPL(x, y) ( (x) - (y) * trunc((x) / (y)))

>>> +

>>> +#define __CLC_BODY <binary_impl.inc>

>>> +#include <clc/math/gentype.inc>

>>> +

>>> +#undef __CLC_BODY

>>> +#undef FUNCTION

>>> +#undef FUNCTION_IMPL

>>> \ No newline at end of file

>>

>>

>> I think this can use the LLVM frem instruction instead, and would be better

>> expanded in the backend. I have most of a patch that expands ISD::FREM for

>> SI that I forgot about somewhere

>>

>

> Hi Matt,

>

> There's both fmod and remainder functions in the CL built-in library,

> and as near as I can tell, they just differ in how to treat the result

> of x/y:

>

> From the CL 1.2 spec (6.12.12):

> gentype fmod (gentype x, gentype y) => Modulus. Returns x – y * trunc (x/y).

>

> gentype remainder (gentype x, gentype y) => Compute the value r such

> that r = x - n*y, where n

> is the integer nearest the exact value of x/y. If there

> are two integers closest to x/y, n shall be the even

> one. If r is zero, it is given the same sign as x.

>

> Do you happen to know which behavior the frem instruction gives us?

> Truncate or Round half to nearest even? I'm guessing that one of

> these will be able to use the frem instruction, and the other won't,

> but I haven't checked which is which yet.There is both __builtin_fmod(f), and __builtin_remainder(f), but I

haven't found any documentation on them, or code outside of

Basic/Builtins.def

That's because these are libm functions(same thing with

__builtin_[sin|cos|tan|etc] and many of the other trig functions), and

there is no libm implementation that exists for R600... so calling

__builtin_modf just leads to invalid function calls and a segfault...

It's all well and good to have a built-in function in clang for most

of the math functions, but if the function isn't really built-in and

is dependent upon an external architecture-specific library, then we

either need to:

1) find a way to port the libm functions to CL C (which is potentially

difficult... most of the float-precision functions assume that the

device can at least support doubles at lower performance)

2) create an R600 implementation of libm

3) re-write the functions ourselves.

I've been trying to stick to CLC implementations of the functions

where possible to keep the implementation as architecture neutral as I

can (and to keep the libclc library as self-contained as possible).

I can refactor this to attempt to use __builtin_fmod on architectures

where this function is expected to be available with either a CLC or

bitcode override for R600, or if the frem instruction matches the

required behavior, I'll just use that for all architectures (if I

can't use it here, then maybe we can use it for remainder)... It's

just a bit of extra work.

Sorry for the rant, but this section of the built-in library has been

causing much more trouble than I really cared to take on (my trig is

pretty weak and I'm not the hugest fan of floating point

operations)... but at the same time, it's the primary roadblock

standing in the way of getting cppamp-driver-ng working on the OSS

radeon drivers (which is my current non-work focus).

If anyone cares, here's the list of functions that are still needed to

get cppamp-driver-ng working on radeonsi/r600 for which I haven't sent

patches to the list:

acosh, asinh, atanh, cbrt, cosh, cospi, erf, erfc, expm1, fdim, frexp,

ilogb, ldexp, log10, log1p (tom sent this yesterday), logb, modf,

remainder, sinh, sinpi, tanh, and tgamma

After that, we just need to enable the clang storage class specifiers

extension to get support for the static keyword in clover and there's

a change that things will work... but that's a lot of math functions

left if I can't just use trig identities to get an implementation in

place before we optimize it.

--Aaron

Passes piglit tests on evergreen (sent to piglit list).

## Signed-off-by: Aaron Watry <awatry@gmail.com>

generic/include/clc/clc.h | 1 +

generic/include/clc/math/fmod.h | 7 +++++++

generic/lib/SOURCES | 1 +

generic/lib/math/fmod.cl | 15 +++++++++++++++

4 files changed, 24 insertions(+)

create mode 100644 generic/include/clc/math/fmod.h

create mode 100644 generic/lib/math/fmod.cldiff --git a/generic/include/clc/clc.h b/generic/include/clc/clc.h

index b8c1cb9…94557a1 100644

— a/generic/include/clc/clc.h

+++ b/generic/include/clc/clc.h

@@ -47,6 +47,7 @@

#include <clc/math/fma.h>

#include <clc/math/fmax.h>

#include <clc/math/fmin.h>

+#include <clc/math/fmod.h>

#include <clc/math/hypot.h>

#include <clc/math/log.h>

#include <clc/math/log2.h>

diff --git a/generic/include/clc/math/fmod.h

b/generic/include/clc/math/fmod.h

new file mode 100644

index 0000000…737679f

— /dev/null

+++ b/generic/include/clc/math/fmod.h

@@ -0,0 +1,7 @@

+#define __CLC_BODY <clc/math/binary_decl.inc>

+#define __CLC_FUNCTION fmod

+

+#include <clc/math/gentype.inc>

+

+#undef __CLC_BODY

+#undef __CLC_FUNCTION

diff --git a/generic/lib/SOURCES b/generic/lib/SOURCES

index e4ba1d1…45e12aa 100644

— a/generic/lib/SOURCES

+++ b/generic/lib/SOURCES

@@ -39,6 +39,7 @@ math/exp.cl

math/exp10.cl

math/fmax.cl

math/fmin.cl

+math/fmod.cl

math/hypot.cl

math/mad.cl

math/mix.cl

diff --git a/generic/lib/math/fmod.cl b/generic/lib/math/fmod.cl

new file mode 100644

index 0000000…091035b

— /dev/null

+++ b/generic/lib/math/fmod.cl

@@ -0,0 +1,15 @@

+#include <clc/clc.h>

+

+#ifdef cl_khr_fp64

+#pragma OPENCL EXTENSION cl_khr_fp64 : enable

+#endif

+

+#define FUNCTION fmod

+#define FUNCTION_IMPL(x, y) ( (x) - (y) * trunc((x) / (y)))

+

+#define __CLC_BODY <binary_impl.inc>

+#include <clc/math/gentype.inc>

+

+#undef __CLC_BODY

+#undef FUNCTION

+#undef FUNCTION_IMPL

\ No newline at end of file

expanded in the backend. I have most of a patch that expands ISD::FREM for

SI that I forgot about somewhereHi Matt,

There’s both fmod and remainder functions in the CL built-in library,

and as near as I can tell, they just differ in how to treat the result

of x/y:From the CL 1.2 spec (6.12.12):

gentype fmod (gentype x, gentype y) => Modulus. Returns x – y * trunc (x/y).gentype remainder (gentype x, gentype y) => Compute the value r such

that r = x - n*y, where n

is the integer nearest the exact value of x/y. If there

are two integers closest to x/y, n shall be the even

one. If r is zero, it is given the same sign as x.Do you happen to know which behavior the frem instruction gives us?

Truncate or Round half to nearest even? I’m guessing that one of

these will be able to use the frem instruction, and the other won’t,

but I haven’t checked which is which yet.There is both __builtin_fmod(f), and __builtin_remainder(f), but I

haven’t found any documentation on them, or code outside of

Basic/Builtins.defThat’s because these are libm functions(same thing with

_builtin[sin|cos|tan|etc] and many of the other trig functions), and

there is no libm implementation that exists for R600… so calling

__builtin_modf just leads to invalid function calls and a segfault…It’s all well and good to have a built-in function in clang for most

of the math functions, but if the function isn’t really built-in and

is dependent upon an external architecture-specific library, then we

either need to:

find a way to port the libm functions to CL C (which is potentially

difficult… most of the float-precision functions assume that the

device can at least support doubles at lower performance)create an R600 implementation of libm

re-write the functions ourselves.

I’ve been trying to stick to CLC implementations of the functions

where possible to keep the implementation as architecture neutral as I

can (and to keep the libclc library as self-contained as possible).I can refactor this to attempt to use __builtin_fmod on architectures

where this function is expected to be available with either a CLC or

bitcode override for R600, or if the frem instruction matches the

required behavior, I’ll just use that for all architectures (if I

can’t use it here, then maybe we can use it for remainder)… It’s

just a bit of extra work.

I’ll try to post my patch implementing frem for R600 later today, although I don’t have access to hardware right now to test it on

I’ve pushed the R600 implementation of frem, so you should be able to see if it works now. The double version should fail any tests since double fdiv is implemented incorrectly until some operand legalization bugs are fixed