Attached is an implementation of the modf math builtin, copied from the AMD
builtin library.
This is my first patch ever, so please be patient with the review. My main
motivation was to get the einstein@home binary pulsar search app working.
With this patch series the kernels build successfully, but the results
are wrong, so there are probably some other problems (or I messed up).
I've done some casual testing and at least the float part seems to be
working properly; the fp64 part is totally untested.
Will piglit tests be needed to get this accepted? I had a look at the
gen_cl_math_builtins.py piglit script, but it seems I would have to
modify the framework to be able to test functions that have two outputs.
Sadly, my Python skills are rudimentary.
Pavel Ondračka (2):
Add _CLC_V_V_VP_VECTORIZE macro
Implement modf builtin
For these kinds of patches I think it's best if somebody just runs the OpenCL conformance tests for you to verify them.
Does the pseudocode implementation given in the OpenCL documentation for this function work?
gentype modf ( gentype value, gentype *iptr )
{
    *iptr = trunc( value );
    return copysign( isinf( value ) ? 0.0 : value - *iptr, value );
}
I would expect this to be faster on CI+ for fp64 because of the native trunc instruction, although the implementation in the patch is probably better for hardware without it. The start of it looks a bit like an inlined ftrunc.
Tom (added to CC) posted changes for fract a long time ago [0].
I still think that having an external kernel (like modf.inc) would be
nicer than the Python generator, but the two-outputs problem is addressed
there.