Based on the amd-builtin, but explicitly vectorized for all sizes (not just
float4), and includes a vectorized double implementation.
Passes piglit (float) tests on pitcairn.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Based on the amd-builtin, but explicitly vectorized for all sizes (not just
float4), and includes a vectorized double implementation.
I'm not a big fan of copying bit magic from amd-builtins. In this case
it only avoids a branch, with the same number of instructions (on amdgcn
there are a few more scalar instructions, so the branch version might end
up being faster).
That said, I think the patch is OK with a few comments to make it more
friendly for quick eyeballing. The code follows the 'naive' implementation
pretty closely. With a few comments:
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
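For anyone skimming the thread, here is a rough scalar sketch in plain C of
the two styles being compared above. It is not the code from the patch or
from amd-builtins; the names fdim_branch and fdim_mask are made up for
illustration, and the mask version is only one plausible way to do the
"bit magic". Both assume IEEE-754 single precision.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* 'Naive' version: explicit NaN check plus a select on x > y. */
static float fdim_branch(float x, float y)
{
    if (isnan(x) || isnan(y))
        return NAN;
    return x > y ? x - y : 0.0f;
}

/* Branch-free version (hypothetical sketch, not the patch's code):
 * compute x - y unconditionally, then AND its bit pattern with a mask
 * that is all ones when the difference must be kept and zero otherwise. */
static float fdim_mask(float x, float y)
{
    float diff = x - y;
    uint32_t bits;
    memcpy(&bits, &diff, sizeof bits);

    /* Keep diff when x > y, or when either input is NaN (diff is then
     * NaN already). Everything else collapses to +0.0f (all-zero bits). */
    uint32_t keep = 0u - (uint32_t)((x > y) | (x != x) | (y != y));
    bits &= keep;

    float result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

The inf - inf case is the one worth eyeballing: diff is NaN there, but
neither input is NaN and x > y is false, so the mask zeroes it and the
result is +0, same as the branching version.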
Passes piglit (float) tests on pitcairn.
Signed-off-by: Aaron Watry <awatry@gmail.com>
---
I did test the double implementation on my pitcairn as well, but those
tests aren't in piglit. I just copied/pasted the generated float tests,
enabled the fp64 pragma, and did s/float/double in the test file. I hadn't
planned on sending those to piglit unless someone really wants them.
I agree, extending generators to generate double variants would be
preferable to introducing individual generated tests.
Jan
PS: sorry for a bit of a rant. I have attached my testing files if you
are interested.
fdim_test.tgz (3.05 KB)