[PATCH 0/2] More trig builtins

Aaron_Watry · September 4, 2014, 5:35pm

I've implemented asin in terms of acos (which was sent to the list
a few days ago).

Tangent is implemented using a sin and a square root instead of sin(x)/cos(x).
The sin(x)/cos(x) form produces much more verbose assembly than the sqrt form.

That being said, I am not sure if there's a better way to implement tan(x)
while still keeping the required precision. If someone has a better option,
I'm all ears. This implementation passes the piglit unit tests, at least.

I haven't checked whether the llvm.sin and llvm.cos intrinsics have enough precision
for float when used together (they didn't when calculating sin or cos alone, so I
figured using both intrinsics together would just increase the error).

Aaron_Watry · September 4, 2014, 5:35pm

asin(x) = PI/2 - acos(x)

We already have an implementation of acos(x), so just use that.

Signed-off-by: Aaron Watry <awatry@gmail.com>

Aaron_Watry · September 4, 2014, 5:35pm

Uses the algorithm:
tan(x) = sin(x) / sqrt(1-sin^2(x))

An alternative is:
tan(x) = sin(x) / cos(x)

Which produces more verbose bitcode and longer assembly.

Either way, the generated bitcode seems pretty nasty and a more optimized
but still precise-enough solution is welcome.

Signed-off-by: Aaron Watry <awatry@gmail.com>

jvesely · September 5, 2014, 5:54pm

> asin(x) = PI/2 - acos(x)

LGTM.

Just out of curiosity, how does the precision compare to just using
atan2(x, sqrt(1-x^2))
from 5) of your acos patch?

I assume (PI/2 -) does not shift the balance.

jan

jvesely · September 5, 2014, 5:57pm

> Uses the algorithm:
> tan(x) = sin(x) / sqrt(1-sin^2(x))
>
> An alternative is:
> tan(x) = sin(x) / cos(x)
>
> Which produces more verbose bitcode and longer assembly.

This is weird. Both EG and SI have sin and cos instructions. Is the
input normalization code so bad that we are better off doing MUL+SUB+SQRT
instead?

arsenm · September 5, 2014, 6:26pm

>> Uses the algorithm:
>> tan(x) = sin(x) / sqrt(1-sin^2(x))
>>
>> An alternative is:
>> tan(x) = sin(x) / cos(x)
>>
>> Which produces more verbose bitcode and longer assembly.
>
> This is weird. Both EG and SI have sin and cos instructions. Is the
> input normalization code so bad that we are better off doing MUL+SUB+SQRT
> instead?

Those are only useful for native_sin / native_cos. For the standard function, they are far from precise enough. The current (float) sin implementation should be correct, though native_sin right now is still defined to just be the regular sin function instead of the LLVM intrinsic.

jvesely · September 5, 2014, 11:19pm

>>> Uses the algorithm:
>>> tan(x) = sin(x) / sqrt(1-sin^2(x))
>>>
>>> An alternative is:
>>> tan(x) = sin(x) / cos(x)
>>>
>>> Which produces more verbose bitcode and longer assembly.
>>
>> This is weird. Both EG and SI have sin and cos instructions. Is the
>> input normalization code so bad that we are better off doing MUL+SUB+SQRT
>> instead?
>
> Those are only useful for native_sin / native_cos. For the standard
> function, they are far from precise enough. The current (float) sin
> implementation should be correct, though native_sin right now is still
> defined to just be the regular sin function instead of the LLVM
> intrinsic.

Oh, I didn't know the hw implementation was so imprecise. In that case it
makes sense, although I wonder why it ended up needing twice as many
instructions. It looks to me like sin and cos differ by no more than
4 operations, so CSE should have eliminated most of it.
Either way, it's not going to be more efficient than this patch.

LGTM

Aaron_Watry · September 8, 2014, 3:09pm

>> asin(x) = PI/2 - acos(x)
>
> LGTM.
>
> Just out of curiosity, how does the precision compare to just using
> atan2(x, sqrt(1-x^2))
> from 5) of your acos patch?
>
> I assume (PI/2 -) does not shift the balance.

The precision of both implementations looks ok. The existing piglit
tests pass when tightened down to a tolerance of 1 ULP and fail at 0
ULP. Given that the spec gives us 4 ULP as allowed variance, it seems
like we're good.

I tried the following alternate implementations and did a quick check
on bitcode length and number of instructions on evergreen. It seems
like the second variation gives us sufficient precision and fewer
hardware instructions for at least the tested architecture (CEDAR on
latest svn llvm/clang).

If you prefer, I can commit the second implementation instead.

--Aaron

diff --git a/generic/lib/math/asin.inc b/generic/lib/math/asin.inc
index f1a65b3..661663a 100644
--- a/generic/lib/math/asin.inc
+++ b/generic/lib/math/asin.inc
@@ -5,7 +5,15 @@
 #endif
 
 _CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE asin(__CLC_GENTYPE x) {
+#if 0
+  //Passes with 1ulp on evergreen, fails at 0
+  //(float16): 1786 DW on CEDAR, 22 GPRs, %694 is highest numbered bitcode instr
   return ( (__CLC_GENTYPE)PI2 - acos(x));
+#else
+  //Passes with 1ulp on evergreen, fails at 0
+  //(float16): 1622 DW on CEDAR, 22 GPRs, %691 is highest numbered bitcode instr
+  return atan2(x, sqrt((__CLC_GENTYPE)1.0 -(x*x) ) );
+#endif
 }
 
 #undef PI2

jvesely · September 8, 2014, 9:35pm

>>> asin(x) = PI/2 - acos(x)
>>
>> LGTM.
>>
>> Just out of curiosity, how does the precision compare to just using
>> atan2(x, sqrt(1-x^2))
>> from 5) of your acos patch?
>>
>> I assume (PI/2 -) does not shift the balance.
>>

> The precision of both implementations looks ok. The existing piglit
> tests pass when tightened down to a tolerance of 1 ULP and fail at 0
> ULP. Given that the spec gives us 4 ULP as allowed variance, it seems
> like we're good.
>
> I tried the following alternate implementations and did a quick check
> on bitcode length and number of instructions on evergreen. It seems
> like the second variation gives us sufficient precision and fewer
> hardware instructions for at least the tested architecture (CEDAR on
> latest svn llvm/clang).
>
> If you prefer, I can commit the second implementation instead.

I'm OK with both; I'll leave the decision to you.

jan
