[PATCH v2 1/1] rootn: Flush denormals if not supported.

jvesely · April 24, 2018, 4:31pm

It's OK to either flush to 0 or return denormal result if the device
does not support denormals. See sec 7.2 and 7.5.3 of OCL specs

v2: Use 0.0f explicitly instead of relying on GPU to flush it.

Fixes CTS on carrizo and turks
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
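
For readers who have not seen the patch itself, the idea of flushing explicitly can be sketched in host-side C. This is not the libclc change: the helper name `flush_if_subnormal` and the choice to keep the sign of the zero are assumptions made purely for illustration.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper (not taken from the patch): return a zero of the
 * same sign when x is subnormal, otherwise return x unchanged. */
float flush_if_subnormal(float x)
{
    return (fpclassify(x) == FP_SUBNORMAL) ? copysignf(0.0f, x) : x;
}

/* Small self-check. 1e-40f is below FLT_MIN (about 1.18e-38), so it is
 * subnormal on a host that keeps denormals. */
void flush_self_check(void)
{
    assert(flush_if_subnormal(1e-40f) == 0.0f);
    assert(flush_if_subnormal(1.0f) == 1.0f);
}
```

Writing the zero explicitly makes the result independent of whether the hardware flushes on its own, which is the point of the v2 note above.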

jvesely · April 30, 2018, 6:05pm

ping.

Aaron_Watry · May 2, 2018, 12:03pm

Am I being dense or just lucky (device supports denormals?).. This
already passed on my RX580 before I applied your patch

I'm currently rebuilding a newer llvm on my r600 box that hopefully
won't segfault when running rootn to test there.

--Aaron

jvesely · May 3, 2018, 1:51am

> Am I being dense or just lucky (device supports denormals?).. This
> already passed on my RX580 before I applied your patch.

IIRC, the problem is not with denormal support (unless you enabled it
explicitly), but that 'indx' variable was computed incorrectly. My
guess would be that one of the earlier operations (mad?) improved wrt
ULP precision (rootn still failed on my carrizo).
Anyway, flushing denormals just hides the issue. It'll probably still
fail if run with denormals enabled, but fixing denormal support is a
story for another day.

> I'm currently rebuilding a newer llvm on my r600 box that hopefully
> won't segfault when running rootn to test there.

thanks. It works OK on my turks when math_bruteforce is run in single
thread mode.

Jan

Aaron_Watry · May 3, 2018, 3:16am

> thanks. It works OK on my turks when math_bruteforce is run in single
> thread mode.

Oh yeah, the compute memory pool on r600 isn't thread-safe...

Let's just say that the email I sent this morning was while the first
cup of coffee was still unconsumed, and I had a small child in my lap
trying to commandeer my mouse. Not a great time for deep thoughts.

--Aaron

jvesely · May 10, 2018, 6:16pm

Hi,

any luck running on your r600?

Jan

Aaron_Watry · May 10, 2018, 6:43pm

> Hi,
>
> any luck running on your r600?

Yes, in single-threaded mode. But in my case (HD 6850, BARTS) the
rootn test already passed with a max ULP of 1.0 before your patch, and
a max ULP of 7.0 after.

The tolerance for rootn is <= 16, so both cases passed, but the
maximum error seems to have gone up after flushing subnormals.

I've been staring at the patch off and on and trying to figure out if
it's doing something wrong. Maybe it's just a difference in the
precision of the hardware we're using.

If we really need to, I've also got a cayman-based APU chip and a PCI
CEDAR if we want/need to get a few more sample points.

--Aaron
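
As background for the ULP figures quoted above, here is a rough C sketch of how an error in ULPs can be computed for a single-precision result against a double-precision reference. It is a simplification and not the CTS implementation: the names `ulp_error_f` and `within_ulps` are made up for this note, and only finite, nonzero references are handled.

```c
#include <assert.h>
#include <math.h>

/* Signed error of a float result, in units of the float ULP at the
 * double-precision reference. Simplified: finite, nonzero ref only. */
double ulp_error_f(float got, double ref)
{
    int e;

    assert(isfinite(ref) && ref != 0.0);
    frexp(ref, &e);     /* |ref| = m * 2^e with 0.5 <= m < 1 */
    e -= 24;            /* float carries 24 significand bits */
    if (e < -149)       /* float spacing never drops below 2^-149 */
        e = -149;
    return ((double)got - ref) / ldexp(1.0, e);
}

/* Pass/fail against a tolerance such as the 16 ULP quoted for rootn. */
int within_ulps(float got, double ref, double tol)
{
    return fabs(ulp_error_f(got, ref)) <= tol;
}
```

Under this measure a result that is 7 float steps away from the reference still passes a 16 ULP tolerance, which matches the "both cases passed" observation.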

jvesely · May 10, 2018, 6:52pm

> Yes, in single-threaded mode. But in my case (HD 6850, BARTS) the
> rootn test already passed with a max ULP of 1.0 before your patch, and
> a max ULP of 7.0 after.

hm, that's interesting. My problem with EG was that it returned NaN. My
guess would be there is a difference in LLVM and how it handles
division/reciprocals.
Did the other pow (pow{,r,n}) routines also exhibit this behaviour?

Jan

Aaron_Watry · May 10, 2018, 8:43pm

> hm, that's interesting. My problem with EG was that it returned NaN. My
> guess would be there is a difference in LLVM and how it handles
> division/reciprocals.
> Did the other pow (pow{,r,n}) routines also exhibit this behaviour?

Not sure. I don't believe so (I believe I usually reproduced a CTS
failure for those before confirming the fix), but I'd have to go back
in time with libclc to check.

For reference, the testing I did with rootn was done with a current
mesa checkout as of earlier today (d07466fe18522cde1) with
LLVM r331343 and libclc r331435 as a base.

Would you like me to go back and re-check the pow/powr/pown results on
my 6850 from before the denormal flushing changes? I'm re-running all
3 in their current state right now.

--Aaron

jvesely · May 11, 2018, 12:02am

> For reference, the testing I did with rootn was done with a current
> mesa checkout as of earlier today (d07466fe18522cde1) with
> LLVM r331343 and libclc r331435 as a base.
>
> Would you like me to go back and re-check the pow/powr/pown results on
> my 6850 from before the denormal flushing changes? I'm re-running all
> 3 in their current state right now.

Actually my turks setup uses llvm-git. It's weird that you don't see
the NaN issues on your cedar.
I don't think you need to invest much time into this. Given the
manpower I think it's preferable to have one version that works across
many devices/llvm versions.
Improved precision is nice, but probably not something to sweat about.
It'd be more interesting to see if the explicit 0 allows the compiler
to generate faster code.

thanks,
Jan

Aaron_Watry · May 11, 2018, 2:19am

> Actually my turks setup uses llvm-git. It's weird that you don't see
> the NaN issues on your cedar.

I have a cedar, but it's currently in another system (An old DEC
Alpha). The R600-based card I usually test with is the BARTS. The last
r600 card I have is a 3-core Llano APU (Cayman-derived I believe,
SUMO2 chip), unless you want to include the chipset-based IGPs on a
few motherboards I've got.

Just for reference, I went back to the commit immediately before the
denormal fixes for pow/powr/pown on my BARTS, and all 3 fail wimpy
mode before the denormal fixes (in single-threaded mode). I haven't
bothered with a full non-wimpy run.

> I don't think you need to invest much time into this. Given the
> manpower I think it's preferable to have one version that works across
> many devices/llvm versions.
> Improved precision is nice, but probably not something to sweat about.
> It'd be more interesting to see if the explicit 0 allows the compiler
> to generate faster code.

Yeah, the test passes within allowed tolerances, and I don't want to
have multiple versions of the code unless there's a good reason to.

If this lets more chips pass without errors, I'm fine with this going in as-is.

--Aaron

jvesely · May 15, 2018, 2:59am

> Yeah, the test passes within allowed tolerances, and I don't want to
> have multiple versions of the code unless there's a good reason to.
>
> If this lets more chips pass without errors, I'm fine with this going in as-is.

thanks. May I consider it an acked-by?

Jan

Aaron_Watry · May 15, 2018, 3:29am

> thanks. May I consider it an acked-by?

Yeah. Acked-by, tested-by. Either or both are fine with me.

jvesely · May 21, 2018, 10:49pm

> Would you like me to go back and re-check the pow/powr/pown results on
> my 6850 from before the denormal flushing changes? I'm re-running all
> 3 in their current state right now.

Hi,

do divide and half_divide tests pass on your EG hw? I think that broken
division may explain why a special fix for rootn was necessary.

thanks,
Jan

Aaron_Watry · May 22, 2018, 2:46am

> Hi,
>
> do divide and half_divide tests pass on your EG hw? I think that broken
> division may explain why a special fix for rootn was necessary.

For my 6850 (northern islands), the tests both fail with ULP errors:

80: half_divide
ERROR: half_divide: -nan ulp error at {-inf (0xff800000), -0x1.fffffep+127 (0xff7fffff)}: *inf vs. -nan (0xffc00000) at index: 197
95: divide
ERROR: divide: -nan ulp error at {-inf, -0x1.fffffep+127}: *inf vs. -nan (0xffc00000) at index: 197

I can pull my CEDAR (5400-series, actual evergreen card) from its
current home and test that as well, if you need/want me to.

--Aaron

jvesely · May 22, 2018, 3:10pm

> I can pull my CEDAR (5400-series, actual evergreen card) from its
> current home and test that as well, if you need/want me to.

Thanks, but there's no need. I knew the problem with denormals in these
routines (powX, rootn) was in the division part of the algorithm. I
thought it might explain why it worked for you and not on my turks,
but it looks like division is broken with extreme values just
the same.

thanks,
Jan
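
One plausible way a division can turn -inf / -0x1.fffffep+127 into NaN is sketched below in host-side C. The thread does not confirm how the division is actually lowered on these targets, so treat this as an assumption: it supposes the quotient is formed as a * (1 / b) and that the reciprocal is flushed to zero when subnormal. The names `ftz` and `div_via_flushed_rcp` are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Emulates a flush-to-zero device on the host. */
float ftz(float x)
{
    return (fpclassify(x) == FP_SUBNORMAL) ? copysignf(0.0f, x) : x;
}

/* Hypothetical lowering: a / b computed as a * (1 / b), with the
 * reciprocal flushed when it is subnormal. */
float div_via_flushed_rcp(float a, float b)
{
    float r = ftz(1.0f / b);
    return a * r;
}

void div_self_check(void)
{
    /* IEEE division: -inf / -FLT_MAX is +inf. */
    assert(-INFINITY / -FLT_MAX == INFINITY);
    /* 1 / -FLT_MAX is about -2.9e-39, below FLT_MIN, so it flushes to
     * -0, and -inf * -0 is NaN. */
    assert(isnan(div_via_flushed_rcp(-INFINITY, -FLT_MAX)));
}
```

The inputs here are the ones reported at index 197 in the divide and half_divide failures, where the expected result was inf and the device returned -nan.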
