Weird problems with cos (was Re: [PATCH v3 2/3] R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO)

Hi Tom, Matt,

I'm running into strange issues with the cos test (piglit
generated_tests/cl/builtin/math/builtin-float-cos-1.0.generated.c)

I have been seeing random failures (incorrect results) for some time and
tried to investigate. the weird part is that the failures are not 100%
reproducible, sometimes the tests pass, or partly pass
(it's usually float8 and float16 subtests that fail).
Failure is always the same
"Expecting -0.925879 (0xbf6d0668) with tolerance 0.000000 (2 ulps), but got nan (0x7fc00000)"
although the position may vary. even if the same value was computed earlier in the results array

The first patch of this series does not change the behavior (or instruction dump).
however, using the ADDC instruction results in hang on every cos test
"ring 0 stalled for more than 10000msec"
"GPU lockup (waiting for 0x00000000001023cf last fence id 0x00000000001023ce on ring 0)"

although the actual test results follow the same result as before (random failures mostly in float8/16 tests).
I can even get test pass with hang on every subtest

Using SIGN_EXTEND_INREG instead of "SUB 0" in this patch gets rid of the hangs,
and makes the failures fully reproducible in every subtest, triggered on the first
occurrence of what should have been -0.925879.

the GPU is AMD TURKS (HD 7570 1002:675d)

I tried digging throught he manual but it oly mentions that ADDC is vec and trans inst.
Is there any errata document the might give a hint?

thanks,
jan

PS: There are no problems with sin, so I might be able to triage at least the code that hangs with this patch

dumps.tgz (1.11 MB)

Hi Tom, Matt,

I'm running into strange issues with the cos test (piglit
generated_tests/cl/builtin/math/builtin-float-cos-1.0.generated.c)

I have been seeing random failures (incorrect results) for some time and
tried to investigate. the weird part is that the failures are not 100%
reproducible, sometimes the tests pass, or partly pass
(it's usually float8 and float16 subtests that fail).
Failure is always the same
"Expecting -0.925879 (0xbf6d0668) with tolerance 0.000000 (2 ulps), but got nan (0x7fc00000)"
although the position may vary. even if the same value was computed earlier in the results array

The first patch of this series does not change the behavior (or instruction dump).
however, using the ADDC instruction results in hang on every cos test
"ring 0 stalled for more than 10000msec"
"GPU lockup (waiting for 0x00000000001023cf last fence id 0x00000000001023ce on ring 0)"

although the actual test results follow the same result as before (random failures mostly in float8/16 tests).
I can even get test pass with hang on every subtest

Using SIGN_EXTEND_INREG instead of "SUB 0" in this patch gets rid of the hangs,
and makes the failures fully reproducible in every subtest, triggered on the first
occurrence of what should have been -0.925879.

the GPU is AMD TURKS (HD 7570 1002:675d)

I tried digging throught he manual but it oly mentions that ADDC is vec and trans inst.
Is there any errata document the might give a hint?

It's possible the bug is somewhere else and adding the addc
instruction changed the program enough to uncover it. Try modify the
packetizer to only allow one instruction per group. I think modifying
R600Packetizer::isLegalToPacketizeTogether() to always return false will
do this.

If you still get lockups even with this, the we can rule out some kind of packetizer bug.

-Tom

> Hi Tom, Matt,
>
> I'm running into strange issues with the cos test (piglit
> generated_tests/cl/builtin/math/builtin-float-cos-1.0.generated.c)
>
> I have been seeing random failures (incorrect results) for some time and
> tried to investigate. the weird part is that the failures are not 100%
> reproducible, sometimes the tests pass, or partly pass
> (it's usually float8 and float16 subtests that fail).
> Failure is always the same
> "Expecting -0.925879 (0xbf6d0668) with tolerance 0.000000 (2 ulps), but got nan (0x7fc00000)"
> although the position may vary. even if the same value was computed earlier in the results array
>
> The first patch of this series does not change the behavior (or instruction dump).
> however, using the ADDC instruction results in hang on every cos test
> "ring 0 stalled for more than 10000msec"
> "GPU lockup (waiting for 0x00000000001023cf last fence id 0x00000000001023ce on ring 0)"
>
> although the actual test results follow the same result as before (random failures mostly in float8/16 tests).
> I can even get test pass with hang on every subtest
>
> Using SIGN_EXTEND_INREG instead of "SUB 0" in this patch gets rid of the hangs,
> and makes the failures fully reproducible in every subtest, triggered on the first
> occurrence of what should have been -0.925879.
>
> the GPU is AMD TURKS (HD 7570 1002:675d)
>
> I tried digging throught he manual but it oly mentions that ADDC is vec and trans inst.
> Is there any errata document the might give a hint?

It's possible the bug is somewhere else and adding the addc
instruction changed the program enough to uncover it. Try modify the
packetizer to only allow one instruction per group. I think modifying
R600Packetizer::isLegalToPacketizeTogether() to always return false will
do this.

Thanks, this helped a lot. returning false fixes both incorrect results
and hangs. playing around with the packetizer I found that both hang and
incorrect results are caused by FMA incorrectly moved to Trans slot.
patch sent.

thanks,
jan