[PATCH 1/2] amdgcn/fmin: Explicitly check for NaNs

v_min instruction fails to handle certain NaNs correctly.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

v_max instruction fails to handle certain NaN values

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

This seems an incorrect fix. The llvm.minnum intrinsic requires the NaN behaviour you’re now trying to enforce through the library code. I think the proper way to fix this, is to adapt GCN codegen.

Jeroen

This seems an incorrect fix. The llvm.minnum intrinsic requires the
NaN behaviour you’re now trying to enforce through the library code.
I think the proper way to fix this, is to adapt GCN codegen.

I agree. This will be relegated to llvm-3.9/4/5 as soon as llvm-6
codegen is fixed.

Jan

What exact problem are you trying to solve? Do you have an example? NaNs should work correctly. We do have some issues with denorm flushing behavior changing on gfx9.

> v_min instruction fails to handle certain NaNs correctly.
>
> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
> ---
> Fixes most of fmin CTS on carrizo. It still fails on denormals, but IMO
> that's CTS bug.
> EG/NI does not seem to suffer the same problem.

What exact problem are you trying to solve? Do you have an example? NaNs
should work correctly.

they don't.
some nan encodings return incorrect value.
examples from CTS:
ERROR: fmin: -nan ulp error at {0x1.211a92p-95 (0x10108d49), -nan (0xffa66c5d)}: *0x1.211a92p-95 vs. -nan (0xffe66c5d) at index: 448

ERROR: fmin: nan ulp error at {nan (0x7fba759e), 0x1.94ea58p-89 (0x134a752c)}: *0x1.94ea58p-89 vs. nan (0x7ffa759e) at index: 450

ERROR: fmin: nan ulp error at {nan (0x7f9a4655), -0x1.9ed39p-22 (0xb4cf69c8)}: *-0x1.9ed39p-22 vs. nan (0x7fda4655) at index: 617

ERROR: fmin: nan ulp error at {0x1.303c36p+46 (0x56981e1b), nan (0x7fa7afae)}: *0x1.303c36p+46 vs. nan (0x7fe7afae) at index: 739

it only shows 4 errors because it used 4 testing threads, there might be more.
tested on carrizo (fx9800p).

Jan

> > v_min instruction fails to handle certain NaNs correctly.
> >
> > Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
> > ---
> > Fixes most of fmin CTS on carrizo. It still fails on denormals, but IMO
> > that's CTS bug.
> > EG/NI does not seem to suffer the same problem.
>
> What exact problem are you trying to solve? Do you have an example? NaNs
> should work correctly.

they don't.
some nan encodings return incorrect value.
examples from CTS:
ERROR: fmin: -nan ulp error at {0x1.211a92p-95 (0x10108d49), -nan (0xffa66c5d)}: *0x1.211a92p-95 vs. -nan (0xffe66c5d) at index: 448

ERROR: fmin: nan ulp error at {nan (0x7fba759e), 0x1.94ea58p-89 (0x134a752c)}: *0x1.94ea58p-89 vs. nan (0x7ffa759e) at index: 450

ERROR: fmin: nan ulp error at {nan (0x7f9a4655), -0x1.9ed39p-22 (0xb4cf69c8)}: *-0x1.9ed39p-22 vs. nan (0x7fda4655) at index: 617

ERROR: fmin: nan ulp error at {0x1.303c36p+46 (0x56981e1b), nan (0x7fa7afae)}: *0x1.303c36p+46 vs. nan (0x7fe7afae) at index: 739

it only shows 4 errors because it used 4 testing threads, there might be more.
tested on carrizo (fx9800p).

I wrote a small test to check all possible nan encodings [0].
The results are:
Platform is `AMD Radeon R7 Graphics (AMD CARRIZO / DRM 3.13.0 / 4.9.0-ROC-SC, LLVM 4.0.1)' by: AMD version: OpenCL 1.1 Mesa 17.2.4
Wrong1: 8388606/16777216
Platform is `AMD Radeon (TM) R7 M340 (AMD ICELAND / DRM 3.18.0 / 4.11.0-ROC-SC, LLVM 4.0.1)' by: AMD version: OpenCL 1.1 Mesa 17.2.4
Wrong1: 8388606/16777216

also
Platform is `AMD TURKS (DRM 2.50.0 / 4.13.11-200.fc26.x86_64, LLVM 6.0.0)' by: AMD version: OpenCL 1.1 Mesa 17.4.0-devel (git-edd7bd55ed)
Wrong1: 0/16777216

Jan

[0] https://github.com/jvesely/ocl_tests/blob/master/fmin/fmin.cpp

If I’m not mistaken these all change sNaNs into qNaNs. Looking at [1], this seems reminiscent of ieee_mode. Are you sure the chip is correctly set up before the kernels are run? In particular, the IEEE bit (bit 9) of the Mode Register?

Jeroen

[1] http://developer.amd.com/wordpress/media/2013/07/AMD_Sea_Islands_Instruction_Set_Architecture1.pdf

Mmm,

From the LLVM IR documentation “Follows the IEEE-754 semantics for minNum, which also match for libm’s fmin.”

What does IEEE-754 mean here? 1985 or 2008? If 2008, then the statement is in correct, because libm treats sNaNs and qNaNs in the same way. I’m not sure about 1985 (don’t have access to that version of the spec at the moment).

Jeroen

Below:

s/correct/incorrect/

My apologies if that caused confusion.

Jeroen

The compiler always assumes IEEE mode is enabled for compute kernels. I’m not sure if the driver respects that or not.

The compiler always assumes IEEE mode is enabled for compute kernels.
I'm not sure if the driver respects that or not.

right, it sets ieee_mode = 1. but that's not what OpenCL expects for
minnum/maxnum.

"fmin and fmax behave as defined by C99 and may not match the IEEE 754-
2008 definition for minNum and
maxNum with regard to signaling NaNs. Specifically, signaling NaNs may
behave as quiet NaNs."

indeed, clearing the flag fixes the problem

so what would be the correct solution here.
1.) change llvm to not set ieee_mode for opencl compute?
2.) change clover to ignore/override llvm provided ieee_mode setting (I
assume this is passed in dispatch packet)?
3.) patch this in libclc? (we might have to do this for older llvm if
1. is the proper fix)

________________________________
From: Jeroen Ketema <j.ketema@xs4all.nl>
Sent: Thursday, November 16, 2017 4:46:12 PM
To: Jan Vesely
Cc: Arsenault, Matthew; libclc-dev@lists.llvm.org
Subject: Re: [Libclc-dev] [PATCH 1/2] amdgcn/fmin: Explicitly check
for NaNs

If I’m not mistaken these all change sNaNs into qNaNs. Looking
at [1], this seems reminiscent of ieee_mode. Are you sure the chip is
correctly set up before the kernels are run? In particular, the IEEE
bit (bit 9) of the Mode Register?

thanks, should have spotted it.

Jan

rocm-device-libs uses canonicalize on the inputs. I think it may only be necessary on the inputs. We also are missing optimizations to eliminate some redundant canonicalizes

what's the reason to keep the ieee mode?
GCN3 specs only describe semantic change for v_max_f32, v_min_f32, and
v_max_f64 (not even v_min_f64)
the flag description specifically mentions handling of snan which is
not required for OpenCL.

rocm libs work around the behaviour, OpenCL requires not to follow the
behaviour and GL compute does not care.
Can't we just drop it?

Jan

Below:

s/correct/incorrect/

My apologies if that caused confusion.

Jeroen

>
> Mmm,
>
> From the LLVM IR documentation “Follows the IEEE-754 semantics for
> minNum, which also match for libm’s fmin.”
>
> What does IEEE-754 mean here? 1985 or 2008? If 2008, then the
> statement is in correct, because libm treats sNaNs and qNaNs in the
> same way. I’m not sure about 1985 (don’t have access to that
> version of the spec at the moment).

llvm's fcanonicalize op, which among other things silences SNaNs should
be implementable using llvm.minnum(x, x) if the environment supports
SNaNs[0].
I think we need these patches if GCN backend insists on supporting
SNaNs for all compute kernels. It'd still be preferable to disable SNaN
support for OpenCL [1]. Pending the outcome of that patch I'll
restrict these patches to llvm 3.9/4.0/5.0.

Jan

[0] https://llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic
[1] https://reviews.llvm.org/D40514

Below:

s/correct/incorrect/

My apologies if that caused confusion.

Jeroen

Mmm,

From the LLVM IR documentation “Follows the IEEE-754 semantics for
minNum, which also match for libm’s fmin.”

What does IEEE-754 mean here? 1985 or 2008? If 2008, then the
statement is in correct, because libm treats sNaNs and qNaNs in the
same way. I’m not sure about 1985 (don’t have access to that
version of the spec at the moment).

llvm's fcanonicalize op, which among other things silences SNaNs should
be implementable using llvm.minnum(x, x) if the environment supports
SNaNs[0].

Ok, so that requires following the IEEE-754 semantics, because sNaNs
need to be turned into qNaNs.

On the other hand, the libm remark seems to suggest that you may drop
in libm’s version if there’s not hardware support.

It might be best to get this clarified on llvm-dev?

Jeroen

>
> > Below:
> >
> > s/correct/incorrect/
> >
> > My apologies if that caused confusion.
> >
> > Jeroen
> >
> > >
> > > Mmm,
> > >
> > > From the LLVM IR documentation “Follows the IEEE-754 semantics for
> > > minNum, which also match for libm’s fmin.”
> > >
> > > What does IEEE-754 mean here? 1985 or 2008? If 2008, then the
> > > statement is in correct, because libm treats sNaNs and qNaNs in the
> > > same way. I’m not sure about 1985 (don’t have access to that
> > > version of the spec at the moment).
>
> llvm's fcanonicalize op, which among other things silences SNaNs should
> be implementable using llvm.minnum(x, x) if the environment supports
> SNaNs[0].

Ok, so that requires following the IEEE-754 semantics, because sNaNs
need to be turned into qNaNs.

On the other hand, the libm remark seems to suggest that you may drop
in libm’s version if there’s not hardware support.

It might be best to get this clarified on llvm-dev?

since those operations were added by Matt, he can comment either in
this thread or on the llvm patch. I don't think we'd get more info on
llvm-dev.

Kan

I think we needs these patches for older LLVM. I'll add ifdef for llvm
7 when [0] (or similar solution) lands.

Jan
[0] https://reviews.llvm.org/D40514