[PATCH 1/4] pow: Port from amd_builtins

Passes piglit on turks and carrizo
fp64 passes CTS on carrizo

v2: fix formatting
    check fp32 denormal support at runtime

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

Passes piglit on turks and carrizo
fp64 passes CTS on carrizo

v2: fix formatting
    check fp32 denormal support at runtime

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

Passes piglit on turks and carrizo
fp64 passes cts on carrizo

v2: fix formatting
    check fp32 denormal support at runtime

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

Passes piglit on turks and carrizo
fp64 passes ctx on carrizo

v2: check fp32 denormal support at runtime

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

Hi Jan,

The indentation still seems wrong in the rootn case. Otherwise, no more comments from
my side.

Jeroen

Hi Jan,

The indentation still seems wrong in the rootn case. Otherwise, no more comments from
my side.

fixed locally. I assume your LGTM from v1 still applies?

Jan

Hi Jan,

The indentation still seems wrong in the rootn case. Otherwise, no more comments from
my side.

fixed locally. I assume your LGTM from v1 still applies?

Indeed.

It seems my other answer got stuck in the moderator queue, but the bottom line of that email
is that I agree the denormal issues are better addressed separately.

Jeroen

Hi Jan,

Triggered by your patch below, I started looking a bit more closely at __clc_fp32_subnormals_supported.
Am I correct that this function currently always returns false no matter the target? Or does some
override occur somewhere?

Thanks,

Jeroen

Hi Jan,

Triggered by your patch below, I started looking a bit more closely at __clc_fp32_subnormals_supported.
Am I correct that this function currently always returns false no matter the target? Or does some
override occur somewhere?

Looks like it. Tom might be better qualified to answer this.
The original idea was to link default or disabled to select proper
operation at kernel link time, but it looks likes it only applies to
fp64.
That is weird since fp64 denormals are required by specs. I'd say we
can drop the entire mechanism.

Jan

Hi Jan,

Triggered by your patch below, I started looking a bit more closely at __clc_fp32_subnormals_supported.
Am I correct that this function currently always returns false no matter the target? Or does some
override occur somewhere?

Looks like it. Tom might be better qualified to answer this.
The original idea was to link default or disabled to select proper
operation at kernel link time, but it looks likes it only applies to
fp64.
That is weird since fp64 denormals are required by specs. I'd say we
can drop the entire mechanism.

That’s not totally true. Check out the cl-denorms-are-zero option.

Jeroen

>
> > Hi Jan,
> >
> > Triggered by your patch below, I started looking a bit more closely at __clc_fp32_subnormals_supported.
> > Am I correct that this function currently always returns false no matter the target? Or does some
> > override occur somewhere?
>
> Looks like it. Tom might be better qualified to answer this.
> The original idea was to link default or disabled to select proper
> operation at kernel link time, but it looks likes it only applies to
> fp64.
> That is weird since fp64 denormals are required by specs. I'd say we
> can drop the entire mechanism.

That’s not totally true. Check out the cl-denorms-are-zero option.

sorry, this one slipped under my radar. You're right. The mechanism was
aimed at AMD gpus that only care about fp64 denormals (fp32 denorms are
preferred disabled for performance/compliance reasons).
I don't mind someone reworking it to be more general (and idealy in CLC
instead of llvm ir).

regards,
Jan

Hi Jan,

Triggered by your patch below, I started looking a bit more closely at __clc_fp32_subnormals_supported.
Am I correct that this function currently always returns false no matter the target? Or does some
override occur somewhere?

Looks like it. Tom might be better qualified to answer this.
The original idea was to link default or disabled to select proper
operation at kernel link time, but it looks likes it only applies to
fp64.
That is weird since fp64 denormals are required by specs. I’d say we
can drop the entire mechanism.

That’s not totally true. Check out the cl-denorms-are-zero option.

sorry, this one slipped under my radar. You’re right. The mechanism was
aimed at AMD gpus that only care about fp64 denormals (fp32 denorms are
preferred disabled for performance/compliance reasons).
I don’t mind someone reworking it to be more general (and idealy in CLC
instead of llvm ir).

Thanks for the clarification. In that light the implementation makes sense to me.

Jeroen