Hello,

In libclc, many arithmetic functions such as pow and rsqrt are lowered to
the corresponding LLVM intrinsics. The NVPTX backend does not handle these
correctly.

Where should this be fixed? NVPTX-Backend or libclc?

Or is this expected to work and I am just not using it correctly?

Christoph Gerum

Hi Christoph,

> In libclc many arithmetic functions like pow, rsqrt are lowered to
> corresponding llvm intrinsics. These are not handled correctly by the
> NVPTX backend.

The standard LLVM intrinsics do not guarantee the precision required by
the OpenCL standard, so they should eventually be replaced. Alternatively,
they could be used only for the native_ functions from the OpenCL library,
in which case the backend should support them.
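To make "the precision required by the OpenCL standard" a bit more concrete: such requirements are usually stated as a maximum error in ULP (units in the last place). The following is only a minimal sketch of how one might measure that on the host; the helper names ulp_of and ulp_error are made up for illustration, the ULP is approximated as the gap to the next float away from zero (a simplification of the formal definition), and it assumes IEEE 754 single precision and a finite reference value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: spacing between |x| and the next representable
 * float further from zero. Assumes IEEE 754 binary32 and finite x
 * below FLT_MAX. */
static float ulp_of(float x)
{
    uint32_t bits, next_bits;
    float magnitude, next;

    memcpy(&bits, &x, sizeof bits);
    bits &= 0x7fffffffu;          /* clear the sign bit: work on |x| */
    next_bits = bits + 1u;        /* next float away from zero */

    memcpy(&magnitude, &bits, sizeof magnitude);
    memcpy(&next, &next_bits, sizeof next);
    return next - magnitude;
}

/* Hypothetical helper: error of a float result against a
 * higher-precision reference, in ULPs of the reference rounded to
 * float. */
static double ulp_error(float got, double reference)
{
    double ulp = (double)ulp_of((float)reference);
    double diff = (double)got - reference;

    if (diff < 0.0)
        diff = -diff;
    return diff / ulp;
}
```

A candidate implementation could then be checked by evaluating it over a range of inputs and comparing the largest ulp_error against whatever bound the specification gives for that function.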

> Where should this be fixed? NVPTX-Backend or libclc?

In libclc. There are two possible solutions:

* Introduce completely hand-rolled versions in libclc; see e.g. Aaron’s
  patch from yesterday.

* Use NVPTX-specific intrinsics that do have the correct precision (if
  such intrinsics exist).

Jeroen

> Hi Christoph,
>
> > In libclc many arithmetic functions like pow, rsqrt are lowered to
> > corresponding llvm intrinsics. These are not handled correctly by the
> > NVPTX backend.
>
> The standard llvm intrinsics do not guarantee the precision required by
> the OpenCL standard, so they should be replaced eventually. Or, only
> used for the native_ functions from the OpenCL library; in which case
> the backend should support them.
>
> > Where should this be fixed? NVPTX-Backend or libclc?
>
> libclc and there are two possible solutions:
>
> * Either completely hand-rolled versions are introduced in libclc, see
>   e.g. Aaron’s patch from yesterday.

Also, if you go the route of hand-rolling a function based on either
what AMD released recently-ish, or on the libm equivalents, make sure
that the implementation doesn't assume double support. Many math
library implementations (e.g. libm) use doubles to calculate a more
precise float result, which means that most of the reference
implementations that are in use can't be used on things like the
Radeon 5400 I occasionally develop on.
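To illustrate the difference, here is a rough sketch in plain C (not libclc code, and not taken from any of the implementations mentioned above). Both functions are hypothetical and compute rsqrt with the same Newton-Raphson iteration started from the well-known bit-trick seed; the first promotes to double internally, the second stays in float throughout and so would also be usable where doubles are unavailable. Both assume IEEE 754 single precision and a positive, normal, finite input.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: rough initial guess for 1/sqrt(x) using the
 * classic integer bit trick. Only meant for positive normal x. */
static float rsqrt_seed(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* Style common in libm-like code: refine in double, round once at the
 * end. Not usable on hardware without double support. */
static float rsqrt_with_double(float x)
{
    double xd = (double)x;
    double y = (double)rsqrt_seed(x);
    int i;

    for (i = 0; i < 4; ++i)
        y = y * (1.5 - 0.5 * xd * y * y);   /* Newton-Raphson step */
    return (float)y;
}

/* Float-only variant: same iteration, no double anywhere. Rounding
 * in each step means the last bits may differ from the version
 * above. */
static float rsqrt_float_only(float x)
{
    float y = rsqrt_seed(x);
    int i;

    for (i = 0; i < 3; ++i)
        y = y * (1.5f - 0.5f * x * y * y);  /* Newton-Raphson step */
    return y;
}
```

Whether the float-only variant is accurate enough for a given OpenCL function would still have to be checked against the required error bound; the sketch only shows the structural difference.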

--Aaron