NVPTX-backend and libclc arithmetic functions

Hello,

In libclc many arithmetic functions like pow, rsqrt are lowered to
corresponding llvm intrinsics. These are not handled correctly by the
NVPTX backend.

Where should this be fixed? NVPTX-Backend or libclc?
Or is this expected to work and I am just not using it correctly?

Christoph Gerum

Hi Christoph,

> In libclc many arithmetic functions like pow, rsqrt are lowered to
> corresponding llvm intrinsics. These are not handled correctly by the
> NVPTX backend.

The standard LLVM intrinsics do not guarantee the precision required by
the OpenCL standard, so they should eventually be replaced, or used only
for the native_ functions of the OpenCL library, in which case the
backend should support them.
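
To make the precision point concrete: accuracy requirements of this kind are
usually stated as a maximum error in ULPs (units in the last place). The
following is only a minimal sketch in plain C of how the distance between a
candidate float result and a reference float result could be counted. The
names ordered_bits and ulp_distance are made up for this illustration and are
not part of libclc or LLVM.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern onto a signed integer line on which
   neighbouring floats are neighbouring integers, and on which
   -0.0f and +0.0f both map to 0. */
int64_t ordered_bits(float x)
{
    int32_t i;
    memcpy(&i, &x, sizeof i);   /* reinterpret the bits without aliasing issues */
    return i < 0 ? (int64_t)INT32_MIN - (int64_t)i : (int64_t)i;
}

/* Number of representable float steps between a and b (0 if equal).
   NaN inputs are not handled in this sketch. */
int64_t ulp_distance(float a, float b)
{
    int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}
```

A test harness would call ulp_distance(candidate, reference) over many inputs
and compare the largest value seen against the bound given in the
specification for that function.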

> Where should this be fixed? NVPTX-Backend or libclc?

In libclc, and there are two possible solutions:

* Introduce completely hand-rolled versions in libclc; see e.g. Aaron’s
  patch from yesterday.
* Use nvptx-specific intrinsics that do have the correct precision (if
  such intrinsics exist).

Jeroen

Hi Christoph,

>> In libclc many arithmetic functions like pow, rsqrt are lowered to
>> corresponding llvm intrinsics. These are not handled correctly by the
>> NVPTX backend.

> The standard llvm intrinsics do not guarantee the precision required by
> the OpenCL standard, so they should be replaced eventually. Or, only
> used for the native_ functions from the OpenCL library; in which case
> the backend should support them.

>> Where should this be fixed? NVPTX-Backend or libclc?

> libclc and there are two possible solutions:
>
> * Either completely hand-rolled versions are introduced in libclc, see
>   e.g. Aaron’s patch from yesterday.

Also, if you go the route of hand-rolling a function based on either
what AMD released recently-ish, or on the libm equivalents, make sure
that the implementation doesn't assume double support. Many math
library implementations (e.g. libm) use doubles to calculate a more
precise float result, which means that most of the reference
implementations that are in use can't be used on things like the
Radeon 5400 I occasionally develop on.
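
As a rough illustration of what "doesn't assume double support" means in
practice, here is a minimal sketch of a reciprocal square root that uses
only single-precision arithmetic. It is written in plain C rather than
OpenCL C, the name rsqrt_f32 is made up for this example, and it is not the
libclc implementation. It uses the well-known bit-level initial guess
followed by Newton-Raphson steps, and makes no claim about meeting the
OpenCL precision requirements; zero, negative, denormal, infinite and NaN
inputs are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Float-only reciprocal square root for positive, normal x.
   A double-assisted version would instead compute something like
   (float)(1.0 / sqrt((double)x)), which needs hardware double support. */
float rsqrt_f32(float x)
{
    uint32_t bits;
    float y;
    const float half_x = 0.5f * x;

    /* Initial guess from the bit pattern (classic magic-constant trick). */
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);

    /* Newton-Raphson refinement of y ~ 1/sqrt(x), all in float:
       y <- y * (1.5 - 0.5 * x * y * y). */
    for (int k = 0; k < 3; ++k)
        y = y * (1.5f - half_x * y * y);

    return y;
}
```

For example, rsqrt_f32(4.0f) should come out very close to 0.5f. Whether
the result is within the required error bound for every input would still
have to be checked separately.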

--Aaron