Hi all,

I was wondering: Would it make sense to provide implementations of fast_length and fast_normalize even though currently no implementations of half_sqrt and half_rsqrt are provided by libclc?

Thanks,

Jeroen

> Hi all,
>
> I was wondering: Would it make sense to provide implementations of fast_length and fast_normalize even though currently no implementations of half_sqrt and half_rsqrt are provided by libclc?

Can fast_length and fast_normalize be implemented correctly without half_sqrt and half_rsqrt? Are there llvm intrinsics that we could use for half_sqrt and half_rsqrt?

-Tom

Hi,

> Can fast_length and fast_normalize be implemented correctly without half_sqrt and half_rsqrt?

The OpenCL specification says that the result should be equal to something that involves half_sqrt and half_rsqrt, respectively. So, it seems to make most sense to use the definitions given by OpenCL directly.

> Are there llvm intrinsics that we could use for half_sqrt and half_rsqrt?

The nvptx back-end has the sqrt.approx and rsqrt.approx intrinsics, but it’s not clear to me whether these have enough precision. Also this isn’t a solution for the r600 back-end.

Jeroen

Hi,

>>> Hi all,
>>>
>>> I was wondering: Would it make sense to provide implementations of fast_length and fast_normalize even though currently no implementations of half_sqrt and half_rsqrt are provided by libclc?
>>
>> Can fast_length and fast_normalize be implemented correctly without half_sqrt and half_rsqrt?
>
> The OpenCL specification says that the result should be equal to something that involves half_sqrt and half_rsqrt, respectively. So, it seems to make most sense to use the definitions given by OpenCL directly.
>
>> Are there llvm intrinsics that we could use for half_sqrt and half_rsqrt?
>
> The nvptx back-end has the sqrt.approx and rsqrt.approx intrinsics, but it’s not clear to me whether these have enough precision. Also this isn’t a solution for the r600 back-end.

I think the sqrt.approx and rsqrt.approx intrinsics are intended to be used with the native_sqrt and native_rsqrt functions.

For a generic implementation, you may be able to do something with the llvm.sqrt.* and the llvm.convert.to.fp16 / llvm.convert.from.fp16 intrinsics.

-Tom

> For a generic implementation, you may be able to do something with the llvm.sqrt.* and the llvm.convert.to.fp16 / llvm.convert.from.fp16 intrinsics.

The only way I can see to do something with these intrinsics is the following:

    %2 = call i16 @llvm.convert.to.fp16(float %1)
    %3 = call float @llvm.convert.from.fp16(i16 %2)
    %4 = call float @llvm.sqrt.f32(float %3)
    %5 = call i16 @llvm.convert.to.fp16(float %4)
    %6 = call float @llvm.convert.from.fp16(i16 %5)

Which seems a bit roundabout.

Which brings me back to my original question: Would it make sense to provide an implementation of fast_length and fast_normalize even though there are no implementations (but only prototypes) for half_sqrt and half_rsqrt?

Jeroen

>> For a generic implementation, you may be able to do something with the llvm.sqrt.* and the llvm.convert.to.fp16 / llvm.convert.from.fp16 intrinsics.
>
> The only way I can see to do something with these intrinsics is the following:
>
>     %2 = call i16 @llvm.convert.to.fp16(float %1)
>     %3 = call float @llvm.convert.from.fp16(i16 %2)
>     %4 = call float @llvm.sqrt.f32(float %3)
>     %5 = call i16 @llvm.convert.to.fp16(float %4)
>     %6 = call float @llvm.convert.from.fp16(i16 %5)
>
> Which seems a bit roundabout.
>
> Which brings me back to my original question: Would it make sense to provide an implementation of fast_length and fast_normalize even though there are no implementations (but only prototypes) for half_sqrt and half_rsqrt?

I think it would be best to implement half_sqrt and half_rsqrt too. Using the intrinsic may be roundabout, but it is correct and targets may override the generic implementation if they want to.

-Tom