fast_length and fast_normalize

Hi all,

I was wondering: would it make sense to provide implementations of fast_length and fast_normalize, even though libclc currently provides no implementations of half_sqrt and half_rsqrt?

Thanks,

Jeroen

Can fast_length and fast_normalize be implemented correctly without
half_sqrt and half_rsqrt? Are there llvm intrinsics that we could
use for half_sqrt and half_rsqrt?

-Tom

Hi,

> Can fast_length and fast_normalize be implemented correctly without
> half_sqrt and half_rsqrt?

The OpenCL specification defines fast_length(p) as half_sqrt(p.x^2 + p.y^2 + ...) and fast_normalize(p) as p * half_rsqrt(p.x^2 + p.y^2 + ...). So it seems to make the most sense to use these definitions directly.

> Are there llvm intrinsics that we could
> use for half_sqrt and half_rsqrt?

The nvptx back-end has the sqrt.approx and rsqrt.approx intrinsics, but it’s not clear to me whether these have enough precision. Also this isn’t a solution for the r600 back-end.

Jeroen

> The nvptx back-end has the sqrt.approx and rsqrt.approx intrinsics, but it’s not clear to me whether these have enough precision. Also this isn’t a solution for the r600 back-end.

I think the sqrt.approx and rsqrt.approx intrinsics are intended
to be used with the native_sqrt and native_rsqrt functions.

For a generic implementation, you may be able to do something with the
llvm.sqrt.* and the llvm.convert.to.fp16 / llvm.convert.from.fp16
intrinsics.

-Tom

> For a generic implementation, you may be able to do something with the
> llvm.sqrt.* and the llvm.convert.to.fp16 / llvm.convert.from.fp16
> intrinsics.

The only way I can see of doing something with these intrinsics is the following:

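; round the input to half precision and back
%2 = call i16 @llvm.convert.to.fp16(float %1)
%3 = call float @llvm.convert.from.fp16(i16 %2)
; square root in single precision
%4 = call float @llvm.sqrt.f32(float %3)
; round the result to half precision and back
%5 = call i16 @llvm.convert.to.fp16(float %4)
%6 = call float @llvm.convert.from.fp16(i16 %5)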

Which seems a bit roundabout.

Which brings me back to my original question: Would it make sense to provide an implementation of fast_length and fast_normalize even though there are no implementations (but only prototypes) for half_sqrt and half_rsqrt?

Jeroen

> Which seems a bit roundabout.
>
> Which brings me back to my original question: Would it make sense to provide an implementation of fast_length and fast_normalize even though there are no implementations (but only prototypes) for half_sqrt and half_rsqrt?

I think it would be best to implement half_sqrt and half_rsqrt too.
Using the intrinsics may be roundabout, but it is correct, and targets
can override the generic implementation if they want to.

-Tom