Hi all,
I was wondering: Would it make sense to provide implementations of fast_length and fast_normalize even though currently no implementations of half_sqrt and half_rsqrt are provided by libclc?
Thanks,
Jeroen
> Hi all,
> I was wondering: Would it make sense to provide implementations of fast_length and fast_normalize even though currently no implementations of half_sqrt and half_rsqrt are provided by libclc?
Can fast_length and fast_normalize be implemented correctly without
half_sqrt and half_rsqrt? Are there llvm intrinsics that we could
use for half_sqrt and half_rsqrt?
-Tom
Hi,
>> Hi all,
>> I was wondering: Would it make sense to provide implementations of fast_length and fast_normalize even though currently no implementations of half_sqrt and half_rsqrt are provided by libclc?
> Can fast_length and fast_normalize be implemented correctly without
> half_sqrt and half_rsqrt?
The OpenCL specification says that the result should be equal to something that involves half_sqrt and half_rsqrt, respectively. So, it seems to make most sense to use the definitions given by OpenCL directly.
> Are there llvm intrinsics that we could
> use for half_sqrt and half_rsqrt?
The nvptx back-end has the sqrt.approx and rsqrt.approx intrinsics, but it’s not clear to me whether they are precise enough. Also, they are not a solution for the r600 back-end.
Jeroen
> Hi,
>
>>> Hi all,
>>>
>>> I was wondering: Would it make sense to provide implementations of fast_length and fast_normalize even though currently no implementations of half_sqrt and half_rsqrt are provided by libclc?
>>
>> Can fast_length and fast_normalize be implemented correctly without
>> half_sqrt and half_rsqrt?
>
> The OpenCL specification says that the result should be equal to something that involves half_sqrt and half_rsqrt, respectively. So, it seems to make most sense to use the definitions given by OpenCL directly.
>
>> Are there llvm intrinsics that we could
>> use for half_sqrt and half_rsqrt?
>
> The nvptx back-end has the sqrt.approx and rsqrt.approx intrinsics, but it’s not clear to me whether these have enough precision. Also this isn’t a solution for the r600 back-end.
I think the sqrt.approx and rsqrt.approx intrinsics are intended
to be used with the native_sqrt and native_rsqrt functions.
For a generic implementation, you may be able to do something with the
llvm.sqrt.* and the llvm.convert.to.fp16 / llvm.convert.from.fp16
intrinsics.
-Tom
> For a generic implementation, you may be able to do something with the
> llvm.sqrt.* and the llvm.convert.to.fp16 / llvm.convert.from.fp16
> intrinsics.
The only way I can see to do something with these intrinsics is the following:
%2 = call i16 @llvm.convert.to.fp16(float %1)
%3 = call float @llvm.convert.from.fp16(i16 %2)
%4 = call float @llvm.sqrt.f32(float %3)
%5 = call i16 @llvm.convert.to.fp16(float %4)
%6 = call float @llvm.convert.from.fp16(i16 %5)
Which seems a bit roundabout.
Which brings me back to my original question: Would it make sense to provide an implementation of fast_length and fast_normalize even though there are no implementations (but only prototypes) for half_sqrt and half_rsqrt?
Jeroen
>> For a generic implementation, you may be able to do something with the
>> llvm.sqrt.* and the llvm.convert.to.fp16 / llvm.convert.from.fp16
>> intrinsics.
>
> The only way I can see to do something with these intrinsics is the following:
>
> %2 = call i16 @llvm.convert.to.fp16(float %1)
> %3 = call float @llvm.convert.from.fp16(i16 %2)
> %4 = call float @llvm.sqrt.f32(float %3)
> %5 = call i16 @llvm.convert.to.fp16(float %4)
> %6 = call float @llvm.convert.from.fp16(i16 %5)
>
> Which seems a bit roundabout.
>
> Which brings me back to my original question: Would it make sense to provide an implementation of fast_length and fast_normalize even though there are no implementations (but only prototypes) for half_sqrt and half_rsqrt?
I think it would be best to implement half_sqrt and half_rsqrt too.
Using the intrinsic may be roundabout, but it is correct and targets may
override the generic implementation if they want to.
-Tom