We have implemented rsqrt instruction generation for the X86 target architecture. We have introduced a flag, -fp-rsqrt, which controls generation of the X86 rsqrt instruction.
We have observed minor effects on precision due to rsqrt and have therefore put these transformations under this flag.
Note that -fp-rsqrt presently takes effect only together with the -enable-unsafe-fp-math flag.
Moreover, we have implemented some derived optimizations on top of rsqrt generation.
The values of the -fp-rsqrt flag and the optimizations each one enables are as follows.
=off     - No rsqrt generation
=on      - y/sqrt(x) => y * rsqrt(x) // Standard
=advance - Standard, plus sqrt(x) => x * rsqrt(x) // Advance
=fda     - Advance, plus a derived FMA, i.e. y/sqrt(x) + z => y * rsqrt(x) + z => vfmaddss y rsqrt(x) z.
           This is termed FDA (Fused Division Accumulation).
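
To make the three rewrites concrete, here is a rough C sketch of what each mode computes at the source level. It is not code from the patch: the function names are made up for illustration, and it uses the SSE intrinsic _mm_rsqrt_ss as a stand-in for the rsqrt instruction the backend would select. It assumes an x86 compiler with SSE available.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rsqrt_ss, _mm_sqrt_ss */

/* Approximate 1/sqrt(x) via RSQRTSS. The result is an estimate with
 * roughly 12 bits of precision, which is why the rewrites below are
 * only acceptable under unsafe FP math. */
float rsqrt_approx(float x) {
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

/* Full-precision sqrt via SQRTSS, used as the reference. */
float sqrt_exact(float x) {
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
}

/* Reference form before any rewrite: y / sqrt(x). */
float div_sqrt_exact(float y, float x) {
    return y / sqrt_exact(x);
}

/* =on (hypothetical name): y / sqrt(x) => y * rsqrt(x). */
float div_sqrt_on(float y, float x) {
    return y * rsqrt_approx(x);
}

/* =advance (hypothetical name): sqrt(x) => x * rsqrt(x).
 * Caveat: for x == 0 this evaluates 0 * inf, which is NaN,
 * whereas sqrt(0) is 0. */
float sqrt_advance(float x) {
    return x * rsqrt_approx(x);
}

/* =fda (hypothetical name): y / sqrt(x) + z => y * rsqrt(x) + z.
 * Written as a plain multiply-add here; with FMA support the backend
 * can fuse it into a single vfmaddss. */
float div_sqrt_fda(float y, float x, float z) {
    return y * rsqrt_approx(x) + z;
}
```

Comparing div_sqrt_on(y, x) against div_sqrt_exact(y, x) for a few inputs gives a feel for the precision loss mentioned above; the results agree only to about three or four significant decimal digits.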
The code patch (against svn revision 167927), a text description, and test cases are attached to this mail.
We would also like to commit these changes to the LLVM codebase. Please review and suggest changes.
Future enhancement plans are as follows.
Enable vector rsqrt generation.
Generate other variations of FDA, i.e. FMSUB, FNMSUB, and FNMADD instructions, as required.
rsqrt_167927.patch (15.1 KB)
rsqrt-advance.ll (667 Bytes)
rsqrt-description.txt (3.91 KB)
rsqrt-fda.ll (984 Bytes)
rsqrt-on.ll (718 Bytes)