X86 rsqrt instruction generated

Hi,

We have implemented the rsqrt instruction generation for X86 target architecture. We have introduced a flag -fp-rsqrt flag which controls the generatation of X86 rsqrt instruction generation.

We have observed minor effects on precision due to rsqrt and hence has put these transformations under the mentioned flag.

Note that –fp-rsqrt is only enabled with -enable-unsafe-fp-math flag presently.

Moreover we have achieved some derived optimizations along with rsqrt generations.

Following is the details of the -fp-rsqrt flag along with its values and enabled optimizations.

-fp-rsqrt

=off - No rsqrt

=on - y/sqrt(x) => y * rsqrt(x) // Standard

=advance - Standard, sqrt(x) => x * rsqrt(x) // Advance

=fda - Advance, Derive FMA i.e. y/sqrt(x) +z => y * rsqrt(x) + z => vfmaddss y rsqrt(x) z.

This is termed as FDA(Fused Division Accumulation)

Sending the code patch(onto the svn revision 167927), text description and testcases attached with this mail.

Also we want to commit these changes back to llvm codebase. Please review and suggest.

Future enhance plans are as follows.

TODO:

  1. Enable vector rsqrt generation.

  2. Generate different variations of FDA i.e. FMSUB, FNMSUB,FNMADD instruction generations as required.

Best Regards,

soham

"The search for truth is more precious than its possession."

rsqrt_167927.patch (15.1 KB)

rsqrt-advance.ll (667 Bytes)

rsqrt-description.txt (3.91 KB)

rsqrt-fda.ll (984 Bytes)

rsqrt-on.ll (718 Bytes)

We already have a way to indicate the expected accuracy of
floating-point operations; see
http://llvm.org/docs/LangRef.html#fpmath . Please use that rather
than adding more options; floating-point math is complicated enough
without adding unnecessary options.

We also already have ways to tell CodeGen to form FMA instructions; if
those aren't working for you, the answer is to fix them, not to add
special-case for the case where one of the operands happens to be an
rsqrt.

The code you're adding isn't in the right place; it should probably be
a target-specific DAGCombine, somewhere in X86ISelLowering.cpp.

Also, please take the time to read http://llvm.org/docs/CodingStandards.html .

-Eli

Hi,

Please find attached the modified patch and description. We have modified and retested the patch taking into consideration the comments and inputs provided earlier.

Thanks & Regards,
soham

rsqrt_modified_167927.patch (12 KB)

rsqrt-description.txt (3.42 KB)

This is much closer to an acceptable patch.

You still haven't addressed my concern about using the fpmath metadata
(http://llvm.org/docs/LangRef.html#fpmath-metadata) to drive this
instead of a command-line option.

The special-case for 1.0/sqrt(x) (as opposed to y/sqrt(x)) appears to
be unnecessary. Also, I'm a bit concerned that if the DAGCombine
visits the FSQRT before the FDIV, we'll end up with weird sequences
like 1.0/(x*rsqrt(x)).

The indentation of the code appears weird in a few places.

Please include regression tests.

-Eli