[RFC] Improve iteration of estimating divisions

Qiu_Chaofan1 · August 6, 2019, 5:20am

Hi there, I noticed that our current implementation of the fast division transformation (turning `a / b` into `a * (1/b)`) is less precise than GCC's. Consider this case on ppc64le:

    float fdiv(unsigned int a, unsigned int b) {
        return (float)a / (float)b;
    }

With -Ofast, Clang produces 41A00001 (hex), while GCC produces 41A00000, which is the same result as with optimizations disabled.
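To read these bit patterns: in IEEE-754 single precision, 0x41A00000 has sign 0, biased exponent 0x83 (131, i.e. 2^4) and significand 1.25, so it is exactly 20.0f. 0x41A00001 is the next float up, 20 + 2^-19 ≈ 20.0000019, so the -Ofast result is one ulp too large. A small C helper for decoding such values (the name `float_from_bits` is illustrative):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE-754 single-precision float.
   memcpy avoids the undefined behavior of pointer type punning. */
static float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, `float_from_bits(0x41A00001u) - 20.0f` is exactly `0x1p-19f`, the spacing between floats in [16, 32).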

Currently, DAGCombiner uses `BuildReciprocalEstimate` to compute the reciprocal (`1/b`) first and then multiplies it by `a`. If we instead fold the operand `a` into the iterations of the estimate function, the result is more precise.
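A minimal C sketch of the two schemes, to illustrate the idea rather than the patch itself. It assumes a hypothetical `recip_estimate` that models a hardware reciprocal estimate by keeping only about 11 accurate bits of `1/b`, and a hypothetical `fma_approx` that emulates a fused multiply-add in double precision; neither is LLVM code, and the step counts are illustrative. Both versions use the same number of operations: the current scheme refines `1/b` with two Newton-Raphson steps and then multiplies by `a`, while the proposed one refines `1/b` once and spends the last step correcting the quotient `q ≈ a/b` through the residual `a - b*q`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a hardware reciprocal estimate: the correctly
   rounded 1/b with its low 12 mantissa bits cleared (~11 good bits). */
static float recip_estimate(float b) {
    float r = 1.0f / b;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= ~UINT32_C(0xFFF);
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Hypothetical fused multiply-add x*y + z: the product of two floats is
   exact in double, so only the sum and the narrowing to float round. */
static float fma_approx(float x, float y, float z) {
    return (float)((double)x * (double)y + (double)z);
}

/* One Newton-Raphson step for 1/b: x' = x + x*(1 - b*x). */
static float recip_step(float b, float x) {
    float e = fma_approx(-b, x, 1.0f);  /* e = 1 - b*x */
    return fma_approx(x, e, x);         /* x + x*e     */
}

/* Current scheme: refine 1/b twice, then multiply by the numerator. */
static float div_recip_then_mul(float a, float b) {
    float x = recip_estimate(b);
    x = recip_step(b, x);
    x = recip_step(b, x);
    return a * x;
}

/* Proposed scheme (sketch): refine 1/b once, then let the last step
   correct the quotient itself using the residual r = a - b*q. */
static float div_numerator_in_last_step(float a, float b) {
    float x = recip_estimate(b);
    x = recip_step(b, x);
    float q = a * x;                    /* first quotient guess */
    float r = fma_approx(-b, q, a);     /* r = a - b*q          */
    return fma_approx(x, r, q);         /* q + x*r              */
}
```

In this model, `div_recip_then_mul(5.0f, 3.0f)` comes out one ulp above `5.0f / 3.0f`, while `div_numerator_in_last_step(5.0f, 3.0f)` matches it. The gain comes from the last step: its error is roughly the product of the reciprocal's error and the first quotient's error, instead of the reciprocal's error plus one more rounding.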

Since this is target-independent code, such a change may break existing test cases on several platforms. Any suggestions are welcome. Thanks.

Regards,
Qiu Chaofan

Neil_Nelson · August 6, 2019, 4:54pm

Qiu Chaofan,

Yes, clearly, two floating-point operations instead of one will compound the rounding error already inherent in fixed-length number representations.

The reason for the two operations appears to be that there may be machine instructions for a reciprocal that when combined with a multiplication obtains fewer machine cycles than a division.

The trade-off is then precision vs. speed. Offering that choice might require additional computations along these lines, perhaps together with a new compiler flag and the corresponding code changes.
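The two-rounding effect can be seen even without any estimate. In this minimal C sketch (the function names are illustrative), `1.0f / b` stands in for a perfect, correctly rounded reciprocal, yet multiplying by it still rounds a second time:

```c
/* One correctly rounded operation. */
static float div_direct(float a, float b) {
    return a / b;
}

/* Two correctly rounded operations: 1/b, then a * (1/b). */
static float div_via_recip(float a, float b) {
    float r = 1.0f / b;
    return a * r;
}
```

With `a = 5` and `b = 3`, printing with `%.9g` gives 1.66666663 for `div_direct` and 1.66666675 for `div_via_recip`, one ulp apart, so a correctly rounded reciprocal alone does not close the gap.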

Regards, Neil Nelson

Finkel_Hal_J · August 6, 2019, 8:04pm

Test cases can be changed if the result is universally better. Alternatively, we can introduce a way for the target to control the behavior (e.g., how we choose between buildSqrtNROneConst and buildSqrtNRTwoConst). What's the effect on performance?
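For context on that example: as I read the comments in `DAGCombiner.cpp`, both builders apply the Newton-Raphson update for `1/sqrt(A)`, written in two algebraically equivalent forms: `X * (1.5 - (A/2) * X * X)` (one constant, with `A/2` precomputed) and `(-0.5 * X) * (A * X * X - 3.0)` (two constants). A minimal C sketch of both forms; the function names are hypothetical and this is not LLVM code:

```c
/* One-constant form: X' = X * (1.5 - (A/2) * X * X); half_a = A/2 is
   computed once before the loop. */
static float rsqrt_nr_one_const(float a, float x, int iters) {
    float half_a = 0.5f * a;
    for (int i = 0; i < iters; ++i)
        x = x * (1.5f - half_a * x * x);
    return x;
}

/* Two-constant form: X' = (-0.5 * X) * (A * X * X - 3.0). */
static float rsqrt_nr_two_const(float a, float x, int iters) {
    for (int i = 0; i < iters; ++i)
        x = (-0.5f * x) * (a * x * x - 3.0f);
    return x;
}
```

Both converge quadratically to `1/sqrt(A)`; they differ in the constants they need and in how rounding accumulates, which is the kind of per-target choice that could also be exposed for division.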

-Hal

Qiu_Chaofan1 · August 8, 2019, 4:47pm

Hal,

Yes, speed is an important factor in making this decision. My change only moves the numerator into the estimation, so it doesn't add any instructions. With the simple benchmark below, the demo patch and current master have the same running time:

    float fdiv(unsigned int a, unsigned int b) {
      return (float)a / (float)b;
    }

    float m;

    // noinline keeps this as a real call inside the loop.
    __attribute__((noinline)) void foo() {
      m = 0.0;
    }

    int main() {
      for (int i = 1; i < 1000000; ++i)
        for (int j = 1; j < 30000; ++j) {
          m = fdiv(i, j);
          foo();
        }
    }

Regards,
Qiu Chaofan

Finkel_Hal_J · August 8, 2019, 4:58pm

I think that it’s certainly worth posting a patch and then we can evaluate it.

Thanks again,

Hal

Qiu_Chaofan1 · August 9, 2019, 9:34am

Hal,

Here is the patch. Thanks.

Regards,
Qiu Chaofan

recip-new.patch (3.23 KB)

Finkel_Hal_J · August 9, 2019, 4:34pm

Hi, Qiu Chaofan,

Can you please upload the patch to reviews.llvm.org? It’s much easier for me to review patches there (see https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface for instructions).

Thanks again,

Hal
