Optimization of complex number division

s-watanabe314 · April 21, 2025, 9:30am

Thanks for the information! While it differs from that infomation, I also investigated the calculation accuracy and potential for overflow of the Smith algorithm.

LAPACK Tests

I looked at the complex division test in LAPACK (test_zcomplexdiv.f). I believe this test is related to the possibility of overflow and denormalized numbers. It was only a double-precision test, but I investigated it similarly for half, single, and quadruple precision. As a result of the investigation, the cases where overflow occurred were the same for the default runtime function and Smith’s algorithm using MLIR. Overflow occurred in the following four cases.

[ c3] X =  1.7976931348623157E+308 : (x+x*I)/(x+x*I) =                      NaN+0.0000000000000000E+000*I differs from +1.0000000000000000E+000
[ c4] X =  8.9884656743115795E+307 : (x+x*I)/(x+x*I) =                      NaN+0.0000000000000000E+000*I differs from +1.0000000000000000E+000
[ f3] X =  1.7976931348623157E+308 : (x+x*I)/(x-x*I) =  0.0000000000000000E+000                     NaN*I differs from +0.0000000000000000E+000+1.0000000000000000E+000*I
[ f4] X =  8.9884656743115795E+307 : (x+x*I)/(x-x*I) =  0.0000000000000000E+000                     NaN*I differs from +0.0000000000000000E+000+1.0000000000000000E+000*I

When using the Algebraic method, overflow occurs in even more cases. Specifically, if X is greater than 1.34E+154 (squaring it exceeds the maximum value for double-precision), overflow will occur during intermediate calculations. The Smith’s algorithm can avoid this type of overflow.

Precision Tests

For the calculation of z = x / y = (a + bi) / (c + di), I randomly set a, b, c, and d to values between -1.0 and 1.0, and compared the calculation results of each algorithm. After comparing 100,000 cases, the following results were obtained compared to runtime function of libgcc (the current Flang default):

Single Precision

Algorithm	Number of Cases with Precision Errors
compiler-rt	81,760
Smith	41,029
Algebraic	81,760

The Smith’s algorithm is closer to the calculation results of libgcc than the algebraic method, but there are still differences. The largest precision error occurred in the following example:

x_re = 0.687424004077911376953125
x_im = -0.6453526020050048828125
y_re = -0.81454432010650634765625
y_im = -0.86763536930084228515625

x = cmplx(x_re, x_im, kind=4)
y = cmplx(y_re, y_im, kind=4)

z = x / y
! libgcc's result:
! z_re = -0.465173843622324056923389434814453125E-4
! z_im = 0.792291581630706787109375E+0
! Smith's result:
! z_re = -0.463739297629217617213726043701171875E-4
! z_im = 0.792291581630706787109375E+0

In this example, the lower 15 bits of the mantissa of z_re differ. However, in most cases, the difference occurs in the lower 1 or 2 bits.

Double Precision

Algorithm	Number of Cases with Precision Errors
compiler-rt	81,605
Smith	40,598
Algebraic	81,605

The largest precision error occurs in the following case. The lower 24 bits of the mantissa of z_re differ.

x_re = 0.5312533058489978810001730380463413894176483154296875
x_im = -0.75302995788782245423220729207969270646572113037109375
y_re = -0.4283744483841627204157020969432778656482696533203125
y_im = -0.30221288569104542975907179425121285021305084228515625

x = cmplx(x_re, x_im, kind=8)
y = cmplx(y_re, y_im, kind=8)

z = x / y
! libgcc's result:
! z_re = 0.11961222766042166498580488591396697728441722574643790721893310546875E-6
! z_im = 0.17578777745980984636986477198661305010318756103515625E+1
! Smith's result
! z_re = 0.119612227629041476570197211071189258291269652545452117919921875E-6
! z_im = 0.17578777745980984636986477198661305010318756103515625E+1

kiranchandramohan:

We saw precision issues in an lapack based internal library. Looking back through the notes, the issue was seen in crotg (LAPACK: BLAS/SRC/crotg.f90 Source File). The specific instance was the following:
g = (1.1920929E-07,-0.5142264)
u =  8.5070592E+37
gs = g / u
We get different results:
llvm-project/flang:
gs = (0.,-0.)
classic flang:
(1.4012985E-45,-6.0447027E-39)
If lapack-based routines work fine without issues, it should be OK. However, I cannot say with confidence whether the proposed complex.div lowering using Smith’s algorithm fixes all issues.

In this example, using Smith’s algorithm produced the same result as classic flang. If the algebraic method is used, the result for this example would be (0.,-0).
Also, I tried building and testing LAPACK using Smith, and it passed without any issues.

This example cannot be solved even with the Smith’s algorithm. It may be necessary to implement other algorithms in MLIR in addition.

Considering these situations, would it be acceptable to change the Flang default to Smith’s algorithm?

Topic		Replies	Views
Clang ignoring --fast-math for complex division, serious performance hit Clang Frontend	17	257	November 10, 2017
Question and Proposal Regarding Complex Number Division Algorithm in the complex dialect MLIR	13	453	February 13, 2025
Performance regression on ARM LLVM Dev List Archives	23	153	October 21, 2014
RFC: Complex in LLVM LLVM Dev List Archives	30	505	July 9, 2019
FP Intrinsics LLVM Dev List Archives	6	132	March 17, 2005

Optimization of complex number division

LAPACK Tests

Precision Tests

Related topics