Thanks for the information! While it differs from that infomation, I also investigated the calculation accuracy and potential for overflow of the Smith algorithm.
LAPACK Tests
I looked at the complex division test in LAPACK (test_zcomplexdiv.f). I believe this test is related to the possibility of overflow and denormalized numbers. It was only a double-precision test, but I investigated it similarly for half, single, and quadruple precision. As a result of the investigation, the cases where overflow occurred were the same for the default runtime function and Smith’s algorithm using MLIR. Overflow occurred in the following four cases.
[ c3] X = 1.7976931348623157E+308 : (x+x*I)/(x+x*I) = NaN+0.0000000000000000E+000*I differs from +1.0000000000000000E+000
[ c4] X = 8.9884656743115795E+307 : (x+x*I)/(x+x*I) = NaN+0.0000000000000000E+000*I differs from +1.0000000000000000E+000
[ f3] X = 1.7976931348623157E+308 : (x+x*I)/(x-x*I) = 0.0000000000000000E+000 NaN*I differs from +0.0000000000000000E+000+1.0000000000000000E+000*I
[ f4] X = 8.9884656743115795E+307 : (x+x*I)/(x-x*I) = 0.0000000000000000E+000 NaN*I differs from +0.0000000000000000E+000+1.0000000000000000E+000*I
When using the Algebraic method, overflow occurs in even more cases. Specifically, if X is greater than 1.34E+154 (squaring it exceeds the maximum value for double-precision), overflow will occur during intermediate calculations. The Smith’s algorithm can avoid this type of overflow.
Precision Tests
For the calculation of z = x / y = (a + bi) / (c + di), I randomly set a, b, c, and d to values between -1.0 and 1.0, and compared the calculation results of each algorithm. After comparing 100,000 cases, the following results were obtained compared to runtime function of libgcc (the current Flang default):
- Single Precision
| Algorithm | Number of Cases with Precision Errors |
|---|---|
| compiler-rt | 81,760 |
| Smith | 41,029 |
| Algebraic | 81,760 |
The Smith’s algorithm is closer to the calculation results of libgcc than the algebraic method, but there are still differences. The largest precision error occurred in the following example:
x_re = 0.687424004077911376953125
x_im = -0.6453526020050048828125
y_re = -0.81454432010650634765625
y_im = -0.86763536930084228515625
x = cmplx(x_re, x_im, kind=4)
y = cmplx(y_re, y_im, kind=4)
z = x / y
! libgcc's result:
! z_re = -0.465173843622324056923389434814453125E-4
! z_im = 0.792291581630706787109375E+0
! Smith's result:
! z_re = -0.463739297629217617213726043701171875E-4
! z_im = 0.792291581630706787109375E+0
In this example, the lower 15 bits of the mantissa of z_re differ. However, in most cases, the difference occurs in the lower 1 or 2 bits.
- Double Precision
| Algorithm | Number of Cases with Precision Errors |
|---|---|
| compiler-rt | 81,605 |
| Smith | 40,598 |
| Algebraic | 81,605 |
The largest precision error occurs in the following case. The lower 24 bits of the mantissa of z_re differ.
x_re = 0.5312533058489978810001730380463413894176483154296875
x_im = -0.75302995788782245423220729207969270646572113037109375
y_re = -0.4283744483841627204157020969432778656482696533203125
y_im = -0.30221288569104542975907179425121285021305084228515625
x = cmplx(x_re, x_im, kind=8)
y = cmplx(y_re, y_im, kind=8)
z = x / y
! libgcc's result:
! z_re = 0.11961222766042166498580488591396697728441722574643790721893310546875E-6
! z_im = 0.17578777745980984636986477198661305010318756103515625E+1
! Smith's result
! z_re = 0.119612227629041476570197211071189258291269652545452117919921875E-6
! z_im = 0.17578777745980984636986477198661305010318756103515625E+1
In this example, using Smith’s algorithm produced the same result as classic flang. If the algebraic method is used, the result for this example would be (0.,-0).
Also, I tried building and testing LAPACK using Smith, and it passed without any issues.
This example cannot be solved even with the Smith’s algorithm. It may be necessary to implement other algorithms in MLIR in addition.
Considering these situations, would it be acceptable to change the Flang default to Smith’s algorithm?