I’ve got a problem with the way that FPMathOperator determines which call instructions should be considered this type of operator. Specifically, there are cases where it doesn’t recognize calls to the standard math library’s complex functions as FPMathOperator calls, and so fast-math flags can’t be attached to these calls.
Consider the following code:
#include <complex.h>

double complex foo(double complex x) {
  double complex ex = cexp(x);
  return ex + x;
}

float complex bar(float complex x) {
  float complex ex = cexpf(x);
  return ex + x;
}
The LLVM IR representation of this varies with ABI, so it’s not consistent across targets. On x86-64 Linux, it’s relatively simple. If I compile with “-O2 -ffast-math” I get this:
define dso_local { double, double } @foo(double noundef nofpclass(nan inf) %x.coerce0, double noundef nofpclass(nan inf) %x.coerce1) local_unnamed_addr #0 {
entry:
  %call = tail call { double, double } @cexp(double noundef nofpclass(nan inf) %x.coerce0, double noundef nofpclass(nan inf) %x.coerce1) #3
  %0 = extractvalue { double, double } %call, 0
  %1 = extractvalue { double, double } %call, 1
  %add.r = fadd fast double %0, %x.coerce0
  %add.i = fadd fast double %1, %x.coerce1
  %.fca.0.insert = insertvalue { double, double } poison, double %add.r, 0
  %.fca.1.insert = insertvalue { double, double } %.fca.0.insert, double %add.i, 1
  ret { double, double } %.fca.1.insert
}

define dso_local nofpclass(nan inf) <2 x float> @bar(<2 x float> noundef nofpclass(nan inf) %x.coerce) local_unnamed_addr #2 {
entry:
  %call = tail call fast nofpclass(nan inf) <2 x float> @cexpf(<2 x float> noundef nofpclass(nan inf) %x.coerce) #3
  %0 = fadd fast <2 x float> %call, %x.coerce
  ret <2 x float> %0
}
Notice that in bar() the call to cexpf() is marked with fast-math flags, but in foo() the call to cexp() is not. (I’m particularly interested in the ‘afn’ flag here, BTW.) We recognize the call to cexpf() as an FPMathOperator because it returns a vector of floats. However, we do not recognize the call to cexp() as an FPMathOperator because it returns a structure with two doubles.
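To make the distinction concrete, here is a toy model in C of the return-type check as I've described it above. It is only a sketch: struct ty, the ty_* constants, and returns_fp_or_fp_vector() are made-up names for illustration, not the LLVM API, and the model looks only at the call's return type.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an IR type. This is NOT the LLVM Type API. */
enum ty_kind { TY_FLOAT, TY_DOUBLE, TY_I64, TY_VECTOR, TY_STRUCT };

struct ty {
    enum ty_kind kind;
    const struct ty *elem; /* element type for TY_VECTOR / TY_STRUCT, else NULL */
    size_t count;          /* lanes for TY_VECTOR, fields for TY_STRUCT */
};

/* Hypothetical sample types matching the IR shown above. */
const struct ty ty_float    = { TY_FLOAT,  NULL,       0 };
const struct ty ty_double   = { TY_DOUBLE, NULL,       0 };
const struct ty ty_i64      = { TY_I64,    NULL,       0 };
const struct ty ty_v2f32    = { TY_VECTOR, &ty_float,  2 }; /* <2 x float>        */
const struct ty ty_cplx_f64 = { TY_STRUCT, &ty_double, 2 }; /* { double, double } */

static bool is_fp_scalar(const struct ty *t) {
    return t->kind == TY_FLOAT || t->kind == TY_DOUBLE;
}

/* Models the check described in the post: a call counts as an FP math
 * operator only if it returns an FP scalar or a vector of FP scalars. */
bool returns_fp_or_fp_vector(const struct ty *ret) {
    if (ret->kind == TY_VECTOR)
        return is_fp_scalar(ret->elem);
    return is_fp_scalar(ret);
}
```

With this model, returns_fp_or_fp_vector(&ty_v2f32) is true (the cexpf() case), while returns_fp_or_fp_vector(&ty_cplx_f64) is false (the cexp() case), even though both calls compute the same kind of thing.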
Obviously, it would be easy enough to update FPMathOperator to also accept calls that return structures with two floating-point members (or vectors of such structures) as FP math operators. The problem is that on x86-64 Windows, double complex is returned via an sret argument as a pointer to a structure with two doubles, and (much worse) float complex is returned as an i64.
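The i64 case is the awkward one because nothing in the type says "floating point" anymore. As a rough illustration (plain C, nothing LLVM-specific, and assuming a compiler that supports C99 complex types and C11 _Static_assert), a float complex is just two floats stored back to back, so it fits exactly in 64 bits. The helper names pack_complex_float() and unpack_complex_float() are made up for this sketch.

```c
#include <complex.h>
#include <stdint.h>
#include <string.h>

/* A float complex has the same layout as float[2], so with 4-byte floats
 * it occupies exactly 8 bytes, the same as a 64-bit integer. */
_Static_assert(sizeof(float complex) == sizeof(uint64_t),
               "float complex is expected to occupy 64 bits");

/* Reinterpret the two floats as one 64-bit integer, the way an ABI that
 * returns float complex in an integer register would see it. */
uint64_t pack_complex_float(float complex z) {
    uint64_t bits;
    memcpy(&bits, &z, sizeof bits);
    return bits;
}

/* Recover the complex value from its 64-bit integer image. */
float complex unpack_complex_float(uint64_t bits) {
    float complex z;
    memcpy(&z, &bits, sizeof z);
    return z;
}
```

Round-tripping a value through pack_complex_float() and unpack_complex_float() gives back the original, but a check that only inspects the return type sees an integer and has no way to tell that it carries a complex float. Which half of the integer holds the real part depends on byte order, so the sketch deliberately doesn't rely on that.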
So, I’m looking for a good way to handle this. The possibilities I have thought of so far are:
- Introduce new intrinsics for standard math library complex calls and require front ends to use these when they want the call to have fast-math flags
- Introduce new attributes that the front end can use to indicate when a parameter or return type is representing a complex type or some component thereof
- Make first class types for complex floating point
(Yes, I know the history behind the third option in that list, but it would solve this problem, so I put it there.)
Option 1 above is a bit of a problem, because the front end handles setting up the ABI for arguments and return types, so we’d either need variations of the intrinsic to handle all possible ABI representations, or we’d need to teach back ends to generate the ABI-compliant calls when lowering these intrinsics.
Option 2 is also not without difficulty because in some cases the real and imaginary components of a complex value are passed as separate arguments. That’s not a deal-breaker, but it would require parameter attributes like ‘complex(float)’, ‘complex_real(float)’, ‘complex_imaginary(float)’, and so on.
At this point, apart from making first class complex types (which I know a lot of people oppose), I’m not entirely happy with any of the options I’ve thought of.
Does anyone have any other ideas?