Complex arithmetic ignores -ffast-math after clang r219557, serious performance regressions


SVN now seems to be respecting the -ffast-math flag in the way we desire without Matthijs’ temporary fix. I didn’t see any further traffic about this on the cfe-dev list - was there a discussion elsewhere? Did it get fixed by accident as part of some other change, and should we worry that it will come up again?


Not that I’m aware of. Did anything happen here?

Hi Richard,

I think you're seeing a change in the backend (in combination with an older frontend change) -- perhaps the backend change is when we added fast-math flags to fcmp?

Here's what happens:

Given some code like this:
$ cat /tmp/c.c
typedef _Complex double dc;

dc foo(dc a, dc b) {
  return a*b;
}

Compiling it produces IR that looks like this:

define { double, double } @foo(double %a.coerce0, double %a.coerce1, double %b.coerce0, double %b.coerce1) #0 {
[perform the fast code]
  %isnan_cmp = fcmp fast uno double %mul_r, %mul_r
  br i1 %isnan_cmp, label %complex_mul_imag_nan, label %complex_mul_cont, !prof !1

complex_mul_imag_nan: ; preds = %entry
  %isnan_cmp1 = fcmp fast uno double %mul_i, %mul_i
  br i1 %isnan_cmp1, label %complex_mul_libcall, label %complex_mul_cont, !prof !1

complex_mul_libcall: ; preds = %complex_mul_imag_nan
  %call = call { double, double } @__muldc3(double %a.real, double %a.imag, double %b.real, double %b.imag) #1
  %4 = extractvalue { double, double } %call, 0
  %5 = extractvalue { double, double } %call, 1
  br label %complex_mul_cont

complex_mul_cont: ; preds = %complex_mul_libcall, %complex_mul_imag_nan,

So we always do the fast calculation, and only if both the real and imaginary results come out as NaN do we go back and do the slow calculation via the libcall. But because of the fast-math flags, the backend can constant-fold the relevant comparisons and eliminate that entire set of branches, leaving only the fast code.
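
To make the control flow concrete, here is a rough C-level sketch of what that IR is doing. It is my own illustration, not what clang literally emits: mul_sketch and slow_mul are hypothetical names, and slow_mul is only a stand-in for the __muldc3 call in the IR.

```c
#include <complex.h>
#include <math.h>

typedef _Complex double dc;

/* Hypothetical stand-in for the __muldc3 libcall seen in the IR.
 * Here it simply defers to the compiler's own complex multiply. */
static dc slow_mul(dc a, dc b) {
  return a * b;
}

/* Sketch of the lowering: do the fast multiply first, and fall back
 * to the slow path only when both result components are NaN. */
dc mul_sketch(dc a, dc b) {
  double ar = creal(a), ai = cimag(a);
  double br = creal(b), bi = cimag(b);

  /* The "fast code": the textbook four-multiply formula. */
  double mul_r = ar * br - ai * bi;
  double mul_i = ar * bi + ai * br;

  /* Mirrors the two "fcmp uno x, x" checks. Under fast-math the
   * compiler may assume no NaNs, fold this condition to false, and
   * drop the fallback branch entirely. */
  if (isnan(mul_r) && isnan(mul_i))
    return slow_mul(a, b);

  return CMPLX(mul_r, mul_i);
}
```

If you want to look at the real thing, something like clang -O2 -ffast-math -S -emit-llvm /tmp/c.c -o - should print the optimized IR for the example above; the exact output will depend on the clang revision you build from.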