Clang and ISO C math functions

When I updated our out-of-tree compiler from v3.8 to v3.9 RC3, I noticed a number of large performance regressions, with some tests using 5 times as many instructions. However, when I examined the tests, I realised that something quite different was going on concerning the ISO C math functions, and that it is not a true performance regression at all.

I will use the following example for this message, and this is compiled with ‘-S -O3’. The option ‘-ffast-math’ is not used, and I have verified that ‘-fmath-errno’ is present in the ‘-cc1’ options:

    extern double exp(double);
    extern double foo(double);

    int useMathName() {
        if ((exp(1.0) < 2.71) || (exp(1.0) > 2.72))
            return -1;
        return 0;
    }

    int useOtherName() {
        if ((foo(1.0) < 2.71) || (foo(1.0) > 2.72))
            return -1;
        return 0;
    }

With v3.8, the implementation of the function ‘useMathName’ was reduced to simply ‘return 0’. The compiler elided the calls to ‘exp’, assumed the value that would have been returned if it was called, decided that the two tests would be ‘false’ and reduced the code-generation to ‘return 0’. This does not happen for the other function ‘useOtherName’, and the code generated is as expected.

After updating to v3.9 RC3, the compiler is no longer eliding the calls to ‘exp’ - probably because a bug was fixed since ‘errno’ could be changed - but it is still presuming the returned value and eliding the tests, so the function is now 2 consecutive calls to ‘exp’ and a ‘return 0’.
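To make the difference concrete, here is a hand-written C sketch of what the generated code for ‘useMathName’ amounts to in each version, going only by the description above. It is not compiler output, and the two function names are hypothetical labels for illustration.

```c
#include <math.h>

/* v3.8 as described: both calls elided and both comparisons folded away. */
int useMathName_v38_equivalent(void) {
    return 0;
}

/* v3.9 RC3 as described: the calls are kept (they could, in principle,
 * modify errno), but their results are unused because the comparisons
 * have already been folded to 'false'. */
int useMathName_v39_equivalent(void) {
    (void)exp(1.0);
    (void)exp(1.0);
    return 0;
}
```

Both functions return the same value; the second simply carries two calls whose results are never inspected, which would account for the extra instructions.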

I have verified that this is the case for the unaltered X86 v3.8 distribution version too.

I would expect this behaviour if ‘-ffast-math -fno-math-errno’ was selected, but it isn’t, and I think that this is an invalid optimisation. It also means that some of my math functional tests are not reporting honestly (this only happens when the argument(s) are constants). Also, on our architecture, ‘double’ is FP32, and it is probable that the compiler is using the host platform’s implementation which is FP64 for evaluating the test expressions, and this will introduce precision differences that the test will not detect - in my real tests, the test expression ranges are more fine-grained to allow for legitimate FP32 ranges.

Thanks,

MartinO

Thanks Hal,

I would expect this behaviour if ‘-ffast-math -fno-math-errno’ was selected, but it isn’t, and I think that this is an invalid optimisation. It also means that some of my math functional tests are not reporting honestly (this only happens when the argument(s) are constants). Also, on our architecture, ‘double’ is FP32,

Does Clang for your target emit C-language “double” types as “double” at the IR level? If so, that’s wrong. “double” at the IR level is assumed to be an IEEE double-precision number. All of the constant folding will do the wrong thing on your target if this is what is happening.

No, I have:

DoubleFormat = &llvm::APFloat::IEEEsingle;

set in my ‘TargetInfo’, and the IR shows ‘f32’. But it is the elision of the tests when I am not using ‘-ffast-math’ that I think is wrong: the tests are simply not present. I’m quite happy with this behaviour when ‘-ffast-math’ is used. In the ‘foo’ example the calls to ‘foo’ are retained and the tests are present; it is only when I rename ‘foo’ to ‘exp’ or some other math function that this happens.

With ‘-fno-math-errno’ it can be assumed that the math functions have no other side-effects, and in combination with ‘-ffast-math’ the reduction to the optimal ‘return 0’ is perfect. But neither of these options is selected.

I downloaded the official v3.8.0 distribution for X86 from the LLVM website and tried that, and got the same behaviour, so it is not particular to my out-of-tree changes. With v3.9.0 RC3 the calls to the math functions are no longer elided, but the tests still are.

and it is probable that the compiler is using the host platform’s implementation which is FP64 for evaluating the test expressions,

Yes, that’s right. See ConstantFoldScalarCall in LLVM’s lib/Analysis/ConstantFolding.cpp. We’re obviously aware this can cause issues when cross compiling. If you’d like to discuss this behavior, you should do so on llvm-dev. We might want to make this more configurable than it currently is.
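As a rough way to quantify the cross-compilation concern, the sketch below measures the distance in FP32 ULPs between two values, for example a host-evaluated FP64 result rounded to FP32 and the result the target library returns at run time. The helper names are hypothetical, and the code assumes ‘float’ is IEEE binary32 and that both inputs are finite.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern to an integer that increases monotonically
 * with the float's value (assumes IEEE binary32, finite input). */
static int64_t ordered_bits_f32(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    /* For negative floats the magnitude bits grow as the value falls,
     * so negate the magnitude; +0.0 and -0.0 both map to 0. */
    return (u & 0x80000000u) ? -(int64_t)(u & 0x7fffffffu) : (int64_t)u;
}

/* Distance between two finite floats, counted in FP32 ULPs. */
int64_t ulp_distance_f32(float a, float b) {
    int64_t d = ordered_bits_f32(a) - ordered_bits_f32(b);
    return d < 0 ? -d : d;
}
```

With <math.h> included, a call such as ulp_distance_f32((float)exp(1.0), expf(1.0f)) reports how far the two evaluations disagree. A range test whose window is narrower than that gap could pass or fail depending on which evaluation was actually used.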

I don’t have any issue with the constant-folding, just the elision of the tests. I assume that the special handling of the C math functions is happening in Clang rather than LLVM, but I don’t generally look much at the semantic analysis code so I am not as familiar with it. If the math library semantic issues are in LLVM then I should post this on LLVM-Dev, but I think that this is more likely a front-end issue - no?

MartinO

Hi Martin,

I’m having a little difficulty understanding this:

I would expect this behaviour if ‘-ffast-math -fno-math-errno’ was selected, but it isn’t, and I think that this is an invalid optimisation.

There are two optimizations going on:

  1. Constant folding of exp() results.
  2. Elision of exp() calls.

I would expect (1) to be valid even in full IEEE compliance mode. I would expect (2) to be valid with -fno-math-errno - that is, assume that errno doesn’t exist. This is implied by -ffast-math.

(1) is valid in all modes because of the “as-if” rule - the return value is as-if the function were called. The function has well defined behaviour, so we don’t actually need to call it to get the result. (2) would also be valid in all modes as far as I understand it, because on successful return exp() does not change errno. exp(1.0) is well-defined and succeeds, so errno doesn’t change. It seems to me that clang has regressed in performance here in 3.9 unless there’s a subtlety that I’m missing (probable).

It sounds to me like you’re attempting to test your math library. If you’re doing this, you probably want -fno-builtin which informs the compiler not to assume the library calls are well defined. With this flag, no constant folding (or elision) will be done which I presume is what you want.
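For a library test along the lines of the original example, a minimal sketch might look like this. The file name and function name are hypothetical; ‘-fno-builtin’ is the flag mentioned above, and the build line in the comment is just one plausible invocation mirroring the ‘-S -O3’ used earlier in the thread.

```c
/* test_exp.c (hypothetical file name)
 *
 * Compile so that exp() is treated as an ordinary external function:
 *     clang -S -O3 -fno-builtin test_exp.c
 */
#include <math.h>

/* Returns 0 if the library's exp(1.0) lies in the expected window,
 * -1 otherwise. */
int check_exp_of_one(void) {
    double e = exp(1.0);
    if ((e < 2.71) || (e > 2.72))
        return -1;
    return 0;
}
```

Compiled this way, the call and both comparisons should survive into the generated code, so the result reflects what the library actually returns at run time.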

Cheers,

James

Thanks James and Hal,

I hadn’t thought of ‘-fno-builtin’ (blush), and as you say the “as if” rule is fine. With ‘-fno-builtin’ the behaviour is as I had expected. And you are right, for testing the math functions in the library I need to use this.

Regarding James’ points below:

On bullet 2: in v3.8.0 the call was being elided even when ‘-fno-math-errno’ was not specified. This appears to have changed in v3.9.0 (which I see just got tagged :-) ), which does not elide the call.

It was this change that drew my attention to the problem. Sorry for putting this on CFE-Dev, I incorrectly thought that this was a Front-End optimisation issue. If I have further thoughts on this I will start a new thread on LLVM-Dev.

All the best,

MartinO