RFC: Constant folding math functions for long double

Hi,

Clang is currently unable to constant fold calls to math.h functions such as logl(), expl() etc.

The problem is that APFloat doesn’t have these functions, so Clang is forced to rely on the host math library. Because long double isn’t portable, we only ever query the host math library for double or float results.

I can see three methods for allowing constant folding for types that are larger than double, some more expensive than others:

  1. Introduce a dependency on libMPFR, as GCC does. The dependency could be hard or soft, with a fallback to the current behaviour if it doesn’t exist.
  2. Write the transcendental functions ourselves in APFloat (yuck!)
  3. If the long double format on the compiler host is the same as the target, use the host library.

(2) is the hardest. (3) is the easiest, but only works in a subset of cases and I really don’t like the idea of better output when compiling on one platform compared to another (with equivalent targets).
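To make option (3) concrete, here is a minimal sketch of the host/target format check it implies. Everything in it is hypothetical (the FloatFormat struct, the constants and can_fold_on_host are made-up names, not LLVM APIs); it only assumes that a format can be identified by its significand width and exponent range.

```cpp
#include <cassert>
#include <limits>

// Hypothetical description of a binary floating-point format.
struct FloatFormat {
    int significand_bits;  // precision, including any explicit/implicit leading bit
    int max_exponent;      // same convention as std::numeric_limits<T>::max_exponent
    int min_exponent;      // same convention as std::numeric_limits<T>::min_exponent
};

inline bool same_format(const FloatFormat& a, const FloatFormat& b) {
    return a.significand_bits == b.significand_bits &&
           a.max_exponent == b.max_exponent &&
           a.min_exponent == b.min_exponent;
}

// Common long double layouts, written in the numeric_limits convention.
const FloatFormat kX87Extended = {64, 16384, -16381};  // 80-bit x87
const FloatFormat kBinary128 = {113, 16384, -16381};   // IEEE quad
const FloatFormat kBinary64 = {53, 1024, -1021};       // long double == double

// Format of long double on the machine running the compiler.
inline FloatFormat host_long_double_format() {
    typedef std::numeric_limits<long double> L;
    FloatFormat f = {L::digits, L::max_exponent, L::min_exponent};
    return f;
}

// Option (3): only use the host libm when the formats are identical.
inline bool can_fold_on_host(const FloatFormat& target) {
    assert(target.significand_bits > 0);  // reject an uninitialised description
    return same_format(host_long_double_format(), target);
}
```

With a check like this, an x86 host targeting a binary128 long double would simply skip the fold, which is exactly the "better output on some hosts" asymmetry described above.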

What do people think about (1)? Or is this completely out of the question?

Cheers,

James

  1. Introduce a dependency on libMPFR, as GCC does. The dependency could be
hard or soft, with a fallback to the current behaviour if it doesn't exist.

A soft dependency would be much better.

  3. If the long double format on the compiler host is the same as the
target, use the host library.

No hard feelings about this one...

(2) is the hardest. (3) is the easiest, but only works in a subset of cases
and I really don't like the idea of better output when compiling on one
platform compared to another (with equivalent targets).

If you make (1) a soft dependency, then (3) will have the same effect,
ie, sometimes the code will be better, sometimes it won't.

I personally don't feel bad about introducing soft dependencies or
special cases to make code better, as long as the users understand how
to make their code better if they wish (ie. documenting the
dependency). So I think both (1) and (3) are acceptable solutions,
when properly documented.

cheers,
--renato

Please not (1).

Cheers,
-Neil.

Hi Neil,

Please not (1).

Could you please elaborate on your concern a bit more?

Cheers,

James

Hey James,

I really fundamentally dislike libMPFR.

License of the codebase aside, would a dependency be on the built .so, or would it be that we’d want to pull in the code and build that?

My worry if we do bring it in even as a soft dependency is how is this figured out - is it a case that CMake will, if it finds the .so on the system, use and link against it? I worry that we are introducing another matrix of potential failures if the lib is present or not.

Cheers,
-Neil.

Hi Neil,

I admit that at this point I haven’t considered the implications of the license MPFR is under, and at the moment I’m sticking my head in the sand until and unless we want to go down this path.

My expectation is that we would use their exposed API - so we’d #include <mpfr.h> and use functions from there, linking against -lmpfr and -lgmp. I admit that this option would indeed add another dimension to the testing matrix.
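As a rough sketch of what that could look like with a soft dependency: HAVE_MPFR is a hypothetical macro that the build system would define when the library is found, and try_fold_log is a made-up helper name. The mpfr_* calls are the library's public API; the choice of 128 working bits is an assumption of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

#ifdef HAVE_MPFR  // hypothetical macro, set by the build system if MPFR was found
#include <mpfr.h>
#endif

// Try to fold log(x) for a long double constant.
// Returns false (leaving the call in place, i.e. the current behaviour)
// when MPFR is unavailable or the argument is not a finite positive value.
bool try_fold_log(long double x, long double& out, int working_bits = 128) {
    // The working precision must exceed the precision of the result type.
    assert(working_bits > std::numeric_limits<long double>::digits);
    if (!std::isfinite(x) || x <= 0.0L)
        return false;  // leave errno / exception behaviour to the runtime call
#ifdef HAVE_MPFR
    mpfr_t v;
    mpfr_init2(v, working_bits);
    mpfr_set_ld(v, x, MPFR_RNDN);     // exact: working_bits > long double precision
    mpfr_log(v, v, MPFR_RNDN);        // correctly rounded at working_bits
    out = mpfr_get_ld(v, MPFR_RNDN);  // second rounding, to long double
    mpfr_clear(v);
    return true;
#else
    (void)out;
    return false;
#endif
}
```

Note that the result is rounded twice (once to the working precision, once to long double), so a real implementation would have to pick the working precision with care. The #ifdef is also where the extra dimension in the testing matrix comes from: the same source folds or does not fold depending on how the compiler was built.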

Do you have an alternative solution or a preferred solution of those I enumerated earlier?

Cheers,

James

You usually have far more control over which libraries you install
than over libc and the ABI of your host system, so (1) has a strong
advantage there.

I like the sound of libMPFR or some equivalent.

Tim.

Hey James,

(2) is the best - although the most work (and I don’t envy whoever would have to do this!)

My problem with (3) is that you may get subtle precision differences between the constant-folded value calculated on an x86 host and the value computed by the library on, for example, an ARM device. Maybe this isn’t enough of an issue for most people’s purposes though?

I’d argue that it is also OK if the long double format on the compiler host is at least as large as the target’s (if we accept differing precision, why not allow a platform that has an 80-bit long double to produce more precise values?)

Cheers,
-Neil.

Hi Neil,

My problem with (3) is that you may get subtle precision differences between the constant-folded value calculated on an x86 host and the value computed by the library on, for example, an ARM device. Maybe this isn’t enough of an issue for most people’s purposes though?

Sure. My suggestion would be that we refuse to constant fold values when running on x86 and targeting ARM, for example, because ARM uses fp128 and x86 uses fp80. So what I suggest wouldn’t be a codegen fault; it would just optimize less in some cases.

I personally prefer (1), because not only is (2) a lot of work, it’s a lot of work that has a large scope for error.

James

My feeling is that we shouldn’t be relying on host long double routines. We’re already skating on thin ice by relying on host double and float routines. This is a great way to make the compilation result vary depending on the host, which is something we try to avoid.

An optional MPFR dependency would also be pretty painful. I expect it will frequently be missing and will not be exercised by most buildbots.

MPFR suffers from the same problem as host library routines; the results don’t (in general) match what you get at runtime.

– Steve

Wouldn't this be the responsibility of whoever cares?

I mean, like any other back-end / plugin that people dump in ToT to
help them "have third-party code hanging out but is not actually
tested and should be fine as long as it doesn't break anything else".

We seem to be gathering more and more of those things lately, and I
thought people were generally ok with it.

Not that I care too much, I don't have any use for MPFR myself, just
wanting to understand where we draw the line.

cheers,
--renato

To be totally clear: this is true of all approaches to constant-folding functions that are not fully-specified. Even if host is the targeted system, implementations change, bugs get fixed, etc.

– Steve

IMO if constant folding of transcendental functions makes a significant
difference for your program, you likely are doing something strange
already. I don't think it matters much for a lot of use cases, so having
an optional dependency for this seems to be fine. Note that the
non-transcendental functions are quite a different deal, especially
reasonably well-behaved functions like log and exp.

Joerg

Hi Joerg,

IMO if constant folding of transcendental functions makes a significant
difference for your program, you likely are doing something strange
already.

Alas it’s not as simple as that. Currently, if you declare:

std::uniform_real_distribution x;

LLVM emits two calls to logl() with constant arguments, a fdiv and a fptoui.

Libc++'s implementation is consumed and folded much more nicely by LLVM, but at the moment anyone comparing LLVM and GCC will think that GCC is around 40% better for some workloads.
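For reference, the arithmetic behind those two logl() calls looks roughly like the sketch below. It is a paraphrase of what libstdc++'s std::generate_canonical() is understood to compute, not a copy of its source; draws_needed and its template parameters are made-up names for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>

// How many draws from Urng are needed to fill the mantissa of Real.
// Every operand is known at compile time, but the two log() calls on
// long double (i.e. logl) stop the whole expression from folding.
template <typename Real, typename Urng>
std::size_t draws_needed() {
    const std::size_t bits = std::numeric_limits<Real>::digits;
    const long double range = static_cast<long double>(Urng::max()) -
                              static_cast<long double>(Urng::min()) + 1.0L;
    // Two logl() calls with constant arguments, an fdiv and an fptoui.
    const std::size_t log2_range =
        static_cast<std::size_t>(std::log(range) / std::log(2.0L));
    assert(log2_range > 0);  // guards the division below
    return std::max<std::size_t>(1, (bits + log2_range - 1) / log2_range);
}
```

For example, draws_needed<double, std::mt19937>() evaluates to 2, so a compiler that can fold logl() turns the whole function into a constant.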

James

That sounds like a library issue that qualifies as “somewhat strange”. Why does this require a log at all?

– Steve

Hi,

If you’re interested, include/bits/random.tcc:3312 (std::generate_canonical()). I wish I could just point people at libc++, but that’s outside of my control. As for fixing the library, that horse bolted some time ago.

Cheers,

James

My two cents:

  1. While I’m sympathetic to Steve’s point about this not always being desirable, the fact that many other compilers do this will ensure that code is written assuming it. I think we need to have support for this in order to have even remotely performance portable library code.

  2. I strongly dislike an optional dependency on MPFR. We shouldn’t have critical functionality tied to system libraries IMO. I understand that it is a lot of work, but I think we need to produce or get contributed a reasonable implementation of these routines under the LLVM license. As an added benefit, we might be able to (eventually) share the code for them with a runtime library that provides portable implementations for systems where hardware support is lacking.

-Chandler

Hi Chandler,

Your point of view makes sense. I’d support it strongly myself if I saw it happening… :(

There are math libraries out there decently licensed that we could adapt - ARM has released our own math libraries for example. However adapting these to be arbitrary precision is certainly nontrivial and is something I’d be completely incapable of doing. My numerical analysis is not very hot and ULP means nothing to me :(

James

From: "Chandler Carruth via llvm-dev" <llvm-dev@lists.llvm.org>
To: "James Molloy" <james@jamesmolloy.co.uk>, "Stephen Canon"
<scanon@apple.com>
Cc: "LLVM Dev" <llvm-dev@lists.llvm.org>
Sent: Monday, April 4, 2016 1:41:05 PM
Subject: Re: [llvm-dev] RFC: Constant folding math functions for long
double

My two cents:

1) While I'm sympathetic to Steve's point about this not always being
desirable, the fact that *many* other compilers do this will ensure
that code is written assuming it. I think we need to have support
for this in order to have even remotely performance portable library
code.

+1

2) I strongly dislike an optional dependency on MPFR. We shouldn't
have critical functionality tied to system libraries IMO. I
understand that it is a *lot* of work, but I think we need to
produce or get contributed a reasonable implementation of these
routines under the LLVM license. As an added benefit, we might be
able to (eventually) share the code for them with a runtime library
that provides portable implementations for systems where hardware
support is lacking.

+1

The prospect of having Clang compiled with MPFR available produce different code, even on otherwise identical base systems, than Clang compiled without it seems highly undesirable (especially since we might have only small numerical differences as observable effects). Plus, there are licensing considerations in environments where we need to statically link LLVM (in JITs and such) that I don't think we want to impose on downstream developers/distributors.

However, the good news is that developing our own APFloat versions of these functions is relatively easy -- not easy in an absolute sense, but: 1) Unlike the libc implementations, we're not worried about raw performance on native types, 2) we can use large floats with a lot of extra precision easily, and 3) we don't need to use algorithms designed to be efficient for really-large bit counts (like MPFR does).

In this regard, it might also be helpful to look at: https://github.com/JuliaLang/openlibm (which seems to be very-permissively licensed and has reference implementations for all of the relevant algorithms in C).
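To illustrate the point about extra working precision: the sketch below folds exp() for a float argument by running a deliberately naive algorithm in double and rounding once at the end. It uses native types as a stand-in for a wider APFloat format and is not APFloat code; fold_exp_float, the 20-term cutoff and the |x| <= 80 limit are assumptions of this sketch. Folding long double the same way would need a working format wider than long double itself.

```cpp
#include <cassert>
#include <cmath>

// Fold exp(x) for a float constant using double as the working format.
// Returns false (no fold) for NaN, infinities and arguments near the
// overflow/underflow range of float.
bool fold_exp_float(float x, float& out) {
    const double ln2 = 0.693147180559945309417232121458;
    const double xd = static_cast<double>(x);
    if (!(std::fabs(xd) <= 80.0))
        return false;  // also rejects NaN
    // Range reduction: x = k*ln2 + r with |r| <= ln2/2.
    const double k = std::nearbyint(xd / ln2);
    const double r = xd - k * ln2;
    assert(std::fabs(r) < 0.35);
    // Plain Taylor series for exp(r); slow but simple, and the 29 spare
    // bits of double absorb its rounding error.
    double term = 1.0;
    double sum = 1.0;
    for (int n = 1; n <= 20; ++n) {
        term *= r / n;
        sum += term;
    }
    // exp(x) = 2^k * exp(r); round to float exactly once.
    out = static_cast<float>(std::ldexp(sum, static_cast<int>(k)));
    return true;
}
```

The algorithm is nowhere near what a tuned libm would use, which is the point: with enough spare bits and no performance constraint, simple methods become viable.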

-Hal