[request] Improve debugging of ffast-math optimizations

Finding which code is responsible for violating -ffast-math assumptions is a very tedious task that I would not wish on anybody.

Right now, when a fast-math assumption is violated, the only way I have to debug it is to disable fast math and then try each of the finer-grained math optimizations step by step until I find the minimal combination that reproduces the issue. This typically lets me guess which kinds of operations I should be looking for, and start bisecting. Needless to say, this is extremely time consuming. There must be a better way.

I would like the behavior of my programs not to change much under the influence of fast math. The only way to achieve this is to manually perform the transformations that fast math would perform, like exploiting associativity, and, obviously, to never violate any of the fast-math assumptions.

It would be nice to have a warning that detects e.g. associativity transformations and suggests how to rewrite them to minimize the difference in results between a program compiled with and without fast-math.
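As a concrete illustration of why such a warning would help (a hypothetical example, not from the thread; Python is used only because its floats are IEEE-754 doubles, so the arithmetic matches what a reassociating compiler would do):

```python
# Reassociation alone changes results: IEEE-754 addition is not
# associative. This mirrors what -ffast-math may do when it reorders
# the evaluation of a + b + c.
a, b, c = 1e16, -1e16, 1.0

left_to_right = (a + b) + c  # the large terms cancel first, keeping c
reassociated = a + (b + c)   # c is absorbed into the large term and lost

print(left_to_right)  # 1.0
print(reassociated)   # 0.0
```

The two groupings are mathematically equal but differ by the entire magnitude of c in finite precision, which is exactly the kind of divergence the requested warning would point at.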

There is a tool called Herbie [0], used by the rust-herbie-lint [1] (worth seeing in action), that improves the accuracy and stability of mathematical operations. Maybe something similar could be done to detect some of the transformations that fast-math performs and suggest them to the user.

At run-time I would like some kind of fast-math sanitizer that catches all cases in which fast-math assumptions are violated (signaling NaNs, signed zeros, …). Since the undefined-behavior sanitizer already covers e.g. division by zero, maybe a fast-math check would belong there as well.

[0] http://herbie.uwplse.org/
[1] https://github.com/mcarton/rust-herbie-lint
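A minimal sketch of the value check such a sanitizer would perform (hypothetical; the function name is made up, and a real sanitizer would instrument the generated code rather than call a helper like this):

```python
import math

def check_fast_math_safe(x, where=""):
    """Raise if x is a value that -ffast-math assumes never occurs:
    NaN, infinity, or negative zero. Hypothetical illustration only."""
    if math.isnan(x):
        raise FloatingPointError(f"NaN encountered {where}")
    if math.isinf(x):
        raise FloatingPointError(f"Inf encountered {where}")
    if x == 0.0 and math.copysign(1.0, x) < 0:
        raise FloatingPointError(f"negative zero encountered {where}")
    return x

check_fast_math_safe(1.5)  # fine, returns the value
try:
    check_fast_math_safe(float("nan"), "after division")
except FloatingPointError as e:
    print(e)
```

Each check corresponds to one of the fast-math assumptions (nnan, ninf, nsz) listed in the LLVM documentation referenced above.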

Are these “assumptions” documented anywhere?

Thanks,

Typically this 1 by 1 process of elimination is automated and also
ideally done on a reduced test case. Maybe someone around here can
share their scripts.. (they must exist... Hal?) If you're starting
with stable code it's much easier - if you're doing active development
on the codebase as well - I'm empathetic..

From: "C Bergström via cfe-dev" <cfe-dev@lists.llvm.org>
To: "Mehdi Amini" <mehdi.amini@apple.com>
Cc: "Gonzalo BG" <gonzalobg88@gmail.com>, "clang developer list" <cfe-dev@lists.llvm.org>
Sent: Saturday, April 23, 2016 7:58:00 PM
Subject: Re: [cfe-dev] [request] Improve debugging of ffast-math optimizations

Typically this 1 by 1 process of elimination is automated and also
ideally done on a reduced test case. Maybe someone around here can
share their scripts.. (they must exist... Hal?)

With a run script that sets its exit status using a tolerance-based comparison, bugpoint will do a reasonable job. It's more or less like debugging other kinds of miscompiles.
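The comparison step of such a run script could be as simple as the following sketch (hypothetical; the function name, tolerances, and the one-number-per-line output format are all assumptions, not anything bugpoint prescribes):

```python
import sys

def outputs_match(ref, out, rel_tol=1e-6, abs_tol=1e-9):
    """True if two sequences of floats agree within a mixed
    relative/absolute tolerance. Sketch for a bugpoint run script."""
    if len(ref) != len(out):
        return False
    for r, o in zip(ref, out):
        if abs(r - o) > max(rel_tol * max(abs(r), abs(o)), abs_tol):
            return False
    return True

# In the run script one would parse the reference and candidate output
# files into lists of floats and then do:
#   sys.exit(0 if outputs_match(reference_values, candidate_values) else 1)
print(outputs_match([1.0, 2.0], [1.0000001, 2.0]))  # True
print(outputs_match([1.0, 2.0], [1.1, 2.0]))        # False
```

The nonzero exit status on mismatch is what lets bugpoint treat a tolerance violation as a "miscompile" and bisect toward the offending transformation.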

-Hal

Hi!

First off, the tool you reference below, rust-herbie-lint, looks really neat:

test.rs:40:5: 40:18 warning: Numerically unstable expression, #[warn(herbie)] on by default
test.rs:40     (a/b + c) * b;
               ^~~~~~~~~~~~~
test.rs:40:5: 40:18 help: Try this
test.rs:       (c * b) + a;
test.rs:67:5: 67:23 warning: Numerically unstable expression, #[warn(herbie)] on by default
test.rs:67     (a*a + b*b).sqrt();
               ^~~~~~~~~~~~~~~~~~
test.rs:67:5: 67:23 help: Try this
test.rs:       a.hypot(b);

That's awesome! I hope that Clang can develop warnings like this.
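The second suggestion above is not just cosmetic, by the way: the naive sqrt form overflows for large inputs where hypot does not (illustrated here in Python, whose floats are IEEE-754 doubles, so the effect is the same as in Rust or C):

```python
import math

a = b = 1e200  # large enough that a*a overflows the double range

naive = math.sqrt(a * a + b * b)  # a*a is already inf, so this is inf
stable = math.hypot(a, b)         # ~1.414e200, the correct magnitude

print(naive)   # inf
print(stable)
```

This is why a lint that knows the numerically stable library form can improve results even before fast-math enters the picture.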

Second, we could have a sanitizer mode in which instructions with nnan and ninf flags (and similar) were checked (operands and output) for these special values. Such a sanitizer failing would not always be a concrete problem (sometimes users expect these values might be generated, but they just don't care how they're handled), but many users want a way to determine when and where such values appear. That having been said, the most efficient way to do this (on many architectures) is to enable certain floating-point exceptions. Instrumentation would be much slower. Maybe a sanitizer for these things could take advantage of hardware floating-point exception support somehow.
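As a rough picture of what checking "operands and output" of a flagged instruction amounts to (a sketch in Python, not the proposed implementation; `checked_fadd` is a made-up name, and real instrumentation would be emitted inline by the compiler):

```python
import math

def checked_fadd(x, y):
    """Hypothetical instrumented 'fadd' carrying nnan/ninf flags:
    verify that operands and result are free of the special values
    the flags promise are absent."""
    for v in (x, y):
        if math.isnan(v) or math.isinf(v):
            raise FloatingPointError(f"operand {v!r} violates nnan/ninf")
    r = x + y
    if math.isnan(r) or math.isinf(r):
        raise FloatingPointError(f"result {r!r} violates nnan/ninf")
    return r

print(checked_fadd(1.0, 2.0))  # 3.0
# checked_fadd(1e308, 1e308) would raise: the result overflows to inf
```

The per-operation cost of this style of check is what makes hardware floating-point exceptions, where available, the more attractive mechanism.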

For optimizations, like traditional reassociation (and reduction vectorization), we could use the backend "remark" feature to give users information on where these apply. On large codes, however, such information would be very noisy. For this to be useful we'd need to think about how to filter it; I suspect we'd need to combine it with profiling information, so that we only generate the information on actually-executed (or even hot) paths.

Explaining what the optimization did in terms of source-level variables is something I'm not quite sure how best to accomplish in the current infrastructure (perhaps using debug info?). That would be nice functionality to have, however.

-Hal

AFAIK optimizations are, in general, not documented in detail within
clang's documentation. The LLVM documentation [0] explicitly mentions all
the assumptions that these optimizations exploit:

[0] http://llvm.org/docs/LangRef.html#fast-math-flags

Indeed but I think this is a different issue. We have this problem with -Rpass-analysis when inspecting e.g. vectorization information.

I think that using profiling information to improve this is a very novel idea, but for me it would be enough to just have pragmas to turn this information on/off for specific code sections, TUs, …, at a fine-grained level. Anyhow, I think this is something that should be solved in a consistent way for all diagnostics.

Hi!

<snip>

For optimizations, like traditional reassociation (and reduction
vectorization), we could use the backend "remark" feature to give users
information on where these apply. On large codes, however, such information
would be very noisy. For this to be useful we'd need to think about how to
filter it; I suspect we'd need to combine it with profiling information, so
that we only generate the information on actually-executed (or even hot)
paths.

Would it be much noisier than reports on missed auto-vectorization
opportunities? When I used those, I always piped the output to a file and
browsed it in an editor to find the spots I knew were hot from separate
profiler runs. Not perfect, but certainly doable and helpful, and I would
assume much easier to implement on the clang side.

Cheers

From: "Milian Wolff" <mail@milianw.de>
To: cfe-dev@lists.llvm.org, "Hal Finkel" <hfinkel@anl.gov>
Cc: "Gonzalo BG" <gonzalobg88@gmail.com>
Sent: Monday, April 25, 2016 4:49:42 AM
Subject: Re: [cfe-dev] [request] Improve debugging of ffast-math optimizations

> Hi!

<snip>

> For optimizations, like traditional reassociation (and reduction
> vectorization), we could use the backend "remark" feature to give users
> information on where these apply. On large codes, however, such
> information would be very noisy. For this to be useful we'd need to think
> about how to filter it; I suspect we'd need to combine it with profiling
> information, so that we only generate the information on
> actually-executed (or even hot) paths.

Would it be much more noisy than reports on missed auto-vectorization
opportunities?

I suspect it would be significantly noisier, because it would be per subexpression, not per loop. We need to think about how to make the information useful. Getting tens or hundreds of remarks saying 'a floating-point operation was reassociated' for every non-trivial line of code with math seems likely to be useless. At a minimum, we need to figure out how to pull in source-level variable names. I think that, realistically, to be useful, the information would need to be presented as you originally implied: we need something that 'decompiles' the IR into source-level expressions so that the user can see what the compiler did with the code. This might be doable if we keep enough debug info around while optimizing, but I'm not sure.

-Hal