Combining fast math flags with constrained intrinsics

Hi all,

A question came up in a code review (https://reviews.llvm.org/D72820) about whether or not to allow fast-math flags to be applied to constrained floating point intrinsics (http://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics). This has come up several times before, but I don’t think we’ve ever made a decision about it.

By default, the optimizer assumes that floating point operations have no side effects and use the default rounding mode (round to nearest, ties to even). The constrained intrinsics are meant to prevent the optimizer from making these assumptions when the user wants to access the floating point environment: to change the rounding mode, to check floating point status bits, or to unmask floating point exceptions. Each intrinsic has an argument that either specifies a rounding mode that may be assumed or specifies that the rounding mode is unknown (this argument is omitted if it doesn’t apply to the operation). A second argument specifies whether the user wants precise exception semantics to be preserved, wants to prevent syntactically spurious exceptions from being raised, or doesn’t care about floating point exceptions.
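As a rough illustration of why the default-environment assumption matters, here is a minimal C++ sketch that performs the same division under two rounding modes. The function name `divide_with_rounding` is made up for this example. The `volatile` qualifiers are a workaround, not the recommended mechanism: they force the division to run between the two `fesetround` calls even on compilers that do not honor `#pragma STDC FENV_ACCESS ON`. It also assumes the target defines `FE_UPWARD` and `FE_DOWNWARD`.

```cpp
#include <cassert>
#include <cfenv>

// Divide a by b with the given rounding mode in effect, then restore the
// caller's mode. `mode` is one of the FE_* rounding macros from <cfenv>.
double divide_with_rounding(double a, double b, int mode) {
    const int saved = std::fegetround();
    const int rc = std::fesetround(mode);
    assert(rc == 0 && "rounding mode not supported on this target");
    (void)rc;  // silence the unused warning when NDEBUG is defined

    // volatile forces the division to happen at run time, between the two
    // fesetround calls. Without it the optimizer may fold or move the
    // division, because it assumes the default environment.
    volatile double x = a;
    volatile double y = b;
    volatile double q = x / y;

    std::fesetround(saved);
    return q;
}
```

Called from `main`, `divide_with_rounding(1.0, 3.0, FE_UPWARD)` should come out one ulp larger than the `FE_DOWNWARD` result, while an exactly representable quotient such as 1.0 / 4.0 is the same in every mode.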

Because the constrained mode can be localized to a sub-region within a function, we also need to support the case where a constrained intrinsic is used but the default behavior (default rounding mode, exceptions ignored) is in effect. For this reason, I think our IR definition must allow fast math flags to be applied to constrained intrinsics. That makes this primarily a question about what combinations should be permitted by front ends and how constructs like pragmas should affect the various states. For example, I might have source code like this:

Andy, thanks for writing this up. A few thoughts:

1. The mental model that I have is that there is always an FP_CONTRACT pragma: there's some default (implicit) pragma at the beginning, and what it says (off/on/fast) is controlled by the command-line flags (or the driver's default if no flags are explicitly provided). Thus, unless there's some reason my model doesn't really work, I lean against differentiating between the there-is-a-pragma and there-is-not-a-pragma cases in some fundamental way.

2. I'm inclined to go with your choice (b) above because I think that we should treat these concepts as orthogonal

Agreed.

(to the extent that is reasonable: by design, we don't want to reassociate constrained operations, so that flag just might have no effect on those intrinsics). This lets the later optimization passes decide how to treat the various combinations of flags and intrinsics (just as with all other intrinsics that might be present).

I think I agree, but this needs clarification. My view is that we
don't want to reassociate constrained operations when
`-fp-model=strict`. When `-fp-model=fast`, we should reassociate and
do pretty much all the reasonably safe FMF transformations, with the
caveat that I don't think NNAN and NINF make sense for any trap-safe
mode. We may want to trap on those NaNs and Infs we'd optimize away.
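
To make concrete what reassociation can change, here is a small C++ sketch (function names are made up for this example) that evaluates the same three-term sum with two groupings. The `volatile` temporaries only pin down where the intermediate rounding happens; the values are chosen so the outcome does not depend on the rounding mode being the default.

```cpp
#include <cassert>

// (a + b) + c, with the intermediate sum rounded to double.
double sum_left_first(double a, double b, double c) {
    volatile double t = a + b;
    return t + c;
}

// a + (b + c), a grouping that a reassociating optimizer might choose.
double sum_right_first(double a, double b, double c) {
    volatile double t = b + c;
    return a + t;
}

// With a = 1e20, b = -1e20, c = 1.0 the two groupings disagree:
// the left-first sum keeps the 1.0, the right-first sum absorbs it.
void check_reassociation_changes_result() {
    assert(sum_left_first(1e20, -1e20, 1.0) == 1.0);
    assert(sum_right_first(1e20, -1e20, 1.0) == 0.0);
}
```

Note that the second grouping is also inexact where the first is exact, so the two differ in the status flags they raise as well as in the value, which is the kind of difference a strict mode is meant to preserve.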

One of the viewpoints on the constrained intrinsics is that they are a way to represent the floating point environment. In this case they are just variants of the corresponding IR nodes, and in theory we could use the constrained intrinsics everywhere instead of the regular nodes. From this viewpoint it makes sense to keep symmetry between constrained intrinsics and the corresponding regular IR nodes.

So for the first question, variant (a) (generate a constrained version of the llvm.fmuladd intrinsic) looks preferable. Additional flags that specify how the compiler should treat this node (like fast math flags) should be applied to constrained intrinsics to the same extent as to their non-constrained counterparts. A particular combination of a node and additional flags may be treated differently for constrained intrinsics depending on its semantics.

This symmetry could help us implement full-fledged support of the constrained intrinsics in transformations: they would share the same code path with the corresponding non-constrained nodes.

> One of the viewpoints on the constrained intrinsics is that it is a
> way to represent floating point environment. In this case they are
> just variants of corresponding IR nodes and in theory we could use
> the constrained intrinsics everywhere instead of the regular nodes.
> From this viewpoint it makes sense to keep symmetry between
> constrained intrinsics and corresponding regular IR nodes.

Agreed with this.

> So for the first question the variant a (generate a constrained
> version of the llvm.fmuladd intrinsic) looks preferable.

But not with this. Note that in Andrew's example we are operating
under -ffp-contract=fast, in which case clang never emits fmuladd,
so it shouldn't in the constrained case either. Instead, it will
emit fmul/fadd nodes with the contract FMF set, so in constrained
mode it should emit constrained fmul/fadd with the contract FMF set.
(This was Andrew's variant (b).)

fmuladd is only emitted in the -ffp-contract=on case, which is
intended to allow contractions only within a single source statement.
Since the LLVM back-end no longer knows the boundaries of source
statements, this requires help from clang; this is why clang will
emit fmuladd in those cases where the mul and add originate from
within the same source statement.
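
For readers less familiar with why it matters which mul/add pairs may be contracted, the following C++ sketch (function names are made up for this example, and it assumes IEEE 754 doubles) compares a separately rounded multiply and add against a fused multiply-add via `std::fma`. The `volatile` product prevents the compiler from contracting the unfused version on its own.

```cpp
#include <cassert>
#include <cmath>

// a * b + c with the product rounded to double before the add, which is
// what separate multiply and add operations compute without contraction.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// a * b + c with a single rounding at the end, which is what a fused
// multiply-add computes.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

void check_contraction_changes_result() {
    const double eps = std::ldexp(1.0, -27);  // 2^-27
    const double a = 1.0 + eps;
    const double b = 1.0 - eps;
    // The exact product is 1 - 2^-54, which rounds to 1.0 in double.
    assert(mul_then_add(a, b, -1.0) == 0.0);
    // The fused form keeps the low bits until the final rounding.
    assert(fused_mul_add(a, b, -1.0) == -std::ldexp(1.0, -54));
}
```

The two results differ, so whether the front end hands the optimizer a single fused node or a contractable mul/add pair decides which contractions are allowed to change observable values.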

To fully map all these cases onto constrained intrinsics, we need
both to allow contract (and other) FMFs on constrained intrinsics
*and* allow a constrained fmuladd.

Bye,
Ulrich

> with the caveat that I don't think NNAN and NINF make sense for any trap-safe mode

I said this in my original message, but I'd like to reiterate it here. I think it does make sense to combine these with the trap-safe modes. It's an optimization. It's not saying we don't care about NaN and inf. It's asserting that they will not occur. This definitely gives users the ability to shoot themselves in the foot, but really no more so than these flags do in the non-constrained case. If the user is certain that their data and algorithms will never result in NaNs or infinities, we can optimize the code slightly better even in trap-safe modes than we could if we had to allow for the possibility of NaNs and infinities. Obviously, this can lead to missed exceptions and even incorrect results, but it can lead to incorrect results in the non-constrained case too. It's a risky option.
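
As a small illustration of the "missed exceptions" risk, this C++ sketch (the function name is made up for this example) checks whether a division raises the IEEE invalid flag. That is the status an optimization relying on a no-NaNs assertion could legitimately drop if the assertion turned out to be false. The `volatile` qualifiers keep the division at run time; a real trap-safe build would rely on strict FP semantics instead.

```cpp
#include <cassert>
#include <cfenv>

// Returns true if computing a / b raises the IEEE "invalid" status flag,
// which is set when an operation produces a NaN from non-NaN inputs.
bool division_raises_invalid(double a, double b) {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double x = a;
    volatile double y = b;
    volatile double q = x / y;  // volatile keeps the division from being folded away
    (void)q;
    return std::fetestexcept(FE_INVALID) != 0;
}
```

With this helper, `division_raises_invalid(0.0, 0.0)` reports the flag while `division_raises_invalid(1.0, 2.0)` does not.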

Overall, the picture I'm getting is that we need to have some sort of table of FP semantic modes and document which ones we consider orthogonal, which ones we consider mutually exclusive, and what effects various pragmas will have on each of them. Melanie worked through a lot of these issues in her patch to add the -fp-model and related command line options. Maybe we can generalize that to our reasoning about the IR and put it in the language ref.

-Andy

+1

We should consider these orthogonal concepts at the LLVM level, because
they are logically orthogonal. I recommend that Clang also consider them
to be orthogonal. If some frontend wants to consider them mutually
exclusive in the context of a particular language, or warn about them in
some way, that's a choice that frontend certainly has -- users aren't
generally writing LLVM directly.

-Hal

As a user (I did a load of programming ages ago calculating distances
from lats and longs), I heartily second having it documented!

I've known for ages that (some) computers can handle infinity and stuff,
and if I'd known that the compiler/language got it right it would have
made my life much simpler. As it was I had to special-case anywhere two
lats or longs were equal, whereas the maths just said "a divide by zero
will promptly be followed by a divide by infinity so it all cancels out".
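
The cancellation described here can be sketched in C++ as follows, assuming IEEE 754 doubles and default (non-fast-math) semantics; the function name is made up for this example. It is exactly the kind of code that stops being reliable once a no-infinities assumption is in force.

```cpp
#include <cassert>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 doubles");

// 1 / (1 / x), evaluated with ordinary IEEE 754 double arithmetic.
double reciprocal_of_reciprocal(double x) {
    volatile double inner = 1.0 / x;  // x == 0.0 gives +infinity
    return 1.0 / inner;               // 1 / infinity gives 0.0
}

void check_zero_round_trips() {
    assert(reciprocal_of_reciprocal(0.0) == 0.0);
    assert(reciprocal_of_reciprocal(4.0) == 4.0);
}
```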

Being able to rely on the compiler would have been so much nicer.

Cheers,
Wol

That's fair.