Short answer? No one has submitted a patch to do it, and no one seems to care about it enough to have done so.
Long Answer: A couple of my coworkers ARE working on it, but it isn’t a high priority at the moment. I suspect it’ll happen somewhat soon.
Because no one has implemented it. Patches would be welcome, but will need to start with a design and implementation of the requisite LLVM extensions.
Yes. This is what Andrew Kaylor has been working on (cc’d). -Hal
There are still a few things missing from the optimizer to get it completely robust, but I think there is enough in place for front end work to begin. As I think I've demonstrated in my recent attempt to contribute a Clang patch, I'm not skilled enough with the front end to be the person to pull this off without an excessive amount of oversight, but as Erich indicated, we do have some good front end people here who have this on their TODO list. It's just not at the top of the TODO list yet.
If anyone is interested in the details of the LLVM side of things, there are constrained FP intrinsics (still marked as experimental at this point) documented in the language reference. The initial patch can be seen here:
https://reviews.llvm.org/D27028
I’ve since added another group of intrinsics to handle the libm-equivalent intrinsics, and more recently Wei Ding contributed an fma intrinsic.
The idea is that the front end will emit the constrained intrinsics in place of equivalent general FP operations or intrinsics in scopes where FENV_ACCESS is enabled. This will prevent the optimizer from making optimizations that assume default fenv settings (which is what we want the optimizer to do in all other cases). Eventually, we’ll want to go back and teach specific optimizations to understand the intrinsics so that where possible optimizations can be performed in a manner consistent with dynamic rounding modes and strict exception handling.
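For illustration, a minimal sketch of the two forms (spellings per the current LangRef; the function-level strictfp marker shown is the eventual spelling, so details may differ):

define double @mul_default(double %x, double %y) {
  ; Default environment (FENV_ACCESS off): ordinary IR, fully optimizable.
  %r = fmul double %x, %y
  ret double %r
}

define double @mul_fenv(double %x, double %y) #0 {
  ; FENV_ACCESS on: the same multiply as a constrained intrinsic. "round.dynamic"
  ; says the rounding mode is not known to be the default; "fpexcept.strict" says
  ; FP exception side effects must be preserved.
  %r = call double @llvm.experimental.constrained.fmul.f64(double %x, double %y, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
  ret double %r
}

declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata)

attributes #0 = { strictfp }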
-Andy
How do you deal with the hoisting-into-FENV_ACCESS problem? E.g.:

// feenableexcept/fedisableexcept are glibc extensions (need _GNU_SOURCE).
#define _GNU_SOURCE
#include <fenv.h>

double f(double a, double b, double c) {
  double d;
  {
    #pragma STDC FENV_ACCESS ON
    feenableexcept(FE_OVERFLOW);
    d = a * b;
    fedisableexcept(FE_OVERFLOW);
  }
  return c * d;
}
What stops LLVM from hoisting the second fmul up to before the fedisableexcept?
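Concretely, a naive lowering might produce IR roughly like this sketch, where only the FENV_ACCESS-on region uses the constrained intrinsic (attribute and metadata spellings per the current LangRef; the FE_OVERFLOW constant is illustrative and target-specific):

define double @f(double %a, double %b, double %c) #0 {
entry:
  ; FENV_ACCESS-on region: trapping-enabled multiply via the constrained intrinsic.
  %enable = call i32 @feenableexcept(i32 8)   ; 8 = FE_OVERFLOW on x86 glibc (illustrative)
  %d = call double @llvm.experimental.constrained.fmul.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
  %disable = call i32 @fedisableexcept(i32 8)
  ; FENV_ACCESS-off region: an ordinary fmul. It depends only on %c and %d, not on
  ; the fedisableexcept call, so nothing in the IR itself forbids hoisting it to
  ; just after %d, i.e. before @fedisableexcept.
  %mul = fmul double %c, %d
  ret double %mul
}

declare i32 @feenableexcept(i32)
declare i32 @fedisableexcept(i32)
declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata)

attributes #0 = { strictfp }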
I believe that we will rely on fedisableexcept() being marked as having unmodeled side-effects to prevent a hoist like that.
fadd can be hoisted past *anything*, can't it?
If that’s the case, we may need to use the constrained intrinsics for all FP operations when FENV_ACCESS is enabled anywhere in a function.
Probably not over something that changes the rounding mode.
-Krzysztof
I think that’s also not enough; you’d get the same problem after inlining, and across modules with LTO. You would need to also prevent any interprocedural code motion across a FENV_ACCESS / non-FENV_ACCESS boundary.
And even that doesn't seem to be enough. Suppose that some scalar optimization pass finds a clever way to convert some integer operation into a floating-point operation, such that it can prove that the FP values never overflow (I believe Chandler has an example of this that comes up in some real crypto code). Now suppose there's a case where the integer operands are undef, but the code in question is bypassed in that case. If the FP operations get hoisted, and you happen to have FP exceptions enabled, you have a potential miscompile.
Fundamentally, it seems to me that feenableexcept is unsound in the current LLVM IR model of floating point, if we assume that fadd, fmul, fsub etc do not have side-effects.
Yes, that’s correct. -Hal
Or we prevent inlining. Good point. However, that’s not a new problem, and we currently deal with this by respecting the noimplicitfloat attribute (and I think we’d definitely need to use that attribute if we allow fooling with the FP environment). -Hal
I agree with this: we can’t mix them. If a function allows FP environment access anywhere, then we’ll need to use the intrinsics everywhere. We’ll also need to use noimplicitfloat. We’ll need to disallow inlining as well where there’s a mismatch (we could, for example, add an “fp_env_access” attribute, and disallow inlining when there’s a caller/callee attribute mismatch). -Hal
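A rough sketch of that caller/callee mismatch; "fp_env_access" here is just the suggested placeholder name, not an existing attribute:

define double @uses_fenv(double %a, double %b) #0 {
  ; (body elided -- it would consist of constrained intrinsics and fenv calls)
  ret double %a
}

define double @default_env(double %a, double %b, double %c) {
  ; Attribute mismatch with @uses_fenv: the inliner would be taught to refuse to
  ; inline its body here, keeping the FP-environment code behind a call boundary.
  %d = call double @uses_fenv(double %a, double %b)
  %r = fmul double %c, %d          ; ordinary FP is fine out here
  ret double %r
}

attributes #0 = { "fp_env_access" }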
I had considered the inlining issue, and updating the inliner to handle this was on my list of things yet to be done. I’m not sure I understand what you’re saying about LTO.
There are some points about the standard specification that seem a bit unclear to me, specifically with regard to how things work if you call an FENV_ACCESS-off function from within an FENV_ACCESS-on scope. I believe that when I talked with our front end guys here about that our conclusion was that doing that sort of thing is undefined behavior. I’m not sure if this is related to your LTO concern or not.
In any event, you definitely raise some good questions that I don’t have answers for. I’ll have to give this some more thought.
I just want to add that one of the primary design goals in what I've done so far was to not do anything that would inhibit optimizations or require extra work on the part of the optimization passes in the case where we don't care about FENV_ACCESS. That requirement makes attaching side-effects to the FP operations quite difficult.
-Andy
FWIW, I have a pass in a sandbox somewhere (never committed and probably stale by now) that converts all FP operations in a module to use the constrained intrinsics. I wrote it for testing purposes, but maybe it could be refactored to have some general utility use.
The point about LTO was that you can't solve the inlining problem by using
intrinsics throughout an entire module if FENV_ACCESS is used anywhere,
because you might see code from outside the module.
I don't think the undefined-behavior observation helps much, since you can call FENV_ACCESS-on code from FENV_ACCESS-off code, which is equally problematic. It's also not true that calling an FENV_ACCESS-off function from an FENV_ACCESS-on scope results in UB: you only get UB if you're in a non-default FP mode when you enter the FENV_ACCESS-off code, or if that code attempts to inspect or alter the FP environment -- so the hoisting problem still seems to exist.
Indeed; FWIW, that design goal -- not penalizing code that never touches the FP environment -- seems like the right one to me. We need a very strong barrier between anything that might use feenableexcept and anything that might use LLVM's "pure" FP operations, and I don't believe we have such a thing in LLVM IR yet. So I don't think we're ready for frontend work on FENV_ACCESS, since we don't yet know how it will be represented in IR.
Sure, I was considering preventing inlining to be a form of interprocedural code motion.
OK, so the idea would be that we'd lower a function containing FENV_ACCESS
(or possibly an outlined block of such a function) with intrinsics for all
FP operations, specially-annotated libm function calls, and noimplicitfloat
and strictfp attributes to prevent generation of new FP operations and
inlining into non-strictfp functions. Right? (And we could imagine a
verifier check that ensures that you don't have pure FP operations inside a
strictfp function.)
Given the function annotations, do we need special intrinsics at all, or
could we instead require that passes check whether the enclosing function
is marked strictfp before optimizing, in the same way that some
optimizations must be gated by a check for noimplicitfloat?
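A rough sketch of a function lowered in that shape, using the constrained sqrt intrinsic as the libm-equivalent call (spellings per the current LangRef):

define double @hypot_ish(double %x, double %y) #0 {
entry:
  ; Every FP operation goes through a constrained intrinsic, including the
  ; libm-equivalent sqrt, and the function carries strictfp + noimplicitfloat so
  ; passes neither introduce new ordinary FP operations nor inline across the
  ; strict/non-strict boundary.
  %xx = call double @llvm.experimental.constrained.fmul.f64(double %x, double %x, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
  %yy = call double @llvm.experimental.constrained.fmul.f64(double %y, double %y, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
  %sum = call double @llvm.experimental.constrained.fadd.f64(double %xx, double %yy, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
  %r = call double @llvm.experimental.constrained.sqrt.f64(double %sum, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
  ret double %r
}

declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata)
declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)
declare double @llvm.experimental.constrained.sqrt.f64(double, metadata, metadata)

attributes #0 = { strictfp noimplicitfloat }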
To be clear, we’ve had several extensive discussions about this, on and off list, and Andy has started adding the corresponding intrinsics into the IR. There was a presumption about a lack of mixing, however, and we do need to work out how to prevent mixing the native IR operations with the intrinsics (although, perhaps we just did that). -Hal
Yes, exactly. That's another possible design. We decided that the intrinsics were less intrusive. The problem is that it's not just FP-specific optimizations that would need to check the attribute; other optimizations doing other kinds of code motion and value propagation would need to check it as well. Having IR-level operations that are side-effect-free, except when some special function attribute is present, seems undesirable. -Hal