I would like to propose that we should honor the settings implied by #pragma STDC FP_CONTRACT [ON|OFF|DEFAULT] and #pragma clang fp contract([on|off|fast]) by default when the -ffp-contract=fast option is used. This would make the -ffp-contract=fast-honor-pragmas option redundant, which I think is good.
I consider the current behavior of ignoring these pragmas to be a bug. I’m not convinced anyone wants that behavior, but if it is needed, we could add something like -ffp-contract=fast-no-honor-pragmas to provide it.
This issue has been raised before, but I’d like to bring it up again because I feel very strongly that the current behavior is incorrect. Here are some useful links for those interested in the history of the problem:
My interpretation of the history is that in the days before we introduced fast-math flags, the backend had to rely on global options from the front end to decide if contraction was allowed, but at some point this bug was interpreted as a feature, so even though we now have the ability to indicate at the instruction level whether contraction is allowed, that setting can still be overridden by the global option. The result is that even though we’ve generated IR that says contraction isn’t allowed, the back end still does it.
If we intended for the pragmas to be ignored, we should be representing that intention in the IR rather than relying on a global option.
In previous discussions, the claim has been made that ignoring the pragmas makes us more compatible with gcc. However, gcc doesn’t support the pragmas, so I don’t consider this a valid point of reference.
I also consider the current behavior to be a bug (though it may be an intentional one). In terms of design, what most users expect is that options set closer to some source code override options set farther away. e.g., when compiling with -fwhatever=ON
// code where whatever is ON
{
#pragma clang whatever OFF
// code where whatever is OFF
if (foo()) {
#pragma clang whatever ON
// code where whatever is ON
}
// code where whatever is OFF
}
// code where whatever is ON
Command line options are the farthest away from the source code you can get. So the command line should set the default behavior, and the pragmas should override based on lexical scopes. The fact that the user needs to set a command line option saying “please honor what the source code says” is pretty weird IMO. Even if this is GCC behavior if you squint hard enough, this seems like a bad design and I’m not convinced we need to replicate it.
That said, is changing the behavior a breaking change?
I would say it is not. We don’t ever guarantee that an operation will be fused. The current behavior enables fusing even if there is a pragma that says not to allow it in some places. If we make the change I suggest, we won’t fuse where the user’s code says not to. That would have been valid behavior even if you interpret -ffp-contract=fast as telling the compiler to ignore pragmas.
The other thing I would like to mention here is that because the current behavior is a result of looking at a global flag from the front end, if you are using LTO, you get whatever behavior is on the command-line for the link step, not what you specified when the original source files were compiled because the pre-link/prepare-for-LTO step doesn’t put anything in the IR to say what setting was used. So if you use -ffp-contract=fast on the pre-link phase, the front end will not set the contract flag on operations within a region where contract is turned off by pragma, then if you don’t specify -ffp-contract=fast on the link step, those operations will not be contracted. Worse, if you use -ffp-contract=off in the pre-link step, but you do use -ffp-contract=fast on the link step, contraction will be enabled everywhere.
Well, it’s more like the user needs to not pass a command line option saying “please ignore what the source code says”. Passing -ffast-math is a choice that the user can take some responsibility for.
Anyway, I believe #pragma float_control does override the global fast-math setting, and I’m not strongly opposed to extending that to more pragmas.
Right. The problem with fp-contract is that pragmas, including float_control, do override the IR that clang generates but the backend handling of FMA checks a target option that gets set with -ffp-contract=fast and generates FMA even when the IR doesn’t say it’s allowed.
In that sense, this is really a back-end bug, but I think it’s in the target-independent CG code, so its shared by most/all backends that support FMA. The reason I brought it up as a clang RFC is that the clang command-line handling has evolved to make this look like intended behavior.
Oh, well, that contradicts our documentation for float_control and so is clearly a bug. If we need the frontend to not set the global fast-math flag and just set it on individual functions that haven’t opted out, that’ll probably be a real pain in terms of test churn, but we can do that. Fixing the backend issue seems to be the simpler approach, but it’s real easy for me to say that.
I think we just need to remove the global option entirely, along with the corresponding fast-honor-pragmas argument to the -ffp-contract option. The IR is fully capable of saying where contraction is and isn’t allowed, and it’s doing that correctly today. If it isn’t enabled in the IR, the backend shouldn’t be doing it.
This may be a complicated transition if non-clang clients are using the TargetOptions rather than putting the semantics in the IR, but we can at least start by having clang stop participating in the problem.
This is the code in clang that sets the option:
// Set FP fusion mode.
switch (LangOpts.getDefaultFPContractMode()) {
case LangOptions::FPM_Off:
// Preserve any contraction performed by the front-end. (Strict performs
// splitting of the muladd intrinsic in the backend.)
Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;
break;
case LangOptions::FPM_On:
case LangOptions::FPM_FastHonorPragmas:
Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;
break;
case LangOptions::FPM_Fast:
Options.AllowFPOpFusion = llvm::FPOpFusion::Fast;
break;
}
The NVPTX target returns FPM_Fast, but the contract flag is set on individual instructions, so I think this is just an artifact left over from the time before we had a contract flag and NVPTX wanted to enable fusion by default. The FPM_FastHonorPragmas option got introduced because HIP also wanted to enable fusion by default but they wanted to honor pragmas.
I’d like to replace the code above with this:
// Set FP fusion mode.
Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;
I agree that it feels like NVPTX should get that semantics by just enabling FP contract by default.
Okay, so it sounds like:
We have agreement that, at the Clang level, at least some pragmas should override the global fast-math setting. That seems like it mostly resolves this RFC.
At the LLVM level, we need to figure out the best way to reflect fast-math settings in IR. You’re arguing that having a target configuration flag is dangerously redundant and that we should probably only have function-level (or even just instruction-level?) flags. That sounds reasonable to me, but I think you need to socialize this more directly with other LLVM contributors rather than burying it in an RFC that sounds like it’s just about a Clang frontend issue.
Once we have clarity on the second point, we can have a normal code review thread about the best way to emit that IR in Clang.
I think we can implement the clang part of this without solving the backend problem. I agree that removing the option would require a separate discussion, but clang already generates the IR needed to enable contraction, respecting both command-line options and pragmas.
At the LLVM level, we need to figure out the best way to reflect
fast-math settings in IR. You’re arguing that having a target
configuration flag is dangerously redundant and that we should
probably only have function-level (or even just
instruction-level?) flags. That sounds reasonable to me, but I
think you need to socialize this more directly with other LLVM
contributors rather than burying it in an RFC that sounds like
it’s just about a Clang frontend issue.
There’s long been a tacit agreement that global options is the wrong way
to do things, and even the function-level attributes are frowned upon.
I’ve been (slowly, admittedly) working my way through the fast-math
flags to improve their semantic definitions, hopefully I’ll get the
improved semantics for contract RFC up shortly.
Having looked more closely at what’s happening in a few backends, I have to admit that this isn’t as simple as I thought to just switch in clang. The x86 backend is more or less relying on the target-independent DAGCombiner creating FMA nodes (plus an X86-specific pass). The Aarch64 backend has some pattern matchers that are using the node-based fast-math flags properly. Then NVPTX backend has a lot of pattern matching that is based (for both FMA enabled and FMA disabled) on the state of the global flag. The result is that x86 and Aarch64 still generate FMA with the change I am proposing, but NVPTX doesn’t.
I haven’t looked closely at any other backends, but I think Aarch64 is doing it correctly, and I don’t think any other backend is doing that in the SelectionDAG pattern matching. I don’t know what the state of this is with GlobalISel, but I’ve seen enough to convince me that I need to back off of my request for the clang stuff and instead push for the backends to start respecting what the IR tells them.