[RFC] Eliminating non-IR floating-point controls in the selection DAG

Hi everyone,

This is related to the recent thread about fp-contract and front end pragma controls, but I want to generalize the discussion in terms of how the target-independent codegen in the backend is implemented.

Until sometime in 2017 (I think) the fast-math flags were not passed through to the Selection DAG, and so the only ways to control floating-point behavior were through settings in the TargetOptions or by setting function attributes. Since 2017, however, the fast-math flags have been attached to floating-point nodes in the selection DAG. This leads to some ambiguous situations where the TargetOptions or function attributes can override the absence of fast-math flags on individual nodes. An example of this is the fp-contract setting. If a source file is compiled with clang using the ‘-ffp-contract=fast’ setting but the file contains either “#pragma STDC FP_CONTRACT OFF” or “#pragma clang fp contract(off)”, the front end will generate IR without the ‘contract’ fast-math flag set, but the X86 backend will generate an FMA instruction anyway.

https://godbolt.org/z/dov6EcE8G

This is particularly bad in the case of CUDA, because CUDA uses fp-contract=fast by default. So, the user’s code can explicitly say “don’t generate fma here” and the compiler will respond, “meh, I think I will anyway.”

https://godbolt.org/z/c4h1nK9M3

There are other cases where the backend code will check TargetOptions::UnsafeFPMath for things like reassociation that can be represented using fast-math flags.

That brings me to the RFC part of my message. I’d like to start updating the backend so that it doesn’t do things like this. As a general principle, I would say, “All semantics must be represented in the IR and the backend must respect the IR semantics.” And a corollary: “Anything which can be represented at the instruction level must be represented at the instruction level.” This corollary would eliminate potential conflicts between function attributes (like “unsafe-fp-math”) and individual IR instructions.

As a first step toward this goal, I’ve prepared a patch which closes the back door for fp-contract control.

https://reviews.llvm.org/D112760

This patch is currently incomplete, inasmuch as I didn’t update failing tests for several target architectures. I did update the X86 and AMDGPU tests to provide examples of how they can be made to work. I will fix the rest if we decide this is the correct direction. There is a failing CUDA test in the clang front end that I think will require a different approach involving some driver changes to get clang to generate IR for the semantics it intends, rather than setting an option and counting on the backend to disregard the IR.

Thanks in advance for any feedback!

-Andy

Generally, this makes a lot of sense to me. If we have semantics expressible in IR, we should use that as the source of truth. Not really my area, so I don’t have any opinion on the specific implementation plan.

Philip

Hi Andrew,

IIRC, the fp-contract change was to fix some broken code without breaking more. Copying Sebastian, who was working on that at the time.

I agree we shouldn’t have overriding behaviour flags in the back-end if the IR explicitly says what it wants. But I’d be cautious about moving all such flags to instructions.

First, this would be a behavioural change that needs the IR to change, so it would affect every LLVM IR front-end, which makes it a pervasive change for downstream users. So, if we decide we want to do this, we need to replace the current mess with a consistent implementation that won’t break everybody else’s.

Second, module/function flags can control fine-grain behaviour without bloating the IR. I don’t know how much the per-instruction flags add to the binary representation (probably very little), but the module/function ones are small too, so moving everything to instructions is still a net increase in size.

Finally, I think we need to separate the IR from DAG/MIR behaviour. It seems to me that the target option is what overrides the behaviour, not function/module options, so we should worry about the targets’ behaviour, not at which level the flag is set.

There’s a perfectly valid solution that has module/function/instruction flags controlling behaviour, with the most specific overriding the least specific, and none of that overridden by the target. This means we can still use the same IR flags in the same way (thus not forcing all front-ends to change) and still correct the behaviour by not making the target ignore all that.
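As a toy model of that precedence scheme (this is not LLVM code; the tri-state and function names are invented purely for illustration), the resolution rule would be:

```c
/* Hypothetical model of "most specific wins, target never overrides":
   each level either doesn't specify a flag, or forces it off/on. */
typedef enum { UNSET, OFF, ON } Tri;

static int flag_enabled(Tri module, Tri function, Tri instruction) {
    if (instruction != UNSET) return instruction == ON;  /* most specific */
    if (function != UNSET)   return function == ON;
    if (module != UNSET)     return module == ON;
    return 0;  /* specified nowhere: behaviour not permitted */
}
```

Note that in this model the target options never appear in the lookup chain at all; they can only influence how the enabled behaviour is lowered.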

Does any of that make any sense?

cheers,
–renato

+1

The FMFs and TargetOptions need to be reconciled. The current state of things is confusing; e.g., the FMA matching guards in DAGCombiner.

Renato is correct that we must be careful not to impact out-of-tree front ends too much. The recent “dso_local” and “mustprogress/willreturn” changes were painful, so I wouldn’t like to relive those experiences.

Thanks for working on this, Andy!

-Cameron

Hi Renato,

Thanks for the reply.

I’m not entirely clear what course of action you are suggesting. I’ll start with the two statements I didn’t understand.

Finally, I think we need to separate the IR from DAG/MIR behaviour. It seems to me that the target option is what overrides the behaviour, not function/module options, so we should worry about the targets’ behaviour, not at which level the flag is set.

I don’t know what you’re saying here. What do you mean by “we should worry about the targets’ behaviour”? If the IR says fp-contract is not allowed for a given instruction, why should the target options be allowed to overrule that?

There’s a perfectly valid solution that has module/function/instruction flags controlling behaviour, with the most specific overriding the least specific, and none of that overridden by the target. This means we can still use the same IR flags in the same way (thus not forcing all front-ends to change) and still correct the behaviour by not making the target ignore all that.

I have a bit of an idea of what you’re getting at here, but I’d like to ask for a clarification. I think you’re saying that an instruction flag would take precedence over a function attribute, and a function attribute would take precedence over module metadata, right? The question I would have is whether we are going to view the absence of a flag/attribute/metadata as meaningful.

For example, given the rule you propose, if an instruction has the ‘reassoc’ fast-math flag set, but the function has an attribute “unsafe-fp-math”=“false”, then we would permit that instruction to be reassociated. But if the instruction does not have any fast-math flags set and the function has an attribute “unsafe-fp-math”=“true”, does the flag not being set at the instruction level take precedence over the function attribute? I suppose to maintain the current behavior it would have to. I was a bit surprised to discover that clang handles the “unsafe-fp-math” function attribute correctly for this. If I use -ffast-math on the command line, but disable one of the fast-math flags for part of a function, it will set “unsafe-fp-math” to “false”.

https://godbolt.org/z/Mj88nbMq1

In this case, clang will set TargetOptions::UnsafeFPMath to true, but SelectionDAGISel::runOnMachineFunction() calls TargetMachine::resetTargetOptions() which resets it based on the function attribute. So for this case, it all works out.

Finally, I think we need to separate the IR from DAG/MIR behaviour. It seems to me that the target option is what overrides the behaviour, not function/module options, so we should worry about the targets’ behaviour, not at which level the flag is set.

I don’t know what you’re saying here. What do you mean by “we should worry about the targets’ behaviour”? If the IR says fp-contract is not allowed for a given instruction, why should the target options be allowed to overrule that?

They shouldn’t. Unless it’s a hardware-specific design issue (e.g. there is no fused op, or there is no un-fused op), but that should be handled at the DAG level, I think, not the IR level.

Two notes, however:

  1. resetTargetOptions doesn’t handle the AllowFPOpFusion setting (as far as I can tell, there is no corresponding function attribute)
  2. There is a comment for resetTargetOptions which says this:

// FIXME: This function needs to go away for a number of reasons:
// a) global state on the TargetMachine is terrible in general,
// b) these target options should be passed only on the function
//    and not on the TargetMachine (via TargetOptions) at all.

As you may guess, I completely agree with the comment. FWIW, this comment was added by Eric Christopher in 2015 (https://github.com/llvm/llvm-project/commit/35a8a62125ccfa0ef804584738a295ed5c17750f).

I think the biggest problem there is that this is a target flag, not an IR flag, and that’s not a good design. I completely agree with the comment, too.

I also agree the dance of inst/func/mod level flags isn’t trivial and is a source of confusion at the least, and potentially of bugs.

I also want to clarify what I meant for “perfectly valid”, and that was “it solves the problem and is technically doable”. I didn’t want to imply it was good or there wasn’t a better way of doing it.

The core of my complaint about the current state of things is that the backend is treating the fast-math flags as if they are “on or undecided” rather than “on or off.” IMO, the absence of a fast-math flag in the IR means that the behavior controlled by that flag is not permitted. That’s the way the IR is treated in IR-level optimization passes, but the backend codegen (at least in places) behaves as though the absence of a fast-math flag means “not permitted unless enabled by TargetOptions.” That’s bad as a starting point, but it’s particularly bad when you start linking together IR from multiple modules that may have been created from different compilation units with different options.

I absolutely agree with you. That’s why I’ve added Sebastian, who worked on a very similar problem not long ago. I myself have fixed similar problems in the Arm backend a long time ago.

I think we really should treat the lack of fast-math-like flags to mean their behaviour is forbidden, and force front-ends and middle-end passes to add them if the user requested/allowed them.

I’d also like to make one thing explicit. While the change I’m proposing would change the behavior of the backend for all front ends, it wouldn’t “break” them in the sense of producing incorrect results. When TargetOptions::AllowFPOpFusion==FPOpFusionMode::Fast the backend is allowed to fuse operations, but it isn’t required or guaranteed to fuse them. With the change I am proposing, the backend wouldn’t fuse FP operations unless the ‘contract’ flag were set, but that wouldn’t lead to incorrect results. It would lead to lost performance in some cases, but I think that can be recovered by generating IR with the semantics that were intended. The same goes for the other fast-math flags. However, I will admit that it could be an uncomfortable transition for some front ends.

It’s not that simple. We want to keep the performance for existing code with existing compilation infrastructure as much as possible.

When an existing project with its existing compilation flags uses a new version of clang and performance suddenly drops considerably, its developers may start looking for things to change, may just blame the “new compiler” and try other compilers, or worse, get stuck on an old version of clang.

But this is also not a huge deal, and we can fix that with clearer communication…

Regarding bloating the IR, I’m not sure if this is a theoretical problem or a real one. The bitcode writer will emit a byte for the fast-math flags if any are set for each call or FP operation, and if none are set it will not emit that byte. For clang, this is a non-issue because clang always sets the fast-math flags on the individual operations. It’s at least theoretically possible that there is some front end that never sets the fast-math flags and relies on the global setting. In that case, adding the fast-math flags would bloat the IR considerably. For most of the FMFs, omitting the flags would cause considerable missed optimization opportunities in the middle end, but the ‘contract’ flag is a bit of a special case because it isn’t typically used at all before codegen.

My comment on the increase in size was meant as a check, not as a blocker. It could be well within noise levels and totally worth the change.

cheers,
–renato

I’d like to +1 the proposal. It is annoying to do FMF-based optimization in the backend given the counterintuitive fact that target options override the instruction flags.

But please take care of the existing usages in the backends. Fortunately, there are not too many :slight_smile:

$ grep -rwn "Options.AllowFPOpFusion" llvm/lib/Target/ | wc -l
12

$ grep -rwn "Options.UnsafeFPMath" llvm/lib/Target/ | wc -l
26

Regarding the multi front-ends compatibility issue, I have an idea.

We can write a middle-end pass and add it at the very beginning of the pipeline. The pass “translates” the IR FMFs from “no FMF flags (undetermined)” to “no FMF flags (determined)” or “some FMF flags” according to the module/function attributes. We can add an option, false by default, by which front ends like Clang can disable the pass once they respect the IR FMFs strictly.

I think the use of target options is simply owed to the fact that the DAG and other backend passes used to lose the FMFs during transformations.

Thanks

Phoebe (Pengfei)

I think the use of target option simply owe to the fact that DAG and other backend passes always lose the FMFs during transformation in the past.

Ah! That’s an interesting piece of information! And it strengthens Eric’s original comment.

At which level is that information lost?

SelectionDAG has many overloaded methods of “getNode”, some of which don’t need to specify the Flags argument. This is reasonable because only FP nodes need that. But it also makes it easy to lose the Flags on FP nodes.

Here’s an example https://reviews.llvm.org/D84518

This problem has been greatly improved since https://reviews.llvm.org/D87361

I think we have a similar problem in MIR-based optimizations, but I haven’t dug into it.
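A minimal model of that hazard (this is not LLVM code, just an illustration of how a flag-less creation path silently drops flags when a combine rebuilds a node):

```c
/* Toy "node" with an opcode and a fast-math-flags bitmask. */
struct Node { int op; unsigned fmf; };

static struct Node get_node(int op) {
    struct Node n = { op, 0 };     /* no Flags parameter: defaults to none */
    return n;
}

static struct Node get_node_with_flags(int op, unsigned fmf) {
    struct Node n = { op, fmf };
    return n;
}

/* A combine that rebuilds a node through the flag-less overload
   loses whatever flags the original node carried. */
static struct Node rebuild(struct Node old) {
    return get_node(old.op);       /* bug: old.fmf is not forwarded */
}
```

This is the shape of bug that the cited reviews (D84518, D87361) address in the real getNode overloads.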

Thanks

Phoebe (Pengfei)

SelectionDAG has many overloaded methods of “getNode”, some of which don’t need to specify the Flags argument. This is reasonable because only FP nodes need that. But it also easily results in losing the Flags in the FP nodes too.

Here’s an example https://reviews.llvm.org/D84518

This problem has been greatly improved since https://reviews.llvm.org/D87361

I think we have the similar problem in MIR based optimizations, but I didn’t dig into it.

This sounds like a somewhat large, but mostly mechanical problem to solve, correct?

Once we have a way to propagate the flags down to instruction selection, then we don’t need the target info overriding IR semantics.

Andrew, is that what you’re proposing originally? (Sorry if I’m slow to catch on; I worried about changing the IR for front-ends, but it seems that worry was misplaced.)

As far as I understand it, they are two different things.

Backends came to use target options due to mechanical problems, but they strictly respect the FMFs that are present on instructions. This is also why they use target options the way they do.

Front ends don’t strictly set the FMFs on instructions, so when there are no FMFs on an instruction, the effective FMFs are determined by module/function attributes.

I think this is the problem Andy wants to solve.

I haven’t studied this deeply, so forgive me if I’ve got something wrong here :slight_smile:

Thanks

Phoebe (Pengfei)

I thought the target options were older than FMF. I don’t think they were created as a workaround for a shortcoming of FMF. Though that may be what they’ve become.

~Craig

This is what I read from Pengfei’s reply. Target options are being used that way, not created that way.

–renato

I agree that we should remove the target options as an override for FMF.
But we should be aware of a current shortcoming in IR/node-level FMF. Here’s a simple recent example:
https://llvm.org/PR52259

When compiled with -fno-signed-zeros (“nsz”), we want this to be “llvm.fabs” in IR:

double floatingAbs(double x) {
  if (x < 0) return -x;
  return x;
}

But it doesn’t work, because the application of FMF is incomplete (or the FMF design is broken?): we don’t have FMF on function arguments or loaded FP values.
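For concreteness, the reason “nsz” is required at all is visible in plain C: the source function maps -0.0 to -0.0 (the branch is not taken, since -0.0 < 0 is false), while fabs maps it to +0.0. A small sketch:

```c
#include <math.h>

/* Same function as in the example above: without nsz this is NOT
   equivalent to llvm.fabs, because the two disagree on -0.0. */
static double floatingAbs(double x) {
    if (x < 0) return -x;   /* not taken for x == -0.0 */
    return x;
}
```

With nsz, the optimizer is allowed to ignore that one-input difference, which is what makes the fabs folding legal.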

This patch proposes to solve that example by using function-level attributes:
https://reviews.llvm.org/D112283

IIUC, removing the target options still allows for the function attribute override to FMF, so we will still have a potential work-around for the holes in FMF design/implementation.

Hi everyone,

This is related to the recent thread about fp-contract and front end pragma controls, but I want to generalize the discussion in terms of how the target-independent codegen in the backend is implemented.

Until sometime in 2017 (I think) the fast-math flags were not passed through to the Selection DAG, and so the only ways to control floating-point behavior were through settings in the TargetOptions or by settings function attributes. Since 2017, however, the fast-math flags have been attached to floating-point nodes in the selection DAG. This leads to some ambiguous situations where the TargetOptions or function attributes can override the absence of fast-math flags on individual nodes. An example of this is the fp-contract setting. If a source file is compiled with clang using the ‘-ffp-contract=fast’ setting but the file contains either “#pragma STDC FP_CONTRACT OFF” or “#pragma clang fp contract(off)” the front end will generate IR without the ‘contract’ fast-math flag set, but the X86 backend will generate an FMA instruction anyway.

https://godbolt.org/z/dov6EcE8G

This is particularly bad in the case of CUDA, because CUDA uses fp-contract=fast by default. So, the user’s code can explicitly say “don’t generate fma here” and the compiler will respond, “meh, I think I will anyway.”

https://godbolt.org/z/c4h1nK9M3

I don’t think the argument about -ffp-contract is directly applicable to the implementation-level decision here. For better or worse, we can maintain the existing -ffp-contract semantics on top of any representation, e.g. by just marking all FP instructions as fast-contractable and dropping all the IR that would come from pragmas. Your core point is that the current representation allows unnecessary ambiguities that lead to bugs, and that seems pretty indisputable.

There are other cases where the backend code will check for TargetOption::UnsafeFPMath for things like reassociation that can be represented using fast-math flags.

That brings me to the RFC part of my message. I’d like to start updating the backend so that it doesn’t do things like this. As a general principle, I would say, “All semantics must be represented in the IR and the backend must respect the IR semantics.” And a corollary: “Anything which can be represented at the instruction level must be represented at the instruction level.” This corollary would eliminate potential conflicts between function attributes (like “unsafe-fp-math”) and individual IR instructions.

It’s unclear to me whether you’re proposing this as a rule just for the Selection DAG or also for LLVM IR. Selection DAG is an internal IR of the backends, and strengthening rules there doesn’t have very many downsides. If you’re also proposing that LLVM IR should represent these with instruction-level rather than function-level flags, that’s going to be a lot more disruptive for frontends, because most frontends don’t need to support things like local pragmas and may not be setting instruction-level flags right now. That doesn’t mean we can’t still pursue that, but your proposal should be clear that that’s the goal.

As a first step toward this goal, I’ve prepared a patch which closes the back door for fp-contract control.

https://reviews.llvm.org/D112760

This patch is currently incomplete, in as much as I didn’t update failing tests for several target architectures. I did update the X86 and AMDGPU tests to provide examples of how they can be made to work. I will fix the rest if we decide this is the correct direction. There is a failing CUDA test in the clang front end that I think will require a different approach involving some driver changes to get clang to generate IR for the semantics it intends rather than setting an option and counting on the backend to disregard the IR.

Yeah, Clang shouldn’t be producing inconsistent IR like that.

John.

From the viewpoint of optimization, it is important to allow an internal representation in which different instructions may have different properties, including fast-math flags. The proposed general principle looks very attractive, as it leads to a flexible representation, which can also be safer, as properties are queried from the instruction only and not from a combination of instruction, function, and module.

It is not mandatory to keep the flags in the instruction; maybe an instruction method that returns a set of flags would be sufficient. Such a method would compute the flags if they are not stored on the instruction. It could help solve problems like https://llvm.org/PR52259, as having fast-math flags on instructions like phi or select (and probably some others) does not look natural.

Thanks,
–Serge

Andrew, is that what you’re proposing originally? (Sorry if I’m slow to catch, I worried about changing the IR for front-ends but it seems it was misplaced).

If by “that” you mean “relying on the SDNode fast-math flags and not using the target info”, then yes.

I wasn’t aware of the problem that Phoebe brought up with regards to the fast-math flags being dropped. I also haven’t looked to see what is happening in GlobalISel. So, I may be underestimating the amount of work to make this transition. Getting the existing LIT tests to pass doesn’t seem to be a problem. Avoiding all performance regressions in real-world code could be. It may be possible to write a verifier-like utility pass that would try to detect when the flags were lost, but I haven’t thought that through.

As for the transition plan for out-of-tree front ends, I was thinking of something very similar to what Phoebe suggested in her most recent reply on this thread, but I was thinking of having the “fixup” pass run during the codegen prepare phase. The way I envision it would be this:

I wasn’t aware of the problem that Phoebe brought up with regards to the fast-math flags being dropped. I also haven’t looked to see what is happening in GlobalISel. So, I may be underestimating the amount of work to make this transition.

I’m not sure FastISel is still used, but if it is, we can’t break it either (before fully deprecating it, at least).

Getting the existing LIT tests to pass doesn’t seem to be a problem. Avoiding all performance regressions in real-world code could be.

LIT tests only cover upstream IR generation. There’s an additional item there: warn downstream front-ends to check their IR with the flag set.

It may be possible to write a verifier-like utility pass that would try to detect when the flags were lost, but I haven’t thought that through.

That’d possibly need caching the instructions that do have them. Perhaps it would be easier to add a library (not a pass) to compare before and after for any transformation pass that is about to do a replace-all-uses-with.

As for the transition plan for out-of-tree front ends, I was thinking of something very similar to what Phoebe suggested in her most recent reply on this thread, but I was thinking of having the “fixup” pass run during the codegen prepare phase. The way I envision it would be this:

SGTM.

cheers,
–renato