Question about FMA formation

Hi, Dear All:

I'm going to implement FMA formation. On some architectures, "FMA a, b, c" is more precise than
"a * b + c". I'm wondering whether FMA could ever be less precise. In the former (more precise)
case, can we enable FMA formation despite a restrictive FP mode?

  Thanks
Shuxin

Hi, Dear All:

I'm going to implement FMA formation. On some architectures, "FMA a, b, c"
is more precise than "a * b + c".

If it isn't more accurate, it isn't an FMA, at least not in the
commonly used sense. (ARM has an instruction which does a multiply
and add which isn't more precise, but it would just be confusing to
refer to that as an FMA.)

In the former case, can we enable FMA formation despite restrictive FP mode?

No. There have already been very long discussions about fma; try
searching the llvmdev archives.

-Eli
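To make the precision point concrete, here is a minimal sketch of why a fused and an unfused multiply-add can give different answers, which is why fusing is not automatically safe under a strict FP mode. The function names are hypothetical, and the exact-rational emulation via Python's `fractions` module is only a stand-in for a hardware FMA (one rounding at the end); it is not how LLVM or any target implements it.

```python
from fractions import Fraction


def mul_add_unfused(a: float, b: float, c: float) -> float:
    """a * b + c with two roundings: the product is rounded, then the sum."""
    return a * b + c


def mul_add_fused(a: float, b: float, c: float) -> float:
    """Emulate an FMA: compute a * b + c exactly, then round once.

    Fraction arithmetic is exact for finite floats, and converting the
    result back with float() rounds it to the nearest double.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so the exact product is 1 - 2**-54, which is not a double.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

unfused = mul_add_unfused(a, b, c)  # the product rounds to 1.0 before the add
fused = mul_add_fused(a, b, c)      # the -2**-54 term survives

print(unfused, fused)
```

With these inputs the two results differ, so a program that must reproduce the unfused result bit for bit cannot be given the fused one, even though the fused one is closer to the exact value.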

Hi, Dear All:

I’m going to implement FMA formation. On some architectures, “FMA a, b, c” is more precise than
“a * b + c”. I’m wondering if FMA could be less precise. In the former case, can we enable FMA
formation despite restrictive FP mode?

I believe that a pass to form fmuladd[1] intrinsic calls would be very useful! The fmuladd intrinsic is defined such that its formation should be isolated from worries about strictness. It simply means “a * b + c” and leaves the decision of whether or not to fuse up to the code generator. Of course, you would probably run your pass only if you wanted the code generator to fuse, but the pass itself should be valid.

Someone please correct me if I misunderstand this intrinsic.

[1] http://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic

A little background:

The fmuladd intrinsic was introduced to support the FP_CONTRACT pragma in C. llvm.fmuladd.* is generated by clang when it sees an expression of the form ‘a * b + c’ within a single source statement.

If you want to opportunistically form FMA target instructions my inclination would be to skip llvm.fmuladd.* and just form them from a*b+c expressions at isel time. I don’t see any fundamental problem with forming llvm.fmuladd.* to model FMA formation opportunities in an IR pass though.

- Lang.

Hi, Eli, Mike and Lang:

Thank you all for the input. Here is one example that might be difficult for isel:
a*b + c*d + e => a*b + (c*d + e).

Thanks
Shuxin

A little background:

The fmuladd intrinsic was introduced to support the FP_CONTRACT pragma in C. llvm.fmuladd.* is generated by clang when it sees an expression of the form ‘a * b + c’ within a single source statement.

If you want to opportunistically form FMA target instructions my inclination would be to skip llvm.fmuladd.* and just form them from a*b+c expressions at isel time. I don’t see any fundamental problem with forming llvm.fmuladd.* to model FMA formation opportunities in an IR pass though.

I see. Shuxin, do you know if it’s pretty simple to match FMA style patterns? Is there any advantage to forming them in the IR, e.g. does it allow you to do a post-pass combining or optimization?

One major user of FMA formation at the IR level is fast-isel, which could just match those patterns itself if they’re simple enough and there’s not much subsequent optimization to be had.

Hi, Eli, Mike and Lang:

Thank you all for the input. Here is one example that might be difficult for isel:
a*b + c*d + e => a*b + (c*d + e).

You hit send right when I did!
For your example, do you mean that it’s grouped like:
(fadd (fadd (fmul a b) (fmul c d)) e)

How would your pass go about handling these patterns and is that something that would be too complicated for fast-isel to do on the fly?

Depends on how they're grouped, but if the formation happens prior to
codegen then fast-isel will just handle whatever new instruction you've
got. An example of IR would be useful though :)

-eric

Right now we’re leaning towards having a re-association helper in codegen-prepare that will re-associate expressions (if allowed). This would allow fast-isel to more easily spot FMA opportunities, and form better code.

Why not just form them via a fast IR level pass and just have patterns
match in fast isel instead of trying to form code? Or are we saying the
same thing? (Your words of "fast isel spot"ting and "form better code"
caused me to think you mean to do optimizations within the fast isel pass).

-eric

Sorry, we’ve kind of been jumping around a bit. I’ll try to expound on what’s being debated: We have a few options ahead of us as far as benefitting fast-isel is concerned.

We can write a pass to form fmuladds. The intent would be to run this very late, perhaps before, or as part of, codegen-prepare. The downside here is that it somewhat goes against the point of fast-isel. Fast-isel allows us to skip extra representations of the program, and replacing IR with intrinsic calls is similar to having an extra representation, albeit only for part of the program.

However, the basic task of spotting an fadd of an fmul is simple enough that fast-isel could just emit the FMA equivalent if it likes. This has the benefit that we avoid the extra representation, but the downside that it makes fast-isel a little more complicated and it only does simple patterns.

Shuxin was showing some more complicated patterns that required re-association to match (fast-math flags permitting). For those, we’re considering if having a re-associate-for-FMA functionality in codegen-prepare would solve that problem. Thus, we can re-associate in codegen-prepare and emit FMA in fast-isel.
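As a rough illustration of the re-associate-for-FMA idea, the sketch below regroups a chain of additions so that every product feeds its own fused operation. It works on a toy expression tree made of Python tuples, not on real LLVM IR; the node names (`fadd`, `fmul`, `fma`) and both function names are assumptions made for this example, and the rewrite is only legal when the fast-math flags permit reassociation.

```python
def flatten_sum(expr, terms):
    """Collect the operands of a chain of fadds, left to right."""
    if isinstance(expr, tuple) and expr[0] == "fadd":
        flatten_sum(expr[1], terms)
        flatten_sum(expr[2], terms)
    else:
        terms.append(expr)
    return terms


def is_product(term):
    return isinstance(term, tuple) and term[0] == "fmul"


def reassociate_for_fma(expr):
    """Regroup a sum as a*b + (c*d + (... + rest)) and fuse each product.

    Assumes reassociation is allowed; it changes the order of the additions.
    Operands of the products are left untouched in this sketch.
    """
    terms = flatten_sum(expr, [])
    products = [t for t in terms if is_product(t)]
    others = [t for t in terms if not is_product(t)]

    if others:
        # Sum the plain addends first; they become the innermost addend.
        acc = others[0]
        for term in others[1:]:
            acc = ("fadd", acc, term)
    else:
        # No plain addend: the last product stays an fmul and seeds the chain.
        acc = products.pop()

    # Wrap the accumulator in one fma per remaining product.
    for prod in reversed(products):
        acc = ("fma", prod[1], prod[2], acc)
    return acc


# (a*b + c*d) + e, grouped the way a left-associative front end would emit it.
grouped = ("fadd", ("fadd", ("fmul", "a", "b"), ("fmul", "c", "d")), "e")
regrouped = reassociate_for_fma(grouped)
print(regrouped)
```

For the example in this thread the result is `(fma a b (fma c d e))`, that is, two fused operations in place of two multiplies and two adds.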

Hi Michael, Shuxin,

Yep. I misread the association on Shuxin's example, but even ((a*b) +
(c*d)) + e would match to a 3-instruction sequence:
(fadd (fma a b (fmul c d)) e).

If there are hairier examples that really require reassociation my vote
would be for this last scheme: An FMA-friendly reassociation pass run
before isel that exposes simple patterns for isel to match.

- Lang.
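The simple matching case can be sketched as a bottom-up rewrite that never reorders additions. This is a toy model over Python tuples, not LLVM's selector; the node names and the helpers `form_fma` and `count_ops` are hypothetical and exist only for this illustration.

```python
def form_fma(expr):
    """Rewrite (fadd (fmul x y) z) or (fadd z (fmul x y)) into (fma x y z).

    Operands are visited first, and the order of additions is never changed,
    so no reassociation permission is needed beyond permission to fuse.
    """
    if not isinstance(expr, tuple):
        return expr  # leaf: a variable name
    op, *args = expr
    args = [form_fma(arg) for arg in args]
    if op == "fadd":
        lhs, rhs = args
        if isinstance(lhs, tuple) and lhs[0] == "fmul":
            return ("fma", lhs[1], lhs[2], rhs)
        if isinstance(rhs, tuple) and rhs[0] == "fmul":
            return ("fma", rhs[1], rhs[2], lhs)
    return (op, *args)


def count_ops(expr):
    """Count operation nodes in the tree; leaves cost nothing."""
    if not isinstance(expr, tuple):
        return 0
    return 1 + sum(count_ops(arg) for arg in expr[1:])


# ((a*b) + (c*d)) + e, the grouping discussed above.
grouped = ("fadd", ("fadd", ("fmul", "a", "b"), ("fmul", "c", "d")), "e")
matched = form_fma(grouped)
print(matched, count_ops(grouped), count_ops(matched))
```

On this input the rewrite yields `(fadd (fma a b (fmul c d)) e)`, three operations instead of the original four, without touching the association of the sum.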

Agreed. I don't think fast isel should be attempting to form any new
patterns.

-eric