Proposal: New DAG node type for reciprocal operation

Hi,

In relaxed/fast-math mode, if we can convert a/b to a * (1/b), we may get better performance when (1) “b” is loop invariant, (2) the arch has a faster reciprocal instruction (e.g. vrecpe/vrecps on ARM), or (3) the arch has no vector div but does have vector mul and recip.

So, with this node type, a div node can be converted to a mul and a recip when desired. Then each arch can further lower the recip node. Even if the arch has no recip support, allowing other passes to hoist “1/b” out of the loop may still be profitable.

Is this feasible?

Thanks,

Weiming

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

Sounds like a reasonable fit for a target-specific DAG combine. I suspect a target specific node wouldn't be necessary and the patterns could be matched directly.

-Jim



--- On Thu, 9/20/12, Jim Grosbach grosbach@apple.com wrote:


> From: Jim Grosbach grosbach@apple.com
> Subject: Re: [LLVMdev] Proposal: New DAG node type for reciprocal operation
> To: “Weiming Zhao” weimingz@codeaurora.org
> Cc: llvmdev@cs.uiuc.edu
> Date: Thursday, September 20, 2012, 3:32 PM
>
> Sounds like a reasonable fit for a target-specific DAG combine. I suspect a target specific node wouldn’t be necessary and the patterns could be matched directly.
>
> -Jim

Yes, a target-specific node is not necessary; direct pattern matching would be enough for the required transformation. Having a reciprocal node may also open up opportunities for other target-specific transformations.

-Shahid

Yes, what I mean is a target independent node in the ISD::NodeType enum.

I already implemented the transformation in DAGCombiner, plus target-specific lowering, in the first place, and it worked. But introducing a dedicated node would make the logic clearer.

For example, on ARM, FDIV is a scalar operation. So, after DAGCombiner and vector type legalization, a vectorized FDIV has already been expanded into scalar versions, which defeats the intention of utilizing vectorizable mul/recip to implement a vectorized fdiv. To fix that, one needs to either combine them back or change the logic of vector type legalization.
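To make the mul/recip idea concrete, here is a rough C++ model of what a 4-wide fdiv built from a reciprocal estimate plus refinement steps would compute per lane. This is only a sketch: `recip_estimate` and `recip_step` are hypothetical stand-ins written for this example (not LLVM or ARM APIs), the estimate is a simple linear guess rather than what any hardware produces, and divisors are assumed to be positive, finite, and nonzero. The `2 - d*x` factor has the same form as what vrecps computes.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Vec4 = std::array<float, 4>;

// Hypothetical stand-in for a hardware reciprocal estimate.
// Assumes d is positive, finite, and nonzero.
inline float recip_estimate(float d) {
    int exp = 0;
    const float m = std::frexp(d, &exp);  // d = m * 2^exp, m in [0.5, 1)
    const float guess = 48.0f / 17.0f - (32.0f / 17.0f) * m;  // rough 1/m
    return std::ldexp(guess, -exp);       // rough 1/d
}

// Newton-Raphson refinement factor: 2 - d*x.
inline float recip_step(float d, float x) { return 2.0f - d * x; }

// Lane-wise a / b computed as a * (1/b), with no division instruction.
Vec4 div_via_recip(const Vec4& a, const Vec4& b, int steps = 3) {
    Vec4 out{};
    for (std::size_t lane = 0; lane < out.size(); ++lane) {
        float x = recip_estimate(b[lane]);
        for (int s = 0; s < steps; ++s)
            x *= recip_step(b[lane], x);  // each step roughly squares the error
        out[lane] = a[lane] * x;
    }
    return out;
}
```

Every operation in the loop body is a multiply or a multiply-subtract, so the whole sequence stays vectorizable as long as the FDIV has not been scalarized before the combine runs.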

Thanks,

Weiming

> To fix that, one needs to either combine them back or change the logic of vector type legalization.

Combining them back is simple; however, if a scalar operation has already been combined and vectorized, it should not be expanded back into scalar operations. So changing the logic of vector type legalization seems the better solution.

-Shahid