[RFC] Improving integer divide optimization (related to D12082)

Hello LLVM, A recent commit creates the isIntDivCheap() target query.

http://reviews.llvm.org/D12082

The current approach has a couple shortcomings.

First, when targets decide divide is cheap, the DAGCombiner ignores
obvious power-of-2 optimizations. In the targets I know, shifts are
cheaper than divides in both speed and size. The target cannot see
the value in the isIntDivCheap() call, so cannot itself check for
power-of-2 cases.

Second, isIntDivCheap passes a MinSize flag to the target, which is
true for -Oz (MinSize function attribute). However, the MinSize flags
is false for -Os (OptimizeSize function attribute). Both of these try
to optimize for size and targets may like to indicate cheap division
in both cases, or perhaps for other function attributes in the future.

I think some improvements would be helpful:
The DAGCombiner should always optimize for power-of-2 division before
considering the general integer case.
isIntDivCheap() should pass the MachineFunction to the target rather
than just a MinSize flag. If it cares, the target can check function
attributes as it sees fit.

Does this sound sensible?

Thanks,
-steve

From what I remember, udiv by power of 2 already gets turned into a shift in instcombine; the tricky case is sdiv by power of 2, which takes significantly more than one instruction. The generic implementation is this:

// Splat the sign bit into the register
SDValue SGN =
DAG.getNode(ISD::SRA, DL, VT, N0,
DAG.getConstant(VT.getScalarSizeInBits() - 1, DL,
getShiftAmountTy(N0.getValueType())));
AddToWorklist(SGN.getNode());

// Add (N0 < 0) ? abs2 - 1 : 0;
SDValue SRL =
DAG.getNode(ISD::SRL, DL, VT, SGN,
DAG.getConstant(VT.getScalarSizeInBits() - lg2, DL,
getShiftAmountTy(SGN.getValueType())));
SDValue ADD = DAG.getNode(ISD::ADD, DL, VT, N0, SRL);
AddToWorklist(SRL.getNode());
AddToWorklist(ADD.getNode()); // Divide by pow2
SDValue SRA = DAG.getNode(ISD::SRA, DL, VT, ADD,
DAG.getConstant(lg2, DL,
getShiftAmountTy(ADD.getValueType())));

And there’s already a target-specific hook to add a custom implementation (e.g. on PowerPC):

// Target-specific implementation of sdiv x, pow2.
if (SDValue Res = BuildSDIVPow2(N))
return Res;

-— escha

Thanks escha. I fooled myself and all the offending cases in my
target were indeed SDIVs.

Isn’t the problem the fact that the patch makes it harder for a target to get the generic code to reach its custom hook?
Now the "cheap pow2 sdiv” is merged with the generic “cheap div” you can’t distinguish anymore.

Yes and also the issue of needing more information to make a smart
isIntDivCheap() decision.

In visitSDIV(), checking looks like this when the denominator is a power of 2.

    if (TLI.isIntDivCheap(N->getValueType(0), MinSize))
      return SDValue();

    // Target-specific implementation of sdiv x, pow2.
    if (SDValue Res = BuildSDIVPow2(N))
      return Res;

How about this for isIntDivCheap?
1) pass Function* to allow the target to decide for itself which
attributes matter, and
2) pass a boolean indicating a power-of-2 denominator.

    const Function* Fn = DAG.getMachineFunction().getFunction();
    if (TLI.isIntDivCheap(N->getValueType(0), Fn, true))
      return SDValue();

    // Target-specific implementation of sdiv x, pow2.
    if (SDValue Res = BuildSDIVPow2(N))
      return Res;

Isn’t the problem the fact that the patch makes it harder for a target to
get the generic code to reach its custom hook?
Now the "cheap pow2 sdiv” is merged with the generic “cheap div” you can’t
distinguish anymore.

Yes and also the issue of needing more information to make a smart
isIntDivCheap() decision.

In visitSDIV(), checking looks like this when the denominator is a power of 2.

   if (TLI.isIntDivCheap(N->getValueType(0), MinSize))
     return SDValue();

   // Target-specific implementation of sdiv x, pow2.
   if (SDValue Res = BuildSDIVPow2(N))
     return Res;

How about this for isIntDivCheap?
1) pass Function* to allow the target to decide for itself which
attributes matter, and

If you want the attributes, I think you should pass the attributes and not the whole Function.

I’m not sure why MinSize does not trigger with -Os by default, Michael is it intended?

2) pass a boolean indicating a power-of-2 denominator.

   const Function* Fn = DAG.getMachineFunction().getFunction();
   if (TLI.isIntDivCheap(N->getValueType(0), Fn, true))
     return SDValue();

   // Target-specific implementation of sdiv x, pow2.
   if (SDValue Res = BuildSDIVPow2(N))
     return Res;

Did you consider what I suggested, i.e. inverting the order of the two tests?

Thinking about it more, I even think than removing the first test entirely would even be nicer here.
The default implementation for BuildSDIVPow2() would be:

virtual SDValue BuildSDIVPow2(SDValue N) {
   bool MinSize = …. // choose a default, target can override anyway
   if (TLI.isIntDivCheap(N->getValueType(0), MinSize))
     return N;
   return SDValue();
}

I believe this would preserve the current default behavior and leave targets the ability to override as much as they want.

I think -Oz might be MinSize? -Oz being absolute minimum size at the cost of significant speed, IIRC.

—escha

Sure, the question is more what is the tradeoff for -Os.My remember of -Os is somehow “don’t perform transformation that increases code size”.
It seems that it is what is done here: converting an (possibly expensive) division into multiple (less expensive) operations, probably increasing code size (target dependent admittedly).

If you want the attributes, I think you should pass the attributes and not the whole Function.

OK.

Did you consider what I suggested, i.e. inverting the order of the two tests?

I don't see this suggestion in our thread.

Thinking about it more, I even think than removing the first test entirely would even be nicer here.
The default implementation for BuildSDIVPow2() would be:

virtual SDValue BuildSDIVPow2(SDValue N) {
   bool MinSize = …. // choose a default, target can override anyway
   if (TLI.isIntDivCheap(N->getValueType(0), MinSize))
     return N;
   return SDValue();
}

I believe this would preserve the current default behavior and leave targets the ability to override as much as they want.

Nice. visitSDIV() and visitUDIV() still call isIntDivCheap() for
non-power-of-2 cases. Passing attributes to isIntDivCheap() to enable
a smart check is still desirable, but you eliminated the need for a
power-of-2 flag.

Today's TargetLowering default:

virtual SDValue BuildSDIVPow2(SDNode *N, const APInt &Divisor,
                              SelectionDAG &DAG,
                              std::vector<SDNode *> *Created) const {
  return SDValue();
}

New default with your approach and Fn attributes:

virtual SDValue BuildSDIVPow2(SDNode *N, const APInt &Divisor,
                              SelectionDAG &DAG,
                              std::vector<SDNode *> *Created) const {

   AttributeSet Attr = DAG.getMachineFunction().getFunction()->getAttributes();
   if (TLI.isIntDivCheap(N->getValueType(0), Attr))
     return N;
   return SDValue();
}

virtual bool isIntDivCheap(EVT VT, AttributeSet Attr) const {
  return false;
}

X86 is the only in-tree user, so to preserve today's behavior:

bool X86TargetLowering::isIntDivCheap(EVT VT, AttributeSet Attr) const {
  bool OptSize = Attr.hasAttribute(AttributeSet::FunctionIndex,
Attribute::MinSize)
  return OptSize && !VT.isVector();
}

The way I understand -Os is “don’t increase size unless there’s a clear and significant performance gain” – that is, much less aggressive in terms of code size than the definition you gave below.

I’m not advocating this point of view, mind you, it’s just that this has been my understanding.

Since leaving the div as-is can be very expensive compared to the alternative sequence, I thought -Oz was the safe way to go with this.

Sounds reasonable and the proposed patch keeps -Oz the default.