Is r174746 broken on ARM?

Hello Hal,

I have a strong suspicion that your constant folding optimization
introduced at r174746 is broken on ARM. There is a bug about it:

http://llvm.org/bugs/show_bug.cgi?id=15581

There is no such issue with 3.2, and reverting r174746 on top of
r178740 also fixes the problem. I'm trying to fix it myself, but
still have no good ideas; so it would be great to have an advice
from you.

Thanks in advance,
Dmitry

From: "Dmitry Antipov" <antipov@dev.rtsoft.ru>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Renato Golin" <renato.golin@linaro.org>, llvmdev@cs.uiuc.edu
Sent: Thursday, April 4, 2013 3:22:05 AM
Subject: Is r174746 broken on ARM?

Hello Hal,

I have a strong suspicion that your constant folding optimization
introduced at r174746 is broken on ARM. There is a bug about it:

http://llvm.org/bugs/show_bug.cgi?id=15581

There is no such issue with 3.2, and reverting r174746 on top of
r178740 also fixes the problem. I'm trying to fix it myself, but
still have no good ideas; so it would be great to have an advice
from you.

r174746 is specifically related to pre-increment loads and stores. I think that the first step is to narrow down the problematic case. In DAGCombiner::CombineToPreIndexedLoadStore, there is a loop which starts with:

  SmallVector<SDNode *, 16> OtherUses;
  if (isa<ConstantSDNode>(Offset))
    for (SDNode::use_iterator I = BasePtr.getNode()->use_begin(),
         E = BasePtr.getNode()->use_end(); I != E; ++I) {
      SDNode *Use = *I;

1. Make the loop skip cases where Use->getOpcode() == ISD::ADD and then Use->getOpcode() == ISD::SUB and try to figure out whether the problem is related to folding with ADDs or SUBs.

2. In the problematic case, skip the loop when AM == ISD::PRE_DEC, and see if the problem is related to pre-increment or pre-decrement.

Looking briefly at the code in comment 5 of PR15581, is that the pre-decrement case? I can't test that case on PPC, so I can certainly believe that there is a problem somewhere. The relevant code is a little farther down:

    APInt OV =
      cast<ConstantSDNode>(Offset)->getAPIntValue();
    if (AM == ISD::PRE_DEC)
      OV = -OV;

    ConstantSDNode *CN =
      cast<ConstantSDNode>(OtherUses[i]->getOperand(OffsetIdx));
    APInt CNV = CN->getAPIntValue();
    if (OtherUses[i]->getOpcode() == ISD::SUB && OffsetIdx == 1)
      CNV += OV;
    else
      CNV -= OV;

perhaps something here is not quite right.

-Hal

I suspect that the first snippet (where OV is inverted) is wrong because
ARM implementation of getPreIndexedAddressParts inverts Offset for
pre-decrement case, both for ARM and Thumb2, in getARMIndexedAddressParts
and getT2IndexedAddressParts, respectively, in a calls to:

Offset = DAG.getConstant(-RHSC, RHS->getValueType(0));
                          ^^^^^
                          here!

So you don't need to invert it one more time in DAGCombiner::CombineToPreIndexedLoadStore.

Dmitry

From: "Dmitry Antipov" <antipov@dev.rtsoft.ru>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Renato Golin" <renato.golin@linaro.org>, llvmdev@cs.uiuc.edu
Sent: Monday, April 8, 2013 2:45:20 AM
Subject: Re: Is r174746 broken on ARM?

> Looking briefly at the code in comment 5 of PR15581, is that the
> pre-decrement case?
> I can't test that case on PPC, so I can certainly believe that
> there is a problem somewhere.
> The relevant code is a little farther down:
>
> APInt OV =
> cast<ConstantSDNode>(Offset)->getAPIntValue();
> if (AM == ISD::PRE_DEC)
> OV = -OV;
>
> ConstantSDNode *CN =
> cast<ConstantSDNode>(OtherUses[i]->getOperand(OffsetIdx));
> APInt CNV = CN->getAPIntValue();
> if (OtherUses[i]->getOpcode() == ISD::SUB && OffsetIdx == 1)
> CNV += OV;
> else
> CNV -= OV;
>
> perhaps something here is not quite right.

I suspect that the first snippet (where OV is inverted) is wrong
because
ARM implementation of getPreIndexedAddressParts inverts Offset for
pre-decrement case, both for ARM and Thumb2, in
getARMIndexedAddressParts
and getT2IndexedAddressParts, respectively, in a calls to:

Offset = DAG.getConstant(-RHSC, RHS->getValueType(0));
                          ^^^^^
                          here!

So you don't need to invert it one more time in
DAGCombiner::CombineToPreIndexedLoadStore.

Okay, great! Can you please generate a test case? In case you don't know, an easy way is this: place an assert that will crash the code generation in the pre-dec case you've highlighted, then use bugpoint to produce a reduced test case for the crash.

Thanks again,
Hal