llvm combines "ADD frameindex, constant" to OR

Hi all,

I've been working on a backend of our architecture and noticed llvm performs
following combining although one of operands is FrameIndex.

    Combining: t114: i64 = add FrameIndex:i64<0>, Constant:i64<56>
    Creating new node: t121: i64 = or FrameIndex:i64<0>, Constant:i64<56>
     ... into: t121: i64 = or FrameIndex:i64<0>, Constant:i64<56>

This caused problem if frame pointer points 0x60000038 at run time.

I checked DAGCombiner::visitADD. It folds ADD to OR by following code
without considering about FrameIndex. This haveNoCommonBitsSet says
it's safe since FrameIndex(0) is 0.

      // fold (a+b) -> (a|b) iff a and b share no bits.
      if ((!LegalOperations || TLI.isOperationLegal(ISD::OR, VT)) &&
          DAG.haveNoCommonBitsSet(N0, N1))
        return DAG.getNode(ISD::OR, DL, VT, N0, N1);

I checked visitADD more and find that it performs some kind of undo
like bellow if the input is (+ (+ FI <const>) <const>).

        // Undo the add -> or combine to merge constant offsets from a frame index.
        if (N0.getOpcode() == ISD::OR &&
            isa<FrameIndexSDNode>(N0.getOperand(0)) &&
            isa<ConstantSDNode>(N0.getOperand(1)) &&
            DAG.haveNoCommonBitsSet(N0.getOperand(0), N0.getOperand(1))) {
          SDValue Add0 = DAG.getNode(ISD::ADD, DL, VT, N1, N0.getOperand(1));
          return DAG.getNode(ISD::ADD, DL, VT, N0.getOperand(0), Add0);
        }

I think this code may work fine on an architecture using FI + A + B.

However, our architecture seems to use only FI + A and this ADD is
converted to OR permanently. As a result, generated code use OR and
it causes run time error depends on the address of frame pointer.

I also checked visitOR, but I cannot find any undoing there.

My question is:

  1. Is there any way to control this to avoid OR folding,
     like TargetLowering::preferNotCombineToOrFrameIndex?
     (or should we add new?)
  2. How to optimize FrameIndex? I desire DAGCombiner use alignment
     information on data, stack frame, or something.

Best Regards,
-- Kazushi

Hi,

Hi all,

I've been working on a backend of our architecture and noticed llvm performs
following combining although one of operands is FrameIndex.

    Combining: t114: i64 = add FrameIndex:i64<0>, Constant:i64<56>
    Creating new node: t121: i64 = or FrameIndex:i64<0>, Constant:i64<56>
     ... into: t121: i64 = or FrameIndex:i64<0>, Constant:i64<56>

This caused problem if frame pointer points 0x60000038 at run time.

I checked DAGCombiner::visitADD. It folds ADD to OR by following code
without considering about FrameIndex. This haveNoCommonBitsSet says
it's safe since FrameIndex(0) is 0.

This sounds like your fundamental problem to me. The generic code path
is computeKnownBits -> computeKnownBitsForFrameIndex ->
InferPtrAlignment. If your stack slot can legitimately end up at an
offset only aligned to 0x8 then that's going wrong somewhere and you
should probably work out why.

I don't really have the details to say for sure, but unless you've
overridden computeKnownBitsForFrameIndex a reasonably likely scenario
is that the alloca for that local variable requested 16-bit alignment
but your sp itself is only routinely aligned to 8 bytes (or less) and
you actually need to realign sp manually in XYZFrameLowering to
support that variable.

The ISD::OR thing can be annoying, but shouldn't affect correctness.

        // Undo the add -> or combine to merge constant offsets from a frame index.
        if (N0.getOpcode() == ISD::OR &&
            isa<FrameIndexSDNode>(N0.getOperand(0)) &&
            isa<ConstantSDNode>(N0.getOperand(1)) &&
            DAG.haveNoCommonBitsSet(N0.getOperand(0), N0.getOperand(1))) {
          SDValue Add0 = DAG.getNode(ISD::ADD, DL, VT, N1, N0.getOperand(1));
          return DAG.getNode(ISD::ADD, DL, VT, N0.getOperand(0), Add0);
        }

I think this code may work fine on an architecture using FI + A + B.

It's target independent, and again shouldn't affect correctness. The
entire transformation is (FI | A) + B -> (FI + (A+B)). Useful when it
triggers, but not much else.

  1. Is there any way to control this to avoid OR folding,
     like TargetLowering::preferNotCombineToOrFrameIndex?
     (or should we add new?)

There isn't as far as I know. I've heard it justified as
canonicalization before, and not been entirely convinced myself. But
it's generally pretty easy to cope with in the backend because
SelectionDAG::isBaseWithConstantOffset deals with it correctly.

  2. How to optimize FrameIndex? I desire DAGCombiner use alignment
     information on data, stack frame, or something.

As I said above, I think it should already use that and something has
gone wrong there in your case.

Cheers.

Tim.

Hi Tim,

Thank you very much for explaining the computeKnownBitsForFrameIndex
and related functions. I even don't notice them. As you say, I think
my backend has fundamental problems at that area. I'll check it out.

> 2. How to optimize FrameIndex? I desire DAGCombiner use alignment
> information on data, stack frame, or something.

As I said above, I think it should already use that and something has
gone wrong there in your case.

That's true. The computeKnownBitsForFrameIndex and InferPtrAlignment
already use that. It won't work for my case, though.

For example, a vector sized data like v8sd says 64 bytes alignment in
InferPtrAlignment, although I allocate v8sd using 8 bytes alignment on
stack. This differences have caused the problem. I'll check details.

Best Regards,
-- Kazushi

Hi Tim,

I could fix this problem by defining alignments of relatively small vector data types
in DataLayout. I defined the alignment for vector data types which use whole vector register,
but I forgot to define it for small vector types. It caused problems on vector tests in
test-suite. Thank you so much.

Thanks,
-- Kazushi