SelectionDAG legality: isel creating cycles

I've run into a situation in isel where it seems like the selector is
generating a cycle in the DAG.

I have something like this:

0x215f140: v2f64 = llvm.x86.sse2.min.sd 0x215efd0, 0x21606d0, 0x215eb80
[0] 0x215efd0: i64 = Constant <647>
[0] 0x21606d0: v2f64 = scalar_to_vector 0x213b8f0
  [0] 0x213b8f0: f64,ch = load 0x213b780, 0x213aa90, 0x213b610 <0x2113690:0>
alignment=8
    [0] 0x213b780: ch = Prefetch 0x213aee0:1, 0x213b1c0, 0x213b330, 0x213b4a0
      [1] 0x213aee0: f64,ch = load 0x213a720, 0x213ac00, 0x213b610
<0x215ace8:0> alignment=8
        [0] 0x213a720: ch = EntryToken
        [0] 0x213ac00: i64,ch = CopyFromReg 0x213a720, 0x213ad70
          [0] 0x213a720: ch = EntryToken
          [0] 0x213ad70: i64 = Register #1024
        [0] 0x213b610: i64 = undef
      [0] 0x213b1c0: i64 = add 0x213ac00, 0x213b050
        [0] 0x213ac00: i64,ch = CopyFromReg 0x213a720, 0x213ad70
          [0] 0x213a720: ch = EntryToken
          [0] 0x213ad70: i64 = Register #1024
        [0] 0x213b050: i64 = Constant <136>
      [0] 0x213b330: i32 = Constant <0>
      [0] 0x213b4a0: i32 = Constant <3>
    [0] 0x213aa90: i64 = Constant <0>
    [0] 0x213b610: i64 = undef
[0] 0x215eb80: v2f64 = scalar_to_vector 0x213aee0
  [0] 0x213aee0: f64,ch = load 0x213a720, 0x213ac00, 0x213b610 <0x215ace8:0>
alignment=8
    [0] 0x213a720: ch = EntryToken
    [0] 0x213ac00: i64,ch = CopyFromReg 0x213a720, 0x213ad70
      [0] 0x213a720: ch = EntryToken
      [0] 0x213ad70: i64 = Register #1024
    [0] 0x213b610: i64 = undef

I added the brackets on output to show which result is being linked to, so the
prefetch takes the chain output from its controlling load.

When llvm.x86.sse2.min.sd gets selected, the TableGen-generated code
does this:

  SDNode *ResNode = CurDAG->SelectNodeTo(N.getNode(), Opc0, VT0, MVT::Other,
Ops0, 6);

0x215f140: v2f64,ch = <<Unknown Machine Node>> 0x21606d0, 0x213ac00,
0x215ffa0, 0x215ea10, 0x215ee60, 0x213a720
[0] 0x21606d0: v2f64 = scalar_to_vector 0x213b8f0
  [0] 0x213b8f0: f64,ch = load 0x213b780, 0x213aa90, 0x213b610 <0x2113690:0>
alignment=8
    [0] 0x213b780: ch = Prefetch 0x213aee0:1, 0x213b1c0, 0x213b330, 0x213b4a0
      [1] 0x213aee0: f64,ch = load 0x213a720, 0x213ac00, 0x213b610
<0x215ace8:0> alignment=8
        [0] 0x213a720: ch = EntryToken
        [0] 0x213ac00: i64,ch = CopyFromReg 0x213a720, 0x213ad70
          [0] 0x213a720: ch = EntryToken
          [0] 0x213ad70: i64 = Register #1024
        [0] 0x213b610: i64 = undef
      [0] 0x213b1c0: i64 = add 0x213ac00, 0x213b050
        [0] 0x213ac00: i64,ch = CopyFromReg 0x213a720, 0x213ad70
          [0] 0x213a720: ch = EntryToken
          [0] 0x213ad70: i64 = Register #1024
        [0] 0x213b050: i64 = Constant <136>
      [0] 0x213b330: i32 = Constant <0>
      [0] 0x213b4a0: i32 = Constant <3>
    [0] 0x213aa90: i64 = Constant <0>
    [0] 0x213b610: i64 = undef
[0] 0x213ac00: i64,ch = CopyFromReg 0x213a720, 0x213ad70
  [0] 0x213a720: ch = EntryToken
  [0] 0x213ad70: i64 = Register #1024
[0] 0x215ffa0: i8 = TargetConstant <1>
[0] 0x215ea10: i64 = Register #0
[0] 0x215ee60: i32 = TargetConstant <0>
[0] 0x213a720: ch = EntryToken

Ok, so far so good.

Then it does this:

  ReplaceUses(SDValue(CPInChain2.getNode(), 1), SDValue(ResNode, 1));

CPInChain2 is this:

0x213aee0: f64,ch = load 0x213a720, 0x213ac00, 0x213b610 <0x215ace8:0>
alignment=8
[0] 0x213a720: ch = EntryToken
[0] 0x213ac00: i64,ch = CopyFromReg 0x213a720, 0x213ad70
  [0] 0x213a720: ch = EntryToken
  [0] 0x213ad70: i64 = Register #1024
[0] 0x213b610: i64 = undef

So it's replacing the load/memory chain output of the min intrinsic with the
chain output of the min intrinsic. Ok, that seems right. But one of the
users of that chain output is the prefetch. So now the prefetch will get a
chain input pointing to the chain output of the min intrinsic, creating a
cycle.

The fundamental issue is that the DAG originally looked like this:

MIN
  LOAD B
    PREFETCH
      Chain from LOAD A
  LOAD A

Now the chain will come from the MIN and we'll have:

MIN
  LOAD B
    PREFETCH
      Chain from MIN
  X86AddressOperands

Is such a thing legal? I wouldn't expect so since it's cycle. Eventually we
blow up in topological sort since it can't find one.

What's the right way to handle this?

                                              -Dave

Actually, it looked like this:

MIN
  LOAD B
    Chain from PREFETCH
      Chain from LOAD A
    NULL
  LOAD A
    Chain from ENTRY
    Some Addressing

LOAD B is from NULL because this is a bugpoint-reduced testcase, so that's not
an issue.

Just wanted to clarify in case someone was wondering about this.

                                                 -Dave

I suppose one solution would be to not have LOAD B carry the chain from the
prefetch. But SelectionDAGBuilder "automatically" gives the node it
constructs an input from the current root of the DAG. At some point the
prefetch is the root and the next thing to be built is LOAD B, so it gets an
operand pointing to the prefetch.

I'll look at this angle some more. It seems we should be able to break the
cycle here.

Let me know if I'm on the completely wrong track. :slight_smile:

                                               -Dave

Hi David,

I've run into a situation in isel where it seems like the selector is
generating a cycle in the DAG.

if you build with expensive checks on (configure with --enable-expensive-checks)
then I think some cycle checking stuff kicks in. This may help find the moment
when it was introduced.

Ciao,

Duncan.

I'm currently working in this area. What pattern is causing the cycle? Can I get a testcase?

-Chris

I'll see if I can generate one and file a PR.

                                        -Dave

Ah, isLegalToFold saves us on trunk. But we lose folding due to prefetching,
which is unfortunate.

I am seeing the error with 2.5 (yes, we are upgrading!).

I guess I'll have to backport some of the isLogalToFold logic.

                                              -Dave

Ok. isLegalToFold will go away and be replaced with an even better mechanism when my isel rewrite is done. :slight_smile:

-Chris

Hello, David

Ah, isLegalToFold saves us on trunk. But we lose folding due to prefetching,
which is unfortunate.

I am seeing the error with 2.5 (yes, we are upgrading!).

I guess I'll have to backport some of the isLogalToFold logic.

There was x86-only code at pre-2.6 times which was later moved into
generic hook named "isLegalAndProfitableToFold". You might want to
backport just that part.

I found the problem. It's a bug in 2.5:

/// SelectScalarSSELoad - Match a scalar SSE load. In particular, we want to
/// match a load whose top elements are either undef or zeros. The load
flavor
/// is derived from the type of N, which is either v4f32 or v2f64.
bool X86DAGToDAGISel::SelectScalarSSELoad(SDValue Op, SDValue Pred,
                                          SDValue N, SDValue &Base,
                                          SDValue &Scale, SDValue &Index,
                                          SDValue &Disp, SDValue &InChain,
                                          SDValue &OutChain) {
  if (N.getOpcode() == ISD::SCALAR_TO_VECTOR) {
    InChain = N.getOperand(0).getValue(1);
    if (ISD::isNON_EXTLoad(InChain.getNode()) &&
        InChain.getValue(0).hasOneUse() &&
        N.hasOneUse() &&
        // Cray: Bug 757517 (fixed in 2.7, possibly 2.6)
        IsLegalAndProfitableToFold(N.getOperand(0).getNode(), Pred.getNode(),
Op.getNode())) {

The call used to be:

IsLegalAndProfitableToFold(N.getNode(), Pred.getNode(), Op.getNode()))

so it was checking the wrong thing for legality.

Thanks for the help, everyone!

                                          -Dave