DAGCombiner, chains and mergeConsecutiveStores?

Hi llvm-dev,

We got the following problem in our OOT target.

During DAG combining we have a DAG like this:

SelectionDAG has 16 nodes:
    t0: ch = EntryToken
    t4: i16 = add nuw GlobalAddress:i16<%str0* @g0> 0, Constant:i16<2>
  t7: ch = store<(store (s32) into `i32* getelementptr inbounds (%str0, %str0* @g0, i32 0, i32 1)`, align 1, !tbaa !1)> t0, Constant:i32<0>, t4, undef:i16
  t8: i16,ch = load<(load (s16) from `%str1** undef`)> t7, undef:i16, undef:i16
    t9: i16 = add nuw t8, Constant:i16<2>
  t10: i16,ch = load<(load (s16) from %ir.next1.i)> t7, t9, undef:i16
  t11: i32,ch = load<(load (s32) from %ir.val.i1, align 1, !tbaa !7)> t7, t10, undef:i16
      t17: ch = store<(store (s32) into `i32* getelementptr inbounds (%str0, %str0* @g0, i32 0, i32 0)`, align 1)> t11:1, Constant:i32<0>, GlobalAddress:i16<%str0* @g0> 0, undef:i16
    t19: ch = TokenFactor t17, t8:1, t10:1
  t15: ch,glue = CopyToReg t19, Register:i32 $a0_32, t11
  t16: ch = PHXISD::RETURN t15, Register:i32 $a0_32, t15:1

Then when visiting the "load (s32)" load a better chain is found and a few combines later we get this DAG instead:

SelectionDAG has 16 nodes:
  t0: ch = EntryToken
    t4: i16 = add nuw GlobalAddress:i16<%str0* @g0> 0, Constant:i16<2>
  t7: ch = store<(store (s32) into `i32* getelementptr inbounds (%str0, %str0* @g0, i32 0, i32 1)`, align 1, !tbaa !1)> t0, Constant:i32<0>, t4, undef:i16
  t8: i16,ch = load<(load (s16) from `%str1** undef`)> t7, undef:i16, undef:i16
    t9: i16 = add nuw t8, Constant:i16<2>
  t10: i16,ch = load<(load (s16) from %ir.next1.i)> t7, t9, undef:i16
      t22: ch = store<(store (s32) into `i32* getelementptr inbounds (%str0, %str0* @g0, i32 0, i32 0)`, align 1)> t20:1, Constant:i32<0>, GlobalAddress:i16<%str0* @g0> 0, undef:i16
    t25: ch = TokenFactor t8:1, t10:1, t22
  t15: ch,glue = CopyToReg t25, Register:i32 $a0_32, t20
  t20: i32,ch = load<(load (s32) from %ir.val.i1, align 1, !tbaa !7)> t0, t10, undef:i16
  t16: ch = PHXISD::RETURN t15, Register:i32 $a0_32, t15:1

Notice how the chain for the "load (s32)" now points at the entry token. I figure TBAA indicated that there was no alias with the store so the we did not need the chain dependency between that load and the t7 store.

However, there is still an ordering between the store and the load, since the t20 load use t10 as a pointer, and t10 has a chain dependency to t7.

My questions is:
Is it correct/legal to update the chain like that, or is the chain supposed to show the memory ordering (without the need to also analyze the mix of dependencies given both other operands and the chain)?

If the above is a correct transform, then I think we got a problem with mergeConsecutiveStores (that we started to see downstream after ⚙ D116895 Fix a missed opportunity to optimize consecutive stores.). When finding the merge candidates (getStoreMergeCandidates) only chains are followed, not considering other operand dependencies. And later when making sure we do not cause cycles in the DAG (in checkMergeStoreCandidatesForDependencies) it is assumed that chains already have been analyzed, so at that stage only other operands are analyzed and chains are skipped.

Given the example above we have:
  t22 depends on t20 (chain dep)
  t20 depends on t10 (non-chain dep)
  t10 depends on t7 (chain dep) and t9+t8 (via non-chain dep)
  t7 depends on t0 (chain dep)

Merging the t7 and t22 stores would result in a cycle. To figure that out I think one need to analyze both chain and non-chain dependency at the same time, not in two separate analyses.

The particular problem we see can be solved by letting checkMergeStoreCandidatesForDependencies also analyse the chain operands.
That of course depends on the legality of the chain rewrites that I asked questions about above (but if that transform isn't legal then I figure that checkMergeStoreCandidatesForDependencies would be kind of redundant).

/Björn

but if that transform isn’t legal then I figure that checkMergeStoreCandidatesForDependencies would be kind of redundant

I’m not certain that’s the case. Before ⚙ D118943 [DAGCombiner] Fix dependency analysis in checkMergeStoreCandidatesForDependencies, checkMergeStoreCandidatesForDependencies only
analyzed the data dependencies and getMergeStoreCandidates analyzed the chain dependencies, do we know that this wasn’t by design? I think the question of whether it’s legal to have mixed data and chain dependencies still stands and warrants answering, especially in light of the compile time regression caused by this patch ([X86][SelectionDAG] Large compile time regression · Issue #59800 · llvm/llvm-project · GitHub).