Shouldn't DAG combines be applied in topological order?

bjope · December 14, 2021, 5:56pm

I've been trying to understand why I do not get the expected output in some lit tests.

After some debugging I found out that DAG combines aren't always applied in topological order. Making it hard to predict the result, and I figure also making it hard to get a specific behavior.

Consider the example here: Compiler Explorer

The debug log show this:

.... cut ....

Legalized selection DAG: %bb.0 'test:'
SelectionDAG has 15 nodes:
  t0: ch = EntryToken
          t2: i32,ch = CopyFromReg t0, Register:i32 %0
        t3: i16 = truncate t2
          t5: i32,ch = CopyFromReg t0, Register:i32 %1
        t6: i16 = truncate t5
      t17: i16 = X86ISD::FSHL t3, t6, Constant:i8<6>
    t10: i16 = and t17, Constant:i16<-32>
  t13: ch,glue = CopyToReg t0, Register:i16 $ax, t10
  t14: ch = X86ISD::RET_FLAG t13, TargetConstant:i32<0>, Register:i16 $ax, t13:1

Combining: t17: i16 = X86ISD::FSHL t3, t6, Constant:i8<6>

Combining: t15: i8 = Constant<6>

Combining: t14: ch = X86ISD::RET_FLAG t13, TargetConstant:i32<0>, Register:i16 $ax, t13:1

Combining: t13: ch,glue = CopyToReg t0, Register:i16 $ax, t10

Combining: t10: i16 = and t17, Constant:i16<-32>

.... cut ....

So when the DAG combiner starts to combine things after having legalized the fshl (in this case into using X86ISD::FSHL) it will do combine on X86ISD::FSHL before doing combine on the AND node that is using the result of the funnel shift.

In my target the fshl expanded to a pattern involving and OR operation, and I wanted to do trigger a fold based on the outer AND, but instead the OR is combined first, resulting in a pattern that is much harder to combine when eventually triggering the DAG combine on the AND.

I'm well aware that I probably need/want to implement that more complicated fold as well to be less depending on how the input to ISel looks like (and in which order nodes are legalized/combined), but the result here was a bit unexpected to me.

If I insert a call to

DAG.AssignTopologicalOrder();

in the beginning of DAGCombiner::Run I get the result I want. And I also noticed up to 7% speedup in some downstream benchmarks.

However, I guess such a change might impact compile time slightly. And lots of upstream lit tests would need to have their test checks updated.

Is it well known that we do not sort the worklist at the beginning of the DAGCombiner::Run, or should this be considered as a bug?

Regards,
Björn

Topic		Replies	Views
[GlobalISel] Prioritizing long patterns in combiner over short ones LLVM Dev List Archives	7	254	January 22, 2021
Why are generic dag combines run before target dag combines? LLVM Dev List Archives	2	177	July 10, 2020
DAG optimization and lowering algorithm LLVM Dev List Archives	5	187	February 7, 2012
RFC: Legalizing operations and post-legalization DAG combining LLVM Dev List Archives	2	145	July 22, 2014
Possible missing DAGCombine opportunity? LLVM Dev List Archives	0	148	June 19, 2017

Shouldn't DAG combines be applied in topological order?

Related topics