I've been trying to understand why I do not get the expected output in some lit tests.
After some debugging I found out that DAG combines aren't always applied in topological order. Making it hard to predict the result, and I figure also making it hard to get a specific behavior.
Consider the example here: https://godbolt.org/z/x8PEhsnqf
The debug log show this:
.... cut ....
Legalized selection DAG: %bb.0 'test:'
SelectionDAG has 15 nodes:
t0: ch = EntryToken
t2: i32,ch = CopyFromReg t0, Register:i32 %0
t3: i16 = truncate t2
t5: i32,ch = CopyFromReg t0, Register:i32 %1
t6: i16 = truncate t5
t17: i16 = X86ISD::FSHL t3, t6, Constant:i8<6>
t10: i16 = and t17, Constant:i16<-32>
t13: ch,glue = CopyToReg t0, Register:i16 $ax, t10
t14: ch = X86ISD::RET_FLAG t13, TargetConstant:i32<0>, Register:i16 $ax, t13:1
Combining: t17: i16 = X86ISD::FSHL t3, t6, Constant:i8<6>
Combining: t15: i8 = Constant<6>
Combining: t14: ch = X86ISD::RET_FLAG t13, TargetConstant:i32<0>, Register:i16 $ax, t13:1
Combining: t13: ch,glue = CopyToReg t0, Register:i16 $ax, t10
Combining: t10: i16 = and t17, Constant:i16<-32>
.... cut ....
So when the DAG combiner starts to combine things after having legalized the fshl (in this case into using X86ISD::FSHL) it will do combine on X86ISD::FSHL before doing combine on the AND node that is using the result of the funnel shift.
In my target the fshl expanded to a pattern involving and OR operation, and I wanted to do trigger a fold based on the outer AND, but instead the OR is combined first, resulting in a pattern that is much harder to combine when eventually triggering the DAG combine on the AND.
I'm well aware that I probably need/want to implement that more complicated fold as well to be less depending on how the input to ISel looks like (and in which order nodes are legalized/combined), but the result here was a bit unexpected to me.
If I insert a call to
in the beginning of DAGCombiner::Run I get the result I want. And I also noticed up to 7% speedup in some downstream benchmarks.
However, I guess such a change might impact compile time slightly. And lots of upstream lit tests would need to have their test checks updated.
Is it well known that we do not sort the worklist at the beginning of the DAGCombiner::Run, or should this be considered as a bug?