LLVM X86 backend combineIncDecVector's transform

Hi all,

As you knwo already, I’m trying to change DAGCombiner so that it process the nodes in topological order. Doing so is not difficult per se, but this creates various improvements and regression to the existing test suite. I’d like to work through as many of the regressions as possible ahead of time.

One source of such regressions is combineIncDecVector in the X86 backend. It changes (add X, 1) into (sub X, -1) in order to be able to use the pcmpeq instruction.

This is all well and good, but numerous paterns are matching an add rather than a sub, and in fact, DAGCombiner does the inverse transform by itself as it consider the add form to be canonical. An example of such pattern is the X86ISD::AVG node, but there are more.

It seems to me like this transformation is useful, but doesn’t happen at the right place in the pipeline. Doing so later on, for instance at the DAG to DAG level would likely give DAGCombiner more opportunities to do its job, and also ensure that all instances of the pattern are detected.

It would be great if someone familiar with the X86 backend could look into this.

Thanks in advance,

Amaury Séchet

I think DAGToDAG is too late because the build_vector has already been turned into a constant pool load by then so it’s a little difficult to get back. Maybe we can delay it to !DCI.isBeforeLegalizeOps()? That would at least let the first DAG combine and the post type legalization DAG combine see the add, 1.

+Sanjay as well

I have previously posted these two patches:

[X86][CodeGen][NFC] Delay `combineIncDecVector()` from DAGCombine to

[DAGCombine][X86][AArch64][AMDGPU][MIPS][PPC] (sub x, c) -> (add x,
-c) vector edition.

While they got stuck since i wasn't really interested in them
(i'm mostly interested in scalars, not vectors...),
i don't think there was any fatal roadblocks there,
so i guess i should rebase them and see what's the status now.


No objections from me to make it run later. I didn’t see the potential conflicts when I added that code. Delayed combine, custom lowering, or DAGToDAGISel all seem like viable options to me.

I pushed it to a later DAG combine in r369981. Let me know if you still have problems from it.

That is a great first step. It makes things better indeed, but still causes a few regression when nodes processing happens in a different order.

Roman patches seems to be the way to go.

Continuing going through the regression, another big source are loads, folded as (anyext/zext/sext/bitcast (load x)) → (load any/zext/sext/from x)

Numerous patterns then try to match some ext or bitcast, end up finding a load and bail. This is really pervasive and would likely require its own solution. Any ideas how to proceed?


Amaury Séchet