Node deletion during DAG Combination ?

Hi,

I’m trying to optimize the ‘extract_vector_elt’ for my SIMD microcontroller.

The idea is, during DAG combination, to merge load/extract sequence into an architecture specific node.

During Instruction Selection, this specific node will be target selected to an architecture specific instruction.

By ‘combination of DAG nodes’ I understand ‘replacing a set of DAG nodes by an (usually smaller) set of DAG nodes, removing dead nodes and updating all the chains’.

I’m using the setTargetDAGCombine(ISD::EXTRACT_VECTOR_ELT) and build the new LOAD_VECTOR_EXTRACT node in the PerformDAGCombine().

As per the following trace this work, t59 and params from t58 are merged into t72.

During this combination, the chain from t59 to t58 has been merged from new t72 to t57, skipping over t58.

So far, so good.

My problem is that this node t58 is not removed from DAG. One reason could be that node t62 is still chained to t58 (due to the fact the extract_vector_elt has no chain).

(I also have to investigate why the t57 and t58 are duplicated… This should be another problem).

I’ve take a look at other DAGCombine implementation for others architecture but I didn’t found an API to explicitly remove a node and trigger overall chain update.

Could someone point out an example of API for such node deletion during DAG Combination?

Regards, Dominique T.

t56: ch = store<Volatile ST8[%l2]> t54:1, t55, FrameIndex:i16<1>, undef:i16

t57: v2f32,ch = load<Volatile LD8[%l1]> t56, FrameIndex:i16<0>, undef:i16

t58: v2f32,ch = load<Volatile LD8[%l1]> t57:1, FrameIndex:i16<0>, undef:i16

t59: f32 = extract_vector_elt t58, Constant:i16<0>

t62: ch = llvm.clp.writeapb.f32 t58:1, TargetConstant:i16<397>, Constant:i32<24575>, t59

Combining: t59: f32 = extract_vector_elt t58, Constant:i16<0>

… into: t72: f32 = CLPISD::LOAD_VECTOR_EXTRACT_o t57:1, FrameIndex:i16<0>, Constant:i16<0>

Combining: t56: ch = store<Volatile ST8[%l2]> t54:1, t55, FrameIndex:i16<1>, undef:i16

… into: t73: ch = CLPISD::STORE_VECTOR_INSERT_oo t53:1, FrameIndex:i16<1>, t53, Constant:i16<1>

t73: ch = CLPISD::STORE_VECTOR_INSERT_oo t53:1, FrameIndex:i16<1>, t53, Constant:i16<1>

t57: v2f32,ch = load<Volatile LD8[%l1]> t73, FrameIndex:i16<0>, undef:i16

t58: v2f32,ch = load<Volatile LD8[%l1]> t57:1, FrameIndex:i16<0>, undef:i16

t72: f32 = CLPISD::LOAD_VECTOR_EXTRACT_o t57:1, FrameIndex:i16<0>, Constant:i16<0>

t62: ch = llvm.clp.writeapb.f32 t58:1, TargetConstant:i16<397>, Constant:i32<24575>, t72

t73: ch = MOVATO_B_oo t53, TargetFrameIndex:i16<1>, t53:1

t57: v2f32,ch = LOAD_AB_o TargetFrameIndex:i16<0>, t73

t58: v2f32,ch = LOAD_AB_o TargetFrameIndex:i16<0>, t57:1

t61: i32,ch = LOAD_A_i TargetConstant:i32<24575>, t58:1

t72: f32 = LOAD_A_o TargetFrameIndex:i16<0>, t57:1

t62: ch = WRITEAPB_A_oo t61, t72, t58:1

Hi Dominique,

The idea is, during DAG combination, to merge load/extract sequence into an architecture specific node.

Interesting. My first thought would be to do that during instruction
selection rather than as a combine.

My problem is that this node t58 is not removed from DAG. One reason could be that node t62 is still chained to t58 (due to the fact the extract_vector_elt has no chain).

Yep, you never need to actually remove nodes from the DAG yourself,
but if you're replacing one you have to make sure it's a real orphan
afterwards -- no uses accessible from the root of the DAG.

So in your case you need to make sure that not only all uses of the
extracted element are replaced, but all users of the chain attached to
the load.

Combining: t59: f32 = extract_vector_elt t58, Constant:i16<0>
... into: t72: f32 = CLPISD::LOAD_VECTOR_EXTRACT_o t57:1, FrameIndex:i16<0>, Constant:i16<0>

Here it looks like you're not creating your LOAD_VECTOR_EXTRACT to
integrate into the chain in the first place. Even ignoring the
replacement issue loads need to take and produce an extra chain
argument to enforce correct ordering.

So there are 3 steps:

1. Add the incoming chain of the load you highlighted as an operand to
this new combined node.
2. Create the combined node with an output chain too (essentially just
an extra MVT::Other output value).
3. Manually call SelectionDAG::ReplaceAllUsesWith to replace all uses
of the load node's chain with this new one because the generic
combining logic only knows about the extract being combined. It can't
automatically tell you've also fiddled about with a load further up
the DAG.

That'll leave the old load with no uses in the DAG and it'll
automatically be cleaned up later.

Cheers.

Tim.

Tim wrote:

From: Tim Northover [mailto:t.p.northover@gmail.com]
Sent: mercredi 20 juin 2018 11:13
To: Dominique Torette
Cc: LLVM Developers Mailing List
Subject: Re: [llvm-dev] Node deletion during DAG Combination ?

So there are 3 steps:

  1. Add the incoming chain of the load you highlighted as an operand to
    this new combined node.
  2. Create the combined node with an output chain too (essentially just
    an extra MVT::Other output value).

Steps 1 and 2 have been implemented. t72 has now Chain output.

t73: ch = CLPISD::STORE_VECTOR_INSERT_oo t53:1, FrameIndex:i16<1>, t53, Constant:i16<1>
t57: v2f32,ch = load<Volatile LD8[%l1]> t73, FrameIndex:i16<0>, undef:i16
t58: v2f32,ch = load<Volatile LD8[%l1]> t57:1, FrameIndex:i16<0>, undef:i16
t72: f32,ch = CLPISD::LOAD_VECTOR_EXTRACT_o t57:1, FrameIndex:i16<0>, Constant:i16<0>
t62: ch = llvm.clp.writeapb.f32 t58:1, TargetConstant:i16<397>, Constant:i32<24575>, t72

  1. Manually call SelectionDAG::ReplaceAllUsesWith to replace all uses
    of the load node’s chain with this new one because the generic
    combining logic only knows about the extract being combined. It can’t
    automatically tell you’ve also fiddled about with a load further up
    the DAG.

SelectionDAG::ReplaceAllUsesWith() was the API I was looking for…
But I have a problem with the 3 implementations of the SelectionDAG::ReplaceAllUsesWith().
They checks somehow the congruency of the returned values types….
t58 returns first a v2f32 while t72 return first a f32 ! This raises assertion. L
I understand such type constraint for general substitution, but in this case the only reference to t58 is the chain from t62.

Hi Dominique,

Steps 1 and 2 have been implemented. t72 has now Chain output.

Excellent. That looks right to me.

SelectionDAG::ReplaceAllUsesWith() was the API I was looking for…
But I have a problem with the 3 implementations of the SelectionDAG::ReplaceAllUsesWith().
They checks somehow the congruency of the returned values types….
t58 returns first a v2f32 while t72 return first a f32 ! This raises assertion.

Oh dear, I'm very sorry; I should have checked the documentation more
thoroughly. It looks like the function is actually
ReplaceAllUsesOfValueWith. For some reason even the versions of
ReplaceAllUsesWith that take SDValues try to handle the whole SDNode.

Cheers.

Tim.