Canonicalization cross region

When doing incremental lowering of some types I often need to rely on canonicalization to clean up the conversion. The problem is that those canonicalization get blocked at the region boundary.

Below is a concrete example when trying to fold a vector with a degenerated dimension. I rely on shape_cast canceling each others but the canonicalization would be blocked at the scf.for boundary.

    %21 = vector.shape_cast %20 : vector<4xf32> to vector<1x4xf32>
    %22 = scf.for %arg3 = %c0 to %c4096 step %c4 iter_args(%arg4 = %21) -> vector<1x4xf32> {
      %100 = vector.shape_cast %arg4 : vector<1x4xf32> to vector<4xf32>
      %109 = vector.shape_cast %108 : vector<4xf32> to vector<1x4xf32>
      scf.yield %109 : vector<1x4xf32>, vector<1x4xf32>, vector<1x4xf32>, vector<1x4xf32>
    %24 = vector.shape_cast %22 : vector<1x4xf32> to vector<4xf32>

I can add a simple canonicalization detecting this kind of case and change the scf.for. The questions are:

  • Where should this kind of canonicalization live? I assume it should be a scf canonicalization?
  • There are a large set of cases where instructions can cancel each other but it seems like they would all have to be handled explicitly and I don’t see a way to compose it with existing canonicalization. Any idea to make this more generic?

Have you seen the branch op interface? It is intended to describe and handles such situation so that you don’t hardcode it on scf.for

That makes sense. Looks pretty easy to use that to generalize to different types of regions. What I was also wondering is if there is a way to generalize the code handling instructions that cancel each other out. This can apply to a bunch of different instructions (shape_cast, bitcast, insert/extract, etc…) and the folding code already exists. But I doubt there is a better solution than just handling all those cases explicitly.

A trait was added just recently for folding such involution ; it isn’t obvious that it works with reshape though :slight_smile:
The folder won’t work across the region right now though, I don’t know if it could itself be extended to lookup the branch interface and do what you want here, it seems a bit complicated as it requires to rewire many pieces.

There is no general way to have existing canonicalization patterns transparently “look through” the “phi node” represented by the explicit region bbargs of scf.for.

@River707 FYI, this might be something we can solve systematically with the PDL matchers?

I tried to add something as a ForOp canonicalization. I need to add a new interface to avoid adding dependency to SCF dialect and to be able to handle different ops that can be folded in a generic way.
What I’m trying to do is to check if the Yield operand’s definition can be combined with the use of the corresponding argument.
If this is the case then those operations can be moved out of the loop and then be potentially canonicalized with operations outside of the loop.

Here is the draft:

@mehdi_amini, @River707, @_sean_silva any comment on this direction? If this seems like a possible solution I can clean up this code and send it for review.

This will work, but now we will have a duplicated universe of folding patterns, which is undesirable from a maintenance perspective.

I agree, I was trying to get something to work with the current Folding infrastructure but because the folding patterns directly walk the SSA chain it doesn’t seem possible with the current interface.

Overall the cases that can apply in this case are a subset of the folding cases so it could be factored in a way that the folding call would call into this code but we still need to handle every operation explicitly for this transformation to kick in.

If you think a tablegen matcher infrastructure will allow to solve those problems I would be happy to discuss it to figure out how I can help make this progress.

There might be a way to do this but not sure if everything here is kosher.

  1. Declare a base class for patterns that are to be applied within a region
template <typename DerivedTy>
class RegionOpRewriter : public OpRewitePattern<DerivedTy>  {
  RegionOpRewriter(MLIRContext *context, ArrayRef<Value> regionOperands,
                                 PatternBenefit benefit = 1) :
    OpRewritePattern(context, benefit), regionOperands(regionOperands) {}

  ArrayRef<Value> regionOperands;

The regionOperands is the operands to the parent op (say an scf.for).

  1. Then any rewrite needed for the region has access to the values passed in as operands. So when it sees a BlockArgument of the region it can use the value from the regionOperands to get the definition from outside the region. In general any transformation might violate the IsolatedFromAbove property of the parent operation, but that can be fixed up later.

  2. Apply all the patterns defined to the region of the parent operation.

  3. After pattern application, find all the SSA values used in the region but defined outside and change the operands to the parent op (as well as corresponding arguments) to make the operation IsolatedFromAbove. Also need to make sure any yield operation return values are fixed up as well.

I am not sure how much this would work for all operations with regions, but might work for scf.for . Again, just wondering if something like this would work or if it is too wonky/convoluted.