[RFC] Is it okay to run DCE during a conversion pass?

Hi,

TL;DR Is it okay to run DCE during a conversion pass, in particular the MemRefToLLVM one?

  • Context

Nicolas (@nicolasvasilache) and I have been working on handling some of the MemRef lowering logic directly within the MemRef dialect using composable rewrite patterns. (The rationale is that some of these operations can be tricky to lower, e.g., subview and expand_shape, and we wanted to centralize that code in one place so that different lowering passes wouldn’t have to reimplement it.)

Thanks to these patterns, we can get rid of a lot of code in the MemRefToLLVM pass. (See a preview of what it can look like at https://github.com/llvm/llvm-project/commit/b8758baa1b68e58d017384ab5028e4cf26202cdf.)

The problem is that these patterns don’t play nicely with the conversion ones. That is, if you run them with the conversion “driver” alongside the existing patterns, things break: they don’t necessarily follow the “replace-the-whole-op” contract that the conversion framework expects, so we end up with an assertion failure saying that the pattern needs to replace the whole op (see the end of this post). For instance, some of our patterns rewrite only the uses of certain results of an op, not the whole operation.

Since they are not technically conversion patterns, this is somewhat expected, and it makes sense to run them in isolation before the conversion patterns. This is where the problem arises!

  • Why do we want to run DCE during a conversion pass?

When we apply rewrite patterns with the regular applyPatternsAndFoldGreedily, we get trivial DCE at the same time. This is something we can’t control (should we be able to?). So it’s not really that we want to run DCE; we have to at this point.

  • Questions

How do people feel about having a DCE pass during MemRefToLLVM?
Generally speaking, are we okay with allowing DCE during conversions?

CC: @Mogball , @mehdi_amini , @River707, @ftynse

The assert in mlir/lib/Transforms/Utils/DialectConversion.cpp:

  assert((replacedRoot() || updatedRootInPlace()) &&
         "expected pattern to replace the root operation");

I feel like I’m missing something somewhere: you said you’re running these patterns in isolation before the conversion patterns, so I assume that these don’t run as part of the conversion framework but with applyPatternsAndFoldGreedily, is this correct?
If so, then where is the need for DCE during dialect conversion coming from?

Right now, during dialect conversion you can mark an op for deletion with rewriter.eraseOp(op), and the conversion framework will delete it at the end if it has no uses left.

The assertion you’re hitting indicates that a conversion pattern is indicating “success” but didn’t rewrite the operation.

I assume that these don’t run as part of the conversion framework but with applyPatternsAndFoldGreedily, is this correct?

That’s correct. But that call is made within the MemRefToLLVMConversionPass itself (see llvm-project/MemRefToLLVM.cpp at b8758baa1b68e58d017384ab5028e4cf26202cdf · qcolombet/llvm-project · GitHub).
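For context, here is a rough sketch (not the actual code from the linked commit) of how the pass is structured: a greedy-rewrite sweep first, then the dialect conversion proper. The populate helper for the simplification patterns is a hypothetical name; the rest uses the usual MemRefToLLVM APIs, and the free function stands in for the pass body.

  // Sketch only: assumes the usual MLIR includes and the mlir namespace.
  LogicalResult lowerMemRefToLLVM(ModuleOp module) {
    MLIRContext *ctx = module.getContext();

    // Sweep 1: the simplification patterns go through the greedy driver.
    // This is the call that also performs the trivial DCE discussed here.
    RewritePatternSet simplifyPatterns(ctx);
    // Hypothetical populate function; see SimplifyExtractStridedMetadata.cpp.
    populateSimplifyExtractStridedMetadataPatterns(simplifyPatterns);
    if (failed(applyPatternsAndFoldGreedily(module, std::move(simplifyPatterns))))
      return failure();

    // Sweep 2: the regular MemRef -> LLVM dialect conversion.
    LLVMTypeConverter typeConverter(ctx);
    RewritePatternSet convPatterns(ctx);
    populateMemRefToLLVMConversionPatterns(typeConverter, convPatterns);
    LLVMConversionTarget target(*ctx);
    return applyPartialConversion(module, target, std::move(convPatterns));
  }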

If so, then where is the need for DCE during dialect conversion coming from?

Technically, I don’t need (or even want, to be fair) to do DCE during the conversion, but because I call applyPatternsAndFoldGreedily from the conversion pass, it happens :).

Right now, during dialect conversion you can mark an op for deletion with rewriter.eraseOp(op), and the conversion framework will delete it at the end if it has no uses left.

Thanks for the info. That doesn’t help in this case, though.
Essentially, one of the rewrite patterns we want to use during the conversion does something like:

a, b, c = op

=>

a, <dead>, <dead> = op
b = new_op1
c = new_op2

So op is not dead, only some of its results are.
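To make the shape of that pattern concrete, here is a minimal sketch, loosely modeled on the linked SimplifyExtractStridedMetadata.cpp (op and accessor names assumed; only the trivially static case is handled): it rewrites the uses of the offset result but leaves the op in place, since its other results may still be needed. The greedy driver is fine with this; the conversion driver asserts because the matched root is neither replaced nor reported as updated in place.

  // Sketch only: assumes the usual MLIR includes (PatternMatch.h, MemRef and
  // Arith dialects) and the mlir namespace.
  struct FoldStaticOffsetUses
      : public OpRewritePattern<memref::ExtractStridedMetadataOp> {
    using OpRewritePattern<memref::ExtractStridedMetadataOp>::OpRewritePattern;

    LogicalResult matchAndRewrite(memref::ExtractStridedMetadataOp op,
                                  PatternRewriter &rewriter) const override {
      Value offset = op.getOffset();
      // Sketch-only heuristic: an identity-layout source memref has offset 0.
      auto sourceType = op.getSource().getType().dyn_cast<MemRefType>();
      if (!sourceType || !sourceType.getLayout().isIdentity() ||
          offset.use_empty())
        return failure();

      Value zero = rewriter.create<arith::ConstantIndexOp>(op.getLoc(), 0);
      // Rewrite only the users of the offset result; `op` itself stays alive.
      // Collect the users first since we are about to modify the use list.
      auto users = llvm::to_vector(offset.getUsers());
      for (Operation *user : users)
        rewriter.updateRootInPlace(
            user, [&] { user->replaceUsesOfWith(offset, zero); });
      return success();
    }
  };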

The assertion you’re hitting indicates that a conversion pattern is indicating “success” but didn’t rewrite the operation.

This rewrite pattern applied successfully; it just didn’t happen to replace the op completely. Unless I missed something, that is completely fine for non-conversion patterns, which should also return success in these cases.

To summarize, I’d like to run rewrite/canonicalization patterns before applying the lowering patterns, but when I do that I end up running DCE as a side effect.

So what I am asking is:

  • Is it okay to run DCE (as a side effect of applyPatternsAndFoldGreedily) in the MemRefToLLVMConversionPass?
  • Should we add a way to not run the DCE as part of applyPatternsAndFoldGreedily?
  • Should we be able to run any rewrite patterns as part of the conversion patterns (i.e., are partial updates okay)?
  • Is there another way to do what I’m trying to do?

Ideally, I wanted to simply add the simplification patterns I have to the conversion patterns that are already in that pass, but that doesn’t work.

Maybe to summarize the discussion from another angle, we had the following steps:

  1. rewrite patterns that lift memref metadata manipulation out of the conversion to LLVM, for better reuse
  2. one of those patterns is of type X → AnyOp and forwards results to all users; it is anchored on X. Anchoring on AnyOp would be impractical/wasteful because every Operation* and every operand would have to be matched against isa<X>.
  3. as a consequence, 2. cannot be used as a pattern in the conversion infra, which IIUC asserts on the replacement of the op it is anchored on.
  4. so we’re using a preprocessing step that calls applyPatternsAndFoldGreedily with only the rewrite patterns; this results in quite a bit more DCE (and tests to update) than we have currently.

I think this is fine, and actually an improvement since we significantly clean up the produced IR in the process, but we were wondering whether anyone sees a red flag in this reasoning.

There is always the option of anchoring 2 on AnyOp, but it’s a tradeoff.

Thoughts?

It might be surprising for the user if the DCE’d ops are not related to either of the dialects participating in the conversion.

This sounds reasonable to me. I would also like DCE/CSE/cleanups to run on the ops produced by this particular application, but not the rest of the IR. However, when similar requests came up in the past, there was pushback against making applyPatternsAndFoldGreedily “too configurable”. Historically, the greedy driver is just the implementation of the canonicalization pass, but some clients started reusing it for other purposes. Maybe it is possible to factor out some of its implementation and have a different driver.

The dialect conversion driver, unlike the greedy one, applies patterns based on the notion of legality. If an op is considered legal, patterns rooted at it will not apply. If the op is considered illegal, they will. If the pattern hasn’t changed the root op, its legality won’t change. So applicable patterns will remain applicable “indefinitely”. In practice, the driver will check that the op remained illegal after pattern application, attempt to legalize it, and fail the entire process if the same pattern needs to be applied again. We can configure the pattern to support “recursive” application and bail out when it has nothing more to do, but it won’t change the legality of the root op.
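For readers less familiar with the conversion driver, here is a tiny sketch of what legality-driven application looks like (the dialects are chosen purely for illustration):

  // Sketch only: patterns rooted at legal ops will not be applied; illegal
  // ops must be rewritten away for the conversion to succeed.
  void configureTarget(ConversionTarget &target) {
    target.addLegalDialect<LLVM::LLVMDialect>();
    target.addIllegalDialect<memref::MemRefDialect>();
  }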

You can probably hack around this by attaching an attribute to the relevant ops beforehand, having the pattern remove the attribute, and declaring the op dynamically legal in the absence of the attribute.
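Here is a rough sketch of that workaround; the attribute name and helper functions below are made up for illustration, and a real setup would also need to decide where the marker gets attached:

  // Hypothetical marker attribute; any dialect-prefixed name would do.
  static constexpr StringLiteral kPendingSimplification =
      "memref.pending_simplification";

  // Before the conversion: tag every op the simplification pattern must handle.
  void tagOpsToSimplify(ModuleOp module) {
    module.walk([](memref::ExtractStridedMetadataOp op) {
      op->setAttr(kPendingSimplification, UnitAttr::get(op.getContext()));
    });
  }

  // In the conversion target: the op becomes legal once the pattern has
  // removed the marker attribute.
  void configureDynamicLegality(ConversionTarget &target) {
    target.addDynamicallyLegalOp<memref::ExtractStridedMetadataOp>(
        [](Operation *op) { return !op->hasAttr(kPendingSimplification); });
  }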

As a side note, be aware that many clients lower to the LLVM dialect by collecting all patterns in one bag and applying them in a single sweep of dialect conversion.

OK but in this case how is DCE helping since you still have op there?
You can also mark the op “modified in place” maybe?

OK but in this case how is DCE helping since you still have op there?

DCE is not relevant in this case, or in general for what it’s worth.
It’s just that DCE gets run as part of applyPatternsAndFoldGreedily. I.e., today, it is inevitable when you use this function.
What I’m asking is: is it okay to have this DCE applied as a side effect? :slight_smile:

You can also mark the op “modified in place” maybe?

IIUC, it was not possible in this case: the root update in the snippet below is applied to the user of the result, not to the op the pattern is anchored on, so the anchored op itself is never reported as replaced or updated.
Here is the relevant snippet: llvm-project/SimplifyExtractStridedMetadata.cpp at b8758baa1b68e58d017384ab5028e4cf26202cdf · qcolombet/llvm-project · GitHub

      rewriter.startRootUpdate(op);
      // updateRootInplace: lambda cannot capture structured bindings in C++17
      // yet.
      op->replaceUsesOfWith(result, constantVal);
      rewriter.finalizeRootUpdate(op);

Actually, thinking on this, I agree with @nicolasvasilache that running DCE is actually an improvement.

The relevant patterns live in a dedicated pass (see llvm-project/SimplifyExtractStridedMetadata.cpp at memref_to_llvm · qcolombet/llvm-project · GitHub), so IIUC, they wouldn’t be grabbed by default.

Anyhow, reading between the lines of what you and @mehdi_amini said, we would likely prefer a single-sweep approach. So now the question is: can we fix the problematic pattern to fit in the conversion framework?

(There’s potentially another issue if we don’t do DCE, but I’ll explain it only if we manage/decide to make all our rewrite patterns compatible with the conversion driver.)

Thanks for bringing this up, Quentin!

On the more philosophical side, a significant difference I notice between MLIR and LLVM passes is that in MLIR we allow more flexibility in what a pass/conversion can do and we are stricter about the expected input IR. I would say that this is kind of expected given the different levels of maturity of the two projects. To some extent, many MLIR passes tend to expect almost pristine, optimized IR, so we end up running canonicalization/CSE and other cleanup transformations very, very often (at least that’s what I’ve seen in some pipelines), with the consequent impact on compile time. On the contrary, I have the impression that LLVM passes are more resilient to arbitrary IR. Of course, some of them do define constraints on the input IR (loops must be rotated, LCSSA, etc.) but not to the level I see in MLIR. Hopefully, we will move MLIR closer to LLVM in that regard. In this sense, I’m supportive of whatever approach brings more optionality to applying cleanup passes.

This is a long-running debate: whether to clean up IR upon creation, or to emit whatever is most natural and keep cleanups separate. So far, we have mostly erred on the side of separate cleanups, the rationale being the desire to avoid duplicating the canonicalization logic in the internals of every pass.

I would agree that having a pass produce “more canonical” code is desirable. However, I would find it surprising if a pass started doing something with the IR it is not expected to touch. Loop rotation shouldn’t do SRoA in straight-line code, and memref-to-llvm shouldn’t touch things that are neither memref nor llvm IMO. It would be nice to be able to run some sort of “local DCE / canonicalization” on the IR produced or otherwise affected by the pass. This may require a new driver…

The patterns living in a dedicated pass is exactly the problem for the multi-sweep setup. If a client does not or cannot grab them, and the existing lowering is removed, that client is no longer able to lower…

To fit these patterns in the conversion framework, we can rewrite them so that they are rooted at the op that is actually rewritten by the pattern. There is a mechanism for a pattern to consider any op as potential root that Nicolas mentioned above.
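As an illustration of that direction, here is a sketch (not the actual upstream code; accessor names assumed, and only the trivially static case handled) of the same folding as in the earlier sketch, but re-rooted at the op that is actually rewritten, using the match-any-op mechanism. Because the matched root is the user being updated in place, the conversion driver’s expectations are satisfied; the price is that every operation and operand gets inspected.

  // Sketch only: assumes the usual MLIR includes and the mlir namespace.
  struct ForwardStaticOffsetToUser : public RewritePattern {
    ForwardStaticOffsetToUser(MLIRContext *ctx)
        : RewritePattern(MatchAnyOpTypeTag(), /*benefit=*/1, ctx) {}

    LogicalResult matchAndRewrite(Operation *op,
                                  PatternRewriter &rewriter) const override {
      // Look for an operand that is the `offset` result of an
      // extract_strided_metadata whose source trivially has offset 0.
      for (OpOperand &operand : op->getOpOperands()) {
        auto metadata =
            operand.get().getDefiningOp<memref::ExtractStridedMetadataOp>();
        if (!metadata || operand.get() != metadata.getOffset())
          continue;
        auto sourceType =
            metadata.getSource().getType().dyn_cast<MemRefType>();
        if (!sourceType || !sourceType.getLayout().isIdentity())
          continue;
        Value zero =
            rewriter.create<arith::ConstantIndexOp>(op->getLoc(), 0);
        // The matched root is the op being modified, so an in-place update
        // is all the conversion driver needs to see.
        rewriter.updateRootInPlace(op, [&] { operand.set(zero); });
        return success();
      }
      return failure();
    }
  };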

Alternatively, it may be possible to set up a complex dynamic legality for the op the pattern is currently rooted at, marking the op as illegal if it has users that still need to be rewritten and legal otherwise. Beyond duplicating the matching logic of the pattern in the legality checker, it will also have to grapple with the fact that dialect conversion doesn’t always erase the original ops immediately, so they remain in the use list. I expect using updateRootInPlace with setOperand inside would remove the actual use, but that starts getting oddly specific in the pattern and therefore fragile.

There’s also the fact that the MLIR operation set is not closed, so it is really difficult to create passes that are robust to input IR variations, because the IR may contain absolutely arbitrary operations. This may also result in passes giving up a bit too quickly when trying to reason about the IR.

OK, so we should look into making sure DCE doesn’t run as part of what we’re trying to do.

Side question: when you say “cannot grab them”, is it possible for something to fail to grab some patterns? (In the sense: can a call to populateXXXXPatterns fail?) (Just curious.)

Just to double check, what you’re saying is that, regardless of where the patterns live, we should be able to get them by just calling into populateMemRefToLLVMConversionPatterns (in this specific case).

If so, that makes a lot of sense and the multi-sweep is indeed problematic.

Okay, let me try to look into that.

Thanks!

There’s no failure path in these, so technically one cannot fail to just grab the patterns. However, handing regular RewritePatterns that were not carefully written for dialect conversion to the conversion driver will crash it with very high probability. Knowing that, if I were setting up a dialect conversion call, I would consider myself unable to grab those patterns because they would break my flow.

What I am saying is that some (most?) downstreams do exactly that, for example iree/ConvertToLLVM.cpp at 35f244ccb18435410cf5b5b998d08383301d84b4 · iree-org/iree · GitHub. It doesn’t mean they are doing it the right way, or that we cannot change the mechanism upstream, but it would be nicer to think about a transition plan for them.