Proposal: Pass Manager Builder for MLIR

Disclaimer: Most of these ideas are still fresh in my mind, I’m not claiming this is the way. If anyone has a better idea, I’m all ears.

The need for an outer layer to the pass manager

Both the LLVM and MLIR pass managers are reasonably similar in intent. They both allow you to add analysis and transformation passes, both provide pass instrumentation with runBefore and runAfter methods, and both invalidate analyses and chain passes together, either via some addPasses function or via command-line arguments to an opt-like tool.

LLVM’s new pass manager has a more elaborate, dynamic pipeline, where the runBefore method can skip a pass, and where a set of future dependencies is intersected after each pass. MLIR, on the other hand, is more static (runBefore can’t skip anything), but it has explicit verification before the runAfter steps.

LLVM’s pipelines are usually very monolithic. Compilers are given simple option bundles (like O3 or Oz) and the complexity is then inside the pipeline itself, which takes into account target preferences and some user choices, etc.

MLIR pipelines are quite different beasts. Compilers can use them in a similar way as LLVM, but the construction of the pipeline becomes more fragmented when radically different decisions are taken at the high level, for example, to target CPUs or GPUs or accelerators, different dialects being present or not, etc. Pipeline builder classes end up being more complex and brittle to changes.

Moreover, due to the nature of MLIR, combining dialects, injecting alternative passes, lowering and side-conversions are not only allowed, but encouraged. In the end, every framework or compiler would need to account for every alternative or experiment in order to make use of their dialects and passes. This obviously doesn’t scale.

Pass Manager Builder

Much like LLVM’s equivalent, this is something that helps one build a pipeline using pass managers, but unlike LLVM’s, the MLIR version doesn’t need all the complexity of dependencies, intersections, etc. At least not as a first approach, so it’s possible we’ll get there one day.

So, to be clear, this is NOT a proposal for a “New Pass Manager for MLIR”, it’s just for adding a builder pattern to it, and some relaxations to the pipeline to match.

Pass Bundles

The first feature I’d like to add, which is similar to LLVM’s builder, is pass bundles: methods that add a group of passes that usually go together. For example, it’s common to run canonicalization and CSE after heavier transform passes, so an addCleanupPasses() method would add them both.

But also, and perhaps more importantly, there are passes that should always run together, either to avoid canonicalizations in between, or because of dialect or shape guarantees (without having to pass information through metadata), or to avoid bufferization in between, etc.

We could even have a class PassBundle that represents a natural sequence of passes that always come together, if necessary. But once added to the pass manager, it doesn’t necessarily have to continue as a block. Moreover, we could common up analysis/cleanup passes between two consecutive bundles, if neither invalidates the other, for example.

Pass bundles would solve one problem: they’d make composition more natural by making which passes to add, and in which order, a concern of the API rather than of the user.
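To make the idea concrete, here’s a minimal sketch in plain C++ (no MLIR dependencies; this PassManagerBuilder, addCleanupPasses and the pass names are all hypothetical) of a builder whose bundle method appends the cleanup passes as a group:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: a builder that records pass names, with a bundle
// method that always appends the cleanup passes together.
struct PassManagerBuilder {
  std::vector<std::string> pipeline;

  void addPass(const std::string &name) { pipeline.push_back(name); }

  // A "bundle": passes that usually follow heavier transforms.
  void addCleanupPasses() {
    addPass("canonicalize");
    addPass("cse");
  }
};

std::vector<std::string> buildExamplePipeline() {
  PassManagerBuilder pmb;
  pmb.addPass("my-heavy-transform");
  pmb.addCleanupPasses(); // the caller no longer picks the order by hand
  return pmb.pipeline;
}
```

The point being that the grouping and ordering live in the API, not in each caller.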

Weak/Strong Order

By construction, if a pass doesn’t find its intended target match, it exits successfully. But if it didn’t find it because it ran before/after another pass (classic example: bufferization), then it’s an error.

If passes “knew” they had to run before/after certain other passes, then we could emit an error at pipeline construction time. This is similar to what LLVM does, but much simpler, so we may get away with a simpler model, too.

Strong ordering would be: a memref pass should run after “at least one” bufferization pass. Passes could register with some properties to be checked, etc.

Weak ordering would be: a transform pass should be cleaned up afterwards. It’s enough that there’s a cleanup stage somewhere after it, not necessarily immediately after, etc.

I’m being intentionally vague on the “rules”, this is all up for discussion and I don’t have strong opinions here. I’d just aim for performance, so avoiding complex checks and data structures.
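Still, to illustrate what a strong-ordering check at construction time could look like, here’s a purely hypothetical sketch in plain C++ (PassSpec, verifyStrongOrder and the property names are all made up):

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of a strong-ordering check at construction time:
// each pass can require that at least one pass with a given property
// appears earlier in the pipeline (e.g. a memref pass after bufferization).
struct PassSpec {
  std::string name;
  std::vector<std::string> properties;    // e.g. {"bufferization"}
  std::vector<std::string> mustRunAfter;  // properties required earlier
};

// Returns true if every strong-order constraint is satisfied.
bool verifyStrongOrder(const std::vector<PassSpec> &pipeline) {
  std::vector<std::string> seen;
  for (const auto &pass : pipeline) {
    for (const auto &required : pass.mustRunAfter) {
      bool found = false;
      for (const auto &prop : seen)
        if (prop == required)
          found = true;
      if (!found)
        return false; // would be an Error at pipeline construction time
    }
    for (const auto &prop : pass.properties)
      seen.push_back(prop);
  }
  return true;
}
```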

Insert Before / After X

To allow third-party tools to interoperate with multiple frameworks (and vice-versa, N:M), it must be possible to insert external passes in the middle of the pipeline. You very rarely need things at the beginning/end, and if you do, it’s easy to just “pipe” the results through.

But often you don’t need just one bundle here or there: you need to add some tensor passes, some memref passes and finally some lowering passes. The only way to do that, today, is to hijack the framework’s pipeline builder and inject calls to addMyPass in the right places (read: a downstream patch), for each pair of tools. This doesn’t scale.

With “properties”, or by name, we could have simple methods like:

// Adds `NewPass` before an existing `OldPass` in the pipeline
// Error if `OldPass` doesn't exist, or there are conflicts in dependencies
template <class NewPass, class OldPass>
Error PassManagerBuilder::addPassBefore();

// Adds `NewPass` after an existing pass with the `PassType` property (ex. bufferization)
// Error if no pass of `PassType` exists, or there are conflicts in dependencies
template <class NewPass, class PassType>
Error PassManagerBuilder::addPassAfterType();

Variable Pipeline Order

But for that to work, the pipeline needs to be flexible. It needs to be more of a linked-list than a vector.

Adding a new pass before another could cause new cleanups to be inserted, for instance if the one after needs canonicalization to happen before it runs. Those checks are done at insertion time, and once the pipeline is built, it still runs linearly, like the existing pass manager’s run method.

Any optimization of the pipeline (removing duplicated analysis/cleanup passes) needs to be done in this variable pipeline before giving it back to the pass manager, which will treat it as just a list.

In a nutshell, the order is only variable during construction time. During execution time it’s exactly the same and can continue to be driven by the PassManager as is.
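As a rough illustration (plain C++, no MLIR dependencies; VariablePipeline and its method names are hypothetical), the construction-time structure could be a linked list that gets frozen into a linear sequence before being handed back:

```cpp
#include <list>
#include <string>
#include <vector>

// Hypothetical sketch: during construction the pipeline is a linked list,
// so inserting before/after an existing pass is cheap; at run time it is
// handed to the PassManager as a plain linear sequence.
struct VariablePipeline {
  std::list<std::string> passes;

  // Insert `newPass` before the first occurrence of `target`.
  // Returns false (an error in the real API) if `target` isn't found.
  bool insertBefore(const std::string &target, const std::string &newPass) {
    for (auto it = passes.begin(); it != passes.end(); ++it) {
      if (*it == target) {
        passes.insert(it, newPass);
        return true;
      }
    }
    return false;
  }

  // "Freeze" into the linear order the PassManager will run.
  std::vector<std::string> finalize() const {
    return {passes.begin(), passes.end()};
  }
};
```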

Integration

To integrate across tools, the builder can be passed along to helper functions, back and forth, until the pipeline is complete, then the original framework calls the main run method, and the PassManager takes over.

For example, a top-level framework would create its own pipeline like:

// Construct infrastructure
PassManager pm ...;
PassManagerBuilder pmb(pm);
...

// Ingress
if (ingressFormat == MHLO)
  pmb.addMHLOIngressPasses();
...

// etc. until LLVM lowering
pmb.addLLVMLoweringPasses();

Then, at run time, the framework could check if some external helpers are available, for example, our tpp-mlir project.

If the project is available (dynamic libraries, runtime options, etc), it’d then delegate to the project’s API to insert its own passes.

// Very silly API here, just to show intent
auto* tpp_mlir = checkIfTPPIsAvailable();
if (tpp_mlir)
  return tpp_mlir->registerDialectsAndPasses(pmb);

And registerDialectsAndPasses would be framework agnostic:

Error registerDialectsAndPasses(PassManagerBuilder& pmb) {
  // Register all necessary dialects
  pmb.registerDialect<MyDialect>();

  // Tensor passes (we only support one-shot bufferization)
  if (failed(pmb.addPassBefore<MyTensorPass, OneShotBufferize>()))
    return failure();

  // Memref passes (we only support one-shot bufferization)
  if (failed(pmb.addPassAfter<MyMemrefPass, OneShotBufferize>()))
    return failure();

  // Before any LLVM lowering passes (which can come in any order)
  if (failed(pmb.addPassBeforeType<MyMemrefPass, LLVMLoweringPassType>()))
    return failure();

  return success();
}

In that example, if the framework we integrate into doesn’t use one-shot bufferization, the first call would return failure, and we’d have to create different integration logic, which would then be some variation per pipeline type, not necessarily one per framework under the sun.

Conclusion

We don’t want to have a “New Pass Manager”, but we do want a more flexible way of integrating passes into existing pipelines, and that can be done with a wrapper-builder-pattern if we add a bit of structure to the passes.

That builder pattern can also help build traditional pipelines, even without third-party integration, and make it easier to express dependencies, cleanups and target choices. All that while still interoperating natively with the existing PassManager, so no changes are needed, just additions.

While the code above is not representative of a final design, it shows the idea that I’m trying to convey. Parts of that idea may very well already be implemented, or implemented slightly differently, and I’m happy to reuse as much as possible.

But the bottom line is that we really need to be able to add passes in the middle of an existing pipeline with a minimal amount of dependency checking and downstream patching.

If you have better ideas, I’m all ears.

@stellaraccident @nicolasvasilache @jpienaar @mehdi_amini @sanjoyd


It’s great to see that some folks are thinking about this, thanks for starting this discussion!
This is also a signal that the ecosystem is getting to more maturity and we’re hitting the kind of scaling problems that show the current limits of the infra 🙂

This seems similar to our “pass pipeline” registration (see the “Pass Infrastructure” docs in MLIR); can you elaborate a bit on how different they are?
(I think “bundles” is a better name than “pipeline” for this, by the way!)

An alternative to a “pass pipeline” is also to have a “wrapper pass” that runs the “bundle” as a dynamic pipeline (also described in the Pass Infrastructure docs).

This may also just be the immaturity of the various projects: the LLVM pass manager builder didn’t always have all the hooks it has now to inject passes at various places.
You could imagine a project like IREE/XLA exposing a similar set of hooks in its interface to inject passes at predefined “stages” of its pipeline.
(this isn’t a rebuttal of your entire proposal of course, it stays relevant and interesting regardless)

My main immediate concern with using pass names or IDs is that it leaks what I see as an internal detail of the pipeline (the exact pass) and creates coupling between the pipeline definition of a project and the external project trying to extend the pipeline / hook into it. Contrast this with the LLVM PMB, which has well-defined extension points tied to the overall structure of the pipeline instead.

There are also questions of “ordering”: what if you do pmb.addPassBefore<MyTensorPass, Canonicalize>() and then add another Canonicalize to the pipeline?


By the way, I think you can implement this PassManagerBuilder as a standalone class already: it should layer easily on top of the pass manager, which makes it an easy candidate for building a proof of concept for this!

Of all ideas, this seemed the easiest path forward. But I needed to make sure this was something that people wanted before spending time on it.

Thanks! Your reply was very much on point with the things I was thinking. I just want to clarify some points below, but those were pretty much the points I was trying to make.

IIUC, the registration allows you to add passes “at the current insertion point” (i.e. end()), not at an arbitrary insertion point, which is the key difference I’m trying to bring.

The reason I want it that way is to simplify building the pipeline across multiple (optional) builders. It’s much easier to hand the builder to a (variadic) number of extensions, at the end, and let them pepper your pipeline, than to call each extension’s method X at level Y before proceeding to build the pipeline.

A fully generic pipeline with a variadic number of extension builders would have to call registerMyPassesAtLevelX() for each extension, for each level, to get the same functionality.

IIUC, dynamic pipelines require the framework to know that it’s possible to have a variadic number of extensions at each level, too, and insert passes based on some checks. It’s basically the same problem as above, but at runtime.

Stratifying the pipeline may be interesting for other purposes, too, and this may fit well with this proposal. But it’s not without peril.

It would mean we’d have to agree, across all users, on what the reasonable “stages” are, which isn’t necessarily a converging strategy. It’d also make for unpredictable results if a framework inserts new passes “before the end” of a stage (say, vectorization), right where my passes (which don’t like vectorization) would pick up.

I’d rather propose that with (other) stronger backing reasons, or we’ll put the weight of the decision (and the brunt of the work) on extension builders.

Precisely! This is why I also suggested something like a PassType. We could only have those, tbh.

Imagine the following (pretend) types:

  • Ingress: Reads front-end dialects and cleans up a bit
  • Normalization: Converts front-end dialects to generic input dialect (ex. linalg on tensor)
  • EarlyOptimization: Do high-level tensor transformations
  • Bufferization: Convert tensors to memrefs (can be more than one pass, can be scattered)
  • LateOptimization: Do low-level memref transformations
  • HardwareOptimization: Converts memrefs to vectors/hw dialects (very low level)
  • Lowering: Lowers remaining dialects to LLVM/SPIRV/etc
  • Cleanup: Runs canonicalization, CSE, etc in between passes / bundles

Instead of those being “stages”, they’d be “pass types”. Frameworks could intermix Ingress passes with Cleanup and Normalization passes as necessary, but there’s a point at which the first and the last pass of each type execute, and that interval may intersect with other pass types’ intervals.

In that way, “types” are intersecting intervals instead of defined blocks. If I need my pass to run after the last X or before the first Y, this avoids forcing frameworks to artificially carve their pipelines into blocks, which still wouldn’t guarantee that new passes don’t break the expectations.

To attach a PassType to every pass, we could have it as a mandatory field in the pass object, filling it in for all upstream passes, defaulting to UnknownType, and allowing downstream users to change it when they want, even adding their own types.
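As a hypothetical sketch of types as intervals (plain C++; TypedPass and the helper names are made up), the interesting queries become “after the last pass of type X” and “before the first pass of type Y”, even when types interleave:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of "types as intervals": pass types may interleave,
// and constraints are phrased against the first/last pass of a type, not
// against fixed stage blocks.
struct TypedPass {
  std::string name;
  std::string type; // e.g. "Bufferization", "Lowering", "UnknownType"
};

// Insertion index just after the last pass of `type`, or -1 if none exists.
int afterLastOfType(const std::vector<TypedPass> &pipeline,
                    const std::string &type) {
  int pos = -1;
  for (int i = 0; i < (int)pipeline.size(); ++i)
    if (pipeline[i].type == type)
      pos = i + 1;
  return pos;
}

// Insertion index of the first pass of `type`, or -1 if none exists.
int beforeFirstOfType(const std::vector<TypedPass> &pipeline,
                      const std::string &type) {
  for (int i = 0; i < (int)pipeline.size(); ++i)
    if (pipeline[i].type == type)
      return i;
  return -1;
}
```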

Right, this is bound to happen with Canonicalize. My gut feeling is that “this does not make sense”, because canonicalize isn’t a “reasonable target”. But of course, that’s subjective.

Objectively, we can define the rule that both insertBefore and insertAfter will stop at the first occurrence, but the latter will search backwards. If you call something that has multiple copies, “you get what you asked for” ™. And if extensions pick targets by name, then “they get what they asked for” ™.
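A tiny sketch of that rule (plain C++; the helper names are made up): insertBefore scans forward and stops at the first copy, while insertAfter scans backwards and therefore lands after the last copy:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of the duplicate-target rule.
// Insertion index before the first occurrence of `target`, or -1.
int insertBeforeIndex(const std::vector<std::string> &pipeline,
                      const std::string &target) {
  for (int i = 0; i < (int)pipeline.size(); ++i)
    if (pipeline[i] == target)
      return i; // insert here: before the first copy
  return -1;
}

// Insertion index after the last occurrence of `target` (found by
// searching backwards), or -1.
int insertAfterIndex(const std::vector<std::string> &pipeline,
                     const std::string &target) {
  for (int i = (int)pipeline.size() - 1; i >= 0; --i)
    if (pipeline[i] == target)
      return i + 1; // insert here: after the last copy
  return -1;
}
```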

But you are right that this is one of the frail points of type “intervals” versus stage “blocks”. Honestly, I’m happy either way, I just don’t want to shoehorn the same sequence of blocks to all MLIR users without knowing that it works for most/all cases.
