[RFC] Revamp dialect registration

While most mlir-opt-like testing tools will likely want to operate on every possible pass and dialect, a production compiler would try to minimize the binary size and the number of dialects loaded in the context, both to reduce the memory footprint and to cut some runtime costs (number of canonicalization patterns to go through, etc.).
As MLIR grows (we maintain almost 60 dialects out-of-tree inside Google), we are also hitting the limits of maintaining MLIR-based tooling in terms of registration and dependency management.
I took some time to revisit how we’re handling it.

At the moment there is a global registry of Dialects, and factory functions are registered there globally. When constructing an MLIRContext, all the globally registered dialects are automatically loaded into the context.
This is causing some problems, as the client creating the MLIRContext has to ensure that every dialect that will ever be needed is in the global registry before creating it.
The global registry encourages a pattern of using dynamic global constructors to register dialects and force-linking these objects into the final binaries. This complicates the build system configuration and often breaks the modularity of the build: for simplicity these global constructors are always linked in, pulling all the code for the associated dialects into the binary, and worse: we end up loading many more dialects into the Context than needed, increasing the startup time and the memory footprint.
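
For reference, the anti-pattern looks roughly like this (a sketch using the old-style global registration helper; the Toy header path is hypothetical):

  #include "mlir/IR/Dialect.h"
  #include "toy/Dialect.h"  // hypothetical header declaring ToyDialect

  // A dynamic global constructor: at startup this adds a ToyDialect factory
  // to the global registry, so every MLIRContext created afterwards loads
  // Toy, whether or not the binary ever touches Toy IR.
  static mlir::DialectRegistration<mlir::toy::ToyDialect> toyRegistration;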

As an example, tf-opt (the TensorFlow flavor of mlir-opt) always registers and loads these dialects in the Context:

affine, avx512, chlo, coo, corert, data, gpu, linalg, llvm, llvm_avx512, lmhlo, mhlo, nvvm, omp, quant, rocdl, scf, sdbm, shape, spv, std, tf, tf_device, tf_executor, tf_saved_model, tfd, tfjs, tfl, tfrt, tfrt_dht, tfrt_fallback, tpurt, ts, vector

The direction I’m taking right now is to move away from relying on the global registry as much as possible and not load any dialects in the MLIRContext on construction.

Instead, Dialects will be loaded in the context through three mechanisms:

  1. Explicitly:
  mlir::MLIRContext context;
  // Load our Dialect in this MLIR Context.
  context.getOrLoadDialect<mlir::toy::ToyDialect>();

A compiler like the Toy compiler needs to load the ToyDialect before the frontend emits operations from this dialect. However, we don’t want the compiler to explicitly list all the possible dialects involved in the optimizer; hence the second mechanism below.

  2. As a pass dependency: the other way Dialects are needed after the frontend emits the IR is because transformations are applied to the IR. As such it is natural for individual passes to declare the dialects they intend to produce: a Pass consuming Toy and producing a mix of Linalg and Affine would declare that it depends on the Affine and Linalg dialects (no need to declare Toy as it is expected in the input); a sketch follows this list.
    The PassManager, when starting the processing of a PassPipeline, collects the required dialects from the list of passes in the pipeline and loads them in the context.

  3. Lazy loading during parsing: production compilers (Flang, TensorFlow, etc.) don’t need to parse arbitrary MLIR and should rely only on the two mechanisms above. For mlir-opt tools and similar, we still need to make available in the context all the dialects we expect to process; for now we will keep using a global registry pending more refactoring.
    This registry however won’t trigger loading all the dialects in the context ahead of time. Instead the Parser is taught to load dialects from the registry lazily as it encounters Operations/Types/Attributes from an unknown dialect. This setup is intentionally designed to allow testing the pass-dependency mechanism described above: when using mlir-opt to run an individual pass, only the dialects present in the input file are loaded in the context, and if a Pass creates operations in a dialect not present in the input it will fail, which helps ensure that the pass dependencies are declared correctly (see the second sketch after this list).
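
To make the pass-dependency mechanism concrete, here is a sketch of what a pass would declare (the hook signature matches the prototype revision; the pass and dialect choices are placeholders):

  #include "mlir/Dialect/Affine/IR/AffineOps.h"
  #include "mlir/Dialect/Linalg/IR/LinalgOps.h"
  #include "mlir/Pass/Pass.h"

  namespace {
  // A pass consuming Toy and producing Linalg + Affine: it declares the
  // dialects it produces; the input dialect needs no declaration.
  struct ToyToLinalgPass
      : public mlir::PassWrapper<ToyToLinalgPass,
                                 mlir::OperationPass<mlir::FuncOp>> {
    void getDependentDialects(mlir::DialectRegistry &registry) const override {
      // The PassManager collects these across the whole pipeline and loads
      // them in the context before any pass runs.
      registry.insert<mlir::AffineDialect, mlir::linalg::LinalgDialect>();
    }
    void runOnOperation() override { /* lowering patterns go here */ }
  };
  } // namespace

And the mlir-opt-style setup for the third mechanism, where dialects are made available but only loaded when the parser encounters them (entry points as in the prototype revision; they may still shift as the refactoring lands):

  mlir::MLIRContext context;
  // Make these dialects available without loading them: the parser loads
  // each one lazily on the first Operation/Type/Attribute from it.
  context.getDialectRegistry()
      .insert<mlir::StandardOpsDialect, mlir::scf::SCFDialect>();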

This is all implemented in a prototype revision here: https://reviews.llvm.org/D85622


This is a great approach Mehdi, and a nice problem to have :)

-Chris

I just noticed that there needs to be one more hook for loading dialects besides the 3 you mentioned. We will need a Dialect::getDependentDialects hook with a contract something like “load all dialects that might be needed by any dialect interfaces on this dialect, or any interfaces on any Operations, Attributes, or Types from this dialect, or any canonicalization/folding”. When dialect Foo is loaded, any dialects in Foo::getDependentDialects are also loaded.

Examples:

  • interfaces: can create ops from different dialects. E.g. the implementation of InferShapedTypeOpInterface will create multiple ops from the std dialect. I don’t see how a shape inference pass can implement getDependentDialects accurately. (And I don’t think there is anything special about the std dialect here; it might just as easily create shape dialect ops, or ops from any other dialect.)
  • verifiers/folding/canonicalization: can do something like type.getElementType().isa<SomeType>() where SomeType is from another dialect. For example, you might have a type that holds either an IntegerType or a quantized type, and the verifier needs to check different things in each case.

Actually, it looks like this can be handled by calling context->getOrLoadDialect<DependedOnDialect>() in the constructor.

Yeah, this is handled now by explicitly loading the dependent dialects in the constructor. The TableGen class for Dialect has a field specifically for this that is used to generate the calls in the constructor, so I think it’s a known use case. Having child methods called in the constructor is tricky (I think you’d need to use CRTP), which is probably why this hook isn’t already there, but I do think it would make things a bit more ergonomic to have such a hook on both Dialect and DialectInterface.
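
Concretely, the constructor approach looks something like this (a hand-written version of what the TableGen dependentDialects field generates; FooDialect and its shape dependency are placeholders, and the base-class constructor signature is the one from around this revision):

  // A dialect whose verifiers/interfaces rely on types from the shape
  // dialect ensures that dialect is loaded whenever it itself is loaded.
  FooDialect::FooDialect(mlir::MLIRContext *context)
      : mlir::Dialect(getDialectNamespace(), context,
                      mlir::TypeID::get<FooDialect>()) {
    // getOrLoadDialect is idempotent: a no-op if shape is already loaded.
    context->getOrLoadDialect<mlir::shape::ShapeDialect>();
  }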

Oh, sweet. Didn’t see that! Sorry for the noise!

Some updates:

  • This is documented in the FAQ as “Registered, loaded, dependent: what’s up with Dialects management?”, and in a small section of the Pass Management doc.
  • Upstream LLVM/MLIR is clean from any dependency on the global registry.
  • The registry is disabled by default now; if you are downstream and still in the process of migrating, you can re-enable the global registry (see the sketch below), but we’d like to remove all this “soon”, so please don’t keep depending on it.
  • As part of this work, OpPassManager does not depend on MLIRContext anymore, which allows building a pass pipeline ahead of time.
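
For downstream projects mid-migration, the opt-out mentioned above is a one-liner early in main(), before any MLIRContext is created (the flag is a transition-window facility and goes away together with the registry):

  // Re-enable the deprecated global registry during migration only.
  mlir::enableGlobalDialectRegistry(true);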

Next step is to delete entirely the registry and all the global registration facilities for dialects in the coming month: please upgrade your project.

I’ve been running into a lot of fallout from this because we use PassWrapper pretty heavily. In transitioning, it wasn’t clear that the generated override of Pass::getDependentDialects() wasn’t being picked up. Maybe this should be made pure virtual to ensure that Passes provide a non-empty version?

What is the relationship between using PassWrapper and the generated override of Pass::getDependentDialects()? Are you defining your passes in TableGen but not actually using the generated code?
(I am not sure I quite got what you mean)

Most passes don’t need it; I don’t think we should force it on them.

They are separable problems, I think. Probably I made a mistake by trying to transition to using TableGen’d declarations at the same time as I was also trying to fix the DialectRegistry changes.

Doesn’t every pass have dialects that it depends on?

Only passes that introduce dialects not already present in the input need to; for example -linalg-to-scf only declares a dependency on SCF.
(See also the FAQ entry: FAQ - MLIR.)

It’s been over a month, so I’ll move forward with removing this.

Pushed in Remove global dialect registration · llvm/llvm-project@b22e2e4 · GitHub