PSA: MLIR C/Python API Registration Overhaul

Hi there - if you don’t use/integrate MLIR’s C or Python API outside of llvm-project, you can stop reading.

If you do, D128593 ([mlir] Overhaul C/Python registration APIs to properly scope registration/loading activities) will be landing shortly. It makes it possible for downstreams to selectively depend on only the MLIR components that they need. Until now there was an implicit dependency on “everything”, present since the first commit, and this patch undoes that.

The patch has substantial notes on what has changed. The changes your project needs are likely far smaller than the notes suggest (see the changes to the Standalone project, which restore it to the status quo ante, where everything gets registered). It is probably about as easy to go one step further and change your project so that it does not depend on everything. Here is an example where IREE was reworked to only depend on what it needs: DRAFT: Adapt IREE to MLIR Python API registration overhaul (iree-org/iree Pull Request #9638, by stellaraccident).

Thanks for your patience: I tried to make this as unobtrusive as possible while also correcting this longstanding problem. There is a small set of projects that depend on this stuff and we often coordinate, so I will be monitoring this thread and Discord next week if anyone needs help (please @ me; I am stellaraccident on Discord). I tried to patch the Bazel files so that at least mlir/... builds, but there are fragile things in there and no upstream tests of merit, so I can't be sure. Hopefully it is good, but it may need some additional minor changes. The patch notes describe what was done to CMake, for reference.

Patch notes below, for easy access:

Since the very first commits, the Python and C MLIR APIs have had misplaced registration/load functionality for dialects, extensions, etc. This was done pragmatically in order to get bootstrapped and then just grew in. Downstreams largely bypass it and do their own thing by providing various APIs to register the things they need. Meanwhile, the C++ APIs have stabilized around this, and it would make sense to follow suit.

The thing we have observed in canonical usage by downstreams is that each downstream tends to have native entry points that configure its installation to its preferences with one-stop APIs. This patch leans in to this approach with RegisterEverything.h and mlir._mlir_libs._mlirRegisterEverything being the one-stop entry points for the “upstream packages”. The _mlir_libs.__init__.py now allows customization of the environment and Context by adding “initialization modules” to the _mlir_libs package. If present, _mlirRegisterEverything is treated as such a module. Others can be added by downstreams by adding a _site_initialize_{i}.py module, where ‘{i}’ is a number starting with zero. The number will be incremented and corresponding module loaded until one is not found. Initialization modules can:

  • Perform load-time customization of the global environment (e.g. registering passes, hooks, etc.).
  • Define a register_dialects(registry: DialectRegistry) function that can extend the DialectRegistry that will be used to bootstrap the Context.
  • Define a context_init_hook(context: Context) function that will be added to a list of callbacks which will be invoked after dialect registration during Context initialization.
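
As a rough illustration of the discovery rule described above, the loop can be sketched in plain Python. This is not the upstream implementation: collect_initializers and its import_module parameter are hypothetical names chosen for the sketch, and the registry is passed through as an opaque object.

```python
import importlib
import itertools


def collect_initializers(package, registry, import_module=importlib.import_module):
    """Load `_site_initialize_{i}` modules from `package` until one is missing.

    Hypothetical helper: calls each module's optional register_dialects(registry)
    and returns the optional context_init_hook callbacks in load order.
    """
    hooks = []
    for i in itertools.count():
        name = f"{package}._site_initialize_{i}"
        try:
            module = import_module(name)
        except ModuleNotFoundError:
            break  # the first missing index ends discovery
        if hasattr(module, "register_dialects"):
            module.register_dialects(registry)
        if hasattr(module, "context_init_hook"):
            hooks.append(module.context_init_hook)
    return hooks
```

One consequence of this rule is that numbering must be contiguous: if _site_initialize_0 and _site_initialize_2 exist but _site_initialize_1 does not, only the first is picked up.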

Note that MLIRPythonExtension.RegisterEverything is not included by default when building a downstream (previously, the corresponding behavior was the default). Downstreams that need the default MLIR initialization to take place must add it back in to their Python CMake build, just like they add their own components (i.e. to add_mlir_python_common_capi_library and add_mlir_python_modules). It is perfectly valid not to do this, in which case only the things explicitly depended on and initialized by the downstream will be built/packaged. If the downstream has not been set up for this, it is recommended to simply add it back for the time being and pay the build time/package size cost.

CMake changes:

  • MLIRCAPIRegistration -> MLIRCAPIRegisterEverything (renamed to signify what it does and force an evaluation: a number of places were incidentally linking this very expensive target)
  • MLIRPythonSources.Passes removed (without replacement: just drop)
  • MLIRPythonExtension.AllPassesRegistration removed (without replacement: just drop)
  • MLIRPythonExtension.Conversions removed (without replacement: just drop)
  • MLIRPythonExtension.Transforms removed (without replacement: just drop)

Header changes:

  • mlir-c/Registration.h is deleted. Dialect registration functionality is now in IR.h. Registration of upstream features is in mlir-c/RegisterEverything.h. When updating MLIR and a couple of downstreams, I found that usages of the two were commingled, so each include required making a choice rather than a blind search-and-replace.

Python APIs removed:

  • mlir.transforms and mlir.conversions (previously only had an __init__.py which indirectly triggered mlirRegisterTransformsPasses() and mlirRegisterConversionPasses() respectively). Downstream impact: Remove these imports if present (the registrations now happen as part of default initialization).
  • mlir._mlir_libs._all_passes_registration, mlir._mlir_libs._mlirTransforms, mlir._mlir_libs._mlirConversions. Downstream impact: None expected (these were internally used).
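
For a downstream that has to build against MLIR versions from both before and after this change, one possible way to tolerate the removed modules is a guarded import. This is only a sketch: import_if_present is a hypothetical helper, not part of MLIR, and simply deleting the imports remains the recommended fix.

```python
import importlib


def import_if_present(module_name):
    """Return the imported module, or None if it cannot be found (hypothetical helper)."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        return None


# On older MLIR these imports trigger pass registration as a side effect;
# on newer MLIR the modules are gone and the calls quietly return None.
for legacy_module in ("mlir.transforms", "mlir.conversions"):
    import_if_present(legacy_module)
```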

C-APIs changed:

  • mlirRegisterAllDialects(MlirContext) now takes an MlirDialectRegistry instead. It also used to trigger loading of all dialects, which was already marked with a TODO to remove – it no longer does, and for direct use, dialects must be explicitly loaded. Downstream impact: Direct C-API users must ensure that needed dialects are loaded or call mlirContextLoadAllAvailableDialects(MlirContext) to emulate the prior behavior. Also see the ir.c test case (e.g. mlirContextGetOrLoadDialect(ctx, mlirStringRefCreateFromCString("func"));).
  • mlirDialectHandle* APIs were moved from Registration.h (which now is restricted to just global/upstream registration) to IR.h, arguably where it should have been. Downstream impact: include correct header (likely already doing so).

C-APIs added:

  • mlirContextLoadAllAvailableDialects(MlirContext): Corresponds to C++ API with the same purpose.

Python APIs added:

  • mlir.ir.DialectRegistry: Mapping for an MlirDialectRegistry.
  • mlir.ir.Context.append_dialect_registry(MlirDialectRegistry)
  • mlir.ir.Context.load_all_available_dialects()
  • mlir._mlir_libs._mlirAllRegistration: New native extension that exposes a register_dialects(MlirDialectRegistry) entry point and performs all upstream pass/conversion/transforms registration on init. In this first step, we eagerly load this as part of the __init__.py and use it to monkey patch the Context to emulate prior behavior.
  • Type caster and capsule support for MlirDialectRegistry
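
The Python additions listed above can be combined roughly as follows. This is a minimal sketch under assumptions: build_context is a hypothetical helper, the ir module is passed in as a parameter rather than imported, and each entry of register_fns is assumed to be a callable taking a DialectRegistry (for example, a downstream extension's register_dialects).

```python
def build_context(ir, register_fns=(), load_all=True):
    """Create a Context whose dialects come from an explicit DialectRegistry.

    `ir` is expected to be the mlir.ir module (or a compatible stand-in).
    """
    registry = ir.DialectRegistry()
    for register in register_fns:
        register(registry)  # e.g. a downstream's register_dialects(registry)
    context = ir.Context()
    context.append_dialect_registry(registry)
    if load_all:
        # Emulates the prior behavior of eagerly loading everything registered.
        context.load_all_available_dialects()
    return context
```

A call might look like build_context(mlir.ir, [my_extension.register_dialects]), where my_extension is a placeholder for a downstream native module. Note that when the default initialization described earlier is in effect, a plain Context() may already be doing the equivalent work.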

This should make it possible to build downstream Python dialects that only depend on a subset of MLIR. See: [mlir python] Dis-aggregating MLIRPythonExtension.Core · Issue #56037 · llvm/llvm-project · GitHub

Here is an example PR, minimally adapting IREE to these changes: DRAFT: Adapt IREE to MLIR Python API registration overhaul (iree-org/iree Pull Request #9638, by stellaraccident). In this situation, IREE is opting not to link everything, since it is already configuring the Context to its liking. For projects that would just like to not think about it and pull in everything, add MLIRPythonExtension.RegisterEverything to the list of Python sources getting built, and the old behavior will continue (see the standalone example python/CMakeLists.txt in this patch, which does exactly this for now).
