Dialect extensibility is currently awkward and clunky, and the system as it stands does not scale. This RFC starts to address that by introducing general mechanisms for some of the problems we are facing.
Background
Our current mechanisms are unsustainable in several ways, but the main one is that we shove all dialect add-ons (such as interfaces) into a single dialect library. The result is a bloated dialect library with lots of conditionally necessary dependencies that come from trying to package everything together. A recent discussion on where to register external models is a good example of the problems and frustrations arising from this.
Another simple in-tree example is the Inliner. The inliner transformation defines a dialect interface that specifies how to inline a dialect’s operations and when doing so is legal. For most dialects this is innocuous, but the inliner support for the standard dialect may create cf (ControlFlow) operations as part of its implementation. This means the standard dialect needs a dependency on the ControlFlow dialect. That may seem fairly simple, but the dependency is only necessary for the inlining transformation. A user who doesn’t need the inliner (which does happen, mind you) is still required to take on this additional dependency. Apply this to other transformations that are specific to a compilation flow (such as bufferization), and the unnecessary and untenable bloat quickly becomes apparent.
Is there anything we can do today to fix that?
Well, the current way of extending a dialect (or a dialect-owned construct, e.g. Attribute/Op/Type) is by adding a delayed interface to the DialectRegistry. This can handle simple cases, but it has several drawbacks: we can’t add conditional dialect dependencies, it only handles interfaces, each interface type requires a specific API, there is no control over when the interface gets added, etc. Another fundamental problem with our current system for extending a dialect is that we get no indication of misconfiguration. One of the nice things about having everything in a single dialect library is that we know that the interfaces will always be loaded. We don’t have to worry about a user remembering to load the inliner support for the standard dialect, because we packaged it with the dialect itself.
Proposal
This proposal is split into two main parts:
DialectExtension
I propose we scrap the current delayed interface support from the DialectRegistry, and instead add a more general DialectExtension construct. This construct is essentially a callback that is invoked when a dialect (or set of dialects) has been loaded. This greatly simplifies the API surface area on the registry, provides a more convenient grouping mechanism for related add-ons, and also opens the door for more powerful dialect extensions. For example, if we take the standard dialect inliner example from before, we could now express inliner support as an extension:
/// This extension is applied when the `StandardOpsDialect` is loaded.
void mlir::standard::registerInlinerExtension(DialectRegistry &registry) {
  registry.addExtension(+[](MLIRContext *ctx, StandardOpsDialect *dialect) {
    dialect->addInterfaces<StdInlinerInterface>();
    // The inliner extension relies on the ControlFlow dialect.
    ctx->getOrLoadDialect<cf::ControlFlowDialect>();
  });
}
The above is an example using a simple callback, but the underlying mechanism is a new DialectExtension class:
template <typename DerivedT, typename... DialectsT>
class DialectExtension : public DialectExtensionBase {
public:
  /// Applies this extension to the given context and set of required dialects.
  virtual void apply(MLIRContext *context, DialectsT *...dialects) const = 0;
};
/// We can define the Inliner extension above as:
class InlinerExtension : public DialectExtension<InlinerExtension, StandardOpsDialect> {
public:
  void apply(MLIRContext *context, StandardOpsDialect *dialect) const override {
    dialect->addInterfaces<StdInlinerInterface>();
    // The inliner extension relies on the ControlFlow dialect.
    context->getOrLoadDialect<cf::ControlFlowDialect>();
  }
};
void mlir::standard::registerInlinerExtension(DialectRegistry &registry) {
  registry.addExtensions<InlinerExtension>();
}
Extensions could also open up other uses in the future, such as registering canonicalization patterns that involve multiple dialects without those dialects needing to explicitly know about each other.
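To make the "callback invoked once the required dialects are loaded" behavior concrete, here is a minimal, self-contained toy model in plain C++. It is only a sketch and not the MLIR API: `ToyContext`, `addExtension`, `loadDialect`, and `runToyExample` are hypothetical names, dialects are modeled as plain strings, and the registry and context are merged into one class for brevity. It assumes an extension should fire exactly once, as soon as every dialect it names is loaded, and that an extension may itself load further dialects.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a context plus registry. NOT the MLIR classes.
class ToyContext {
public:
  using Callback = std::function<void(ToyContext &)>;

  // Register a callback to run once all `required` dialects are loaded.
  void addExtension(std::vector<std::string> required, Callback fn) {
    extensions.push_back({std::move(required), std::move(fn), false});
    applyReadyExtensions();
  }

  // Load a dialect (idempotent), then apply any extension that became ready.
  void loadDialect(const std::string &name) {
    if (!loaded.insert(name).second)
      return; // already loaded
    applyReadyExtensions();
  }

  bool isLoaded(const std::string &name) const {
    return loaded.count(name) != 0;
  }

  std::vector<std::string> log; // records which extensions ran, in order

private:
  struct Extension {
    std::vector<std::string> required;
    Callback apply;
    bool applied;
  };

  void applyReadyExtensions() {
    // Index-based loop: a callback may load dialects or add extensions,
    // which re-enters this function and may grow the vector.
    for (std::size_t i = 0; i < extensions.size(); ++i) {
      if (extensions[i].applied)
        continue;
      bool ready = true;
      for (const std::string &d : extensions[i].required)
        ready = ready && isLoaded(d);
      if (!ready)
        continue;
      extensions[i].applied = true; // mark first so re-entry skips it
      Callback fn = extensions[i].apply; // copy: vector may reallocate
      fn(*this);
    }
  }

  std::set<std::string> loaded;
  std::vector<Extension> extensions;
};

// Mirrors the inliner example, plus a hypothetical two-dialect extension.
ToyContext runToyExample() {
  ToyContext ctx;
  ctx.addExtension({"std"}, [](ToyContext &c) {
    c.log.push_back("std: inliner interface attached");
    c.loadDialect("cf"); // conditional dependency, loaded only here
  });
  ctx.addExtension({"std", "tensor"}, [](ToyContext &c) {
    c.log.push_back("std+tensor: cross-dialect patterns registered");
  });
  assert(!ctx.isLoaded("cf")); // nothing has been pulled in yet
  ctx.loadDialect("std");
  ctx.loadDialect("tensor");
  return ctx;
}
```

In this sketch the "cf" dependency only appears once the inliner extension is applied, and the two-dialect extension stays dormant until both of its dialects are present. Neither dialect has to know about the other.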
Promised Interfaces
The previous section introduced the concept of a DialectExtension, but what that infra doesn’t solve is one of the final points in the background section:
One of the nice things about having everything in a single dialect library is that we know that the
interfaces will always be loaded. We don't have to worry about a user remembering to load the inliner
support for the standard dialect, because we packaged it with the dialect itself.
If we exposed the Standard dialect inliner interface as an extension, we get the benefit of a more scalable dialect library/reduced dependencies/etc., but we don’t effectively guard the system against misconfiguration. To that end, I also propose that we introduce the concept of a “Promised” Interface. A “promised” interface is essentially an interface that a dialect (or its constructs, e.g. attributes/ops/types/etc.) asserts that it has an implementation for. What this boils down to is that a dialect will claim it supports an interface (which could be a DialectInterface/AttrInterface/OpInterface/TypeInterface/etc.) …
void StandardOpsDialect::initialize() {
  ...
  declarePromisedInterface<DialectInlinerInterface>();
}
… with the expectation that the interface is loaded via an extension. If at any point the interface is properly attached, such as when applying an extension, the promise is resolved. If the interface is never attached and something attempts to use it (e.g. via cast/isa/etc.), we can inform the user of the misconfiguration:
checking for an interface (`DialectInlinerInterface`) that was promised by dialect 'std' but never
implemented. This is generally an indication that the dialect extension implementing the interface was
never registered.
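The promise lifecycle described above (declare, resolve on attach, report on use) can be sketched with a small toy model in plain C++. This is an illustration only and not the MLIR implementation: `ToyDialect`, `addInterface`, `hasInterface`, and `promiseWalkthrough` are hypothetical names, interfaces are identified by strings, and the sketch reports a broken promise by throwing an exception, which is an assumption made for the example rather than a claim about how the actual patch reports it.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Toy stand-in for a dialect. NOT the MLIR Dialect class.
class ToyDialect {
public:
  explicit ToyDialect(std::string name) : name(std::move(name)) {}

  // The dialect asserts that an extension will provide this interface.
  void declarePromisedInterface(const std::string &iface) {
    if (attached.count(iface) == 0)
      promised.insert(iface);
  }

  // Attaching an interface resolves any outstanding promise for it.
  void addInterface(const std::string &iface) {
    attached.insert(iface);
    promised.erase(iface);
  }

  // Query for an interface. A promised-but-missing interface is treated
  // as a misconfiguration rather than as a plain "no".
  bool hasInterface(const std::string &iface) const {
    if (promised.count(iface) != 0)
      throw std::logic_error(
          "checking for an interface (`" + iface +
          "`) that was promised by dialect '" + name +
          "' but never implemented");
    return attached.count(iface) != 0;
  }

private:
  std::string name;
  std::set<std::string> attached;
  std::set<std::string> promised;
};

// Returns true if querying `iface` reports a broken promise.
bool queryReportsBrokenPromise(const ToyDialect &dialect,
                               const std::string &iface) {
  try {
    dialect.hasInterface(iface);
  } catch (const std::logic_error &) {
    return true;
  }
  return false;
}

// Walks through the lifecycle: promise, failed query, attach, success.
void promiseWalkthrough() {
  ToyDialect stdDialect("std");
  stdDialect.declarePromisedInterface("DialectInlinerInterface");
  // The extension was never applied, so the query is reported loudly.
  assert(queryReportsBrokenPromise(stdDialect, "DialectInlinerInterface"));
  // Applying the extension attaches the interface and resolves the promise.
  stdDialect.addInterface("DialectInlinerInterface");
  assert(stdDialect.hasInterface("DialectInlinerInterface"));
}
```

The design point the sketch tries to capture is that an interface which was never promised simply answers "no", while one that was promised and never attached produces a diagnostic pointing at the missing extension.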
Final Thoughts
Our current system isn’t maintainable or scalable. Needing to shove every analysis and transformation dependency into the main dialect library is untenable, and often not possible (e.g. it could introduce circular dependencies). We need to start developing a system with which we can scale analysis and transformation additions to dialects in a clean and composable way. This does mean that more things will need to be registered, but the intention is to build out this infrastructure in such a way that it is harder to get wrong (e.g. users shouldn’t be left wondering why a transformation suddenly doesn’t work when an extension was not registered properly). As with any registration-based infra, this is an ever-evolving process, and I don’t think we will start at the perfect end state.
I’ve uploaded two proof-of-concept patches: D120367, "[mlir] Refactor DialectRegistry delayed interface support into a general DialectExtension mechanism", and D120368, "[mlir] Add support for "promised" interfaces".
– River