[RFC] Callback for type and attribute bytecode encodings

Recently there have been efforts to bring the MLIR bytecode format to stability, with features like dialect versioning, so that client dialects can build a stable serialization format on top of it.

However, most upstream dialects do not yet handle versioning, so any change to their encoding would break client serialization. Even in a world where all upstream dialects handle their own versioning, there could be cases where this is not enough to guarantee stability. For example, if a type or attribute definition moves from one upstream dialect to another, this would break backward/forward compatibility with today’s bytecode implementation.

On one hand, this is not necessarily a problem for ops, since it is reasonable to expect that a stable serialization format will not use any operation defined in upstream dialects. On the other hand, this expectation does not hold for types and attributes, since client dialects very commonly borrow their type and attribute definitions from upstream (most notably from the builtin dialect). In this scenario, a client dialect trying to build a stable serialization format on top of MLIR would in principle have to re-implement all of its types and attributes.

While doable, this is definitely not the most convenient way forward: with growing adoption of the MLIR bytecode format, one can imagine multiple clients trying to achieve the same thing, each redefining private types and attributes just to fully control the encoding.

What we would like to propose is the addition of a callback in the bytecode reader/writer that allows a client to override how the encoding of any type/attribute is read and written. This would solve the scenario above, allowing a client dialect to use upstream type/attribute definitions and simply “borrow” the ownership of the type/attr class for the encoding. In addition, the client could use its own versioning scheme to guarantee backward/forward compatibility independently from upstream.
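To make the idea concrete, here is a minimal, self-contained sketch of the override mechanism. It is a toy model and not the API of the patch or of MLIR: `ToyIntegerType`, `ToyWriter`, `ToyReader`, the owner tag byte, and the callback signatures are all hypothetical names and choices made up for illustration. The writer first offers each type to the client hook and only falls back to the default "upstream" encoding if the hook declines; the client payload carries the client's own version number.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Toy stand-in for a type defined upstream (hypothetical, not an MLIR class).
struct ToyIntegerType { unsigned width; };

using Bytes = std::vector<uint8_t>;

// Tag byte recording who owns the encoding of an entry (assumption of this sketch).
enum Owner : uint8_t { kUpstream = 0, kClient = 1 };

// Hypothetical hooks. The write hook returns true if it encoded the type.
using TypeWriteCallback = std::function<bool(const ToyIntegerType &, Bytes &)>;
using TypeReadCallback = std::function<std::optional<ToyIntegerType>(const Bytes &)>;

struct ToyWriter {
  TypeWriteCallback clientHook; // may be empty

  void write(const ToyIntegerType &type, Bytes &out) const {
    assert(type.width <= 255 && "toy encoding stores the width in one byte");
    if (clientHook) {
      Bytes payload;
      if (clientHook(type, payload)) {
        // The client took ownership of the encoding.
        out.push_back(kClient);
        out.insert(out.end(), payload.begin(), payload.end());
        return;
      }
    }
    // Default encoding, owned by the (toy) upstream dialect.
    out.push_back(kUpstream);
    out.push_back(static_cast<uint8_t>(type.width));
  }
};

struct ToyReader {
  TypeReadCallback clientHook; // may be empty

  std::optional<ToyIntegerType> read(const Bytes &in) const {
    if (in.size() < 2)
      return std::nullopt;
    if (in[0] == kClient) {
      if (!clientHook)
        return std::nullopt; // client-owned entry, but no hook installed
      return clientHook(Bytes(in.begin() + 1, in.end()));
    }
    return ToyIntegerType{in[1]};
  }
};

// Client-side encoding with its own version number, independent of upstream.
constexpr uint8_t kClientVersion = 2;

bool clientWrite(const ToyIntegerType &type, Bytes &payload) {
  payload.push_back(kClientVersion);
  payload.push_back(static_cast<uint8_t>(type.width));
  return true;
}

std::optional<ToyIntegerType> clientRead(const Bytes &payload) {
  // Reject payloads written by a newer client version than this reader knows.
  if (payload.size() != 2 || payload[0] > kClientVersion)
    return std::nullopt;
  return ToyIntegerType{payload[1]};
}
```

With `clientWrite`/`clientRead` installed, an upstream-defined type round-trips through the client's versioned encoding; with no hooks installed, the same writer and reader fall back to the default encoding. In this toy, moving the type between upstream dialects would not affect the client-owned bytes, which is the property the proposal is after.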

We prototyped a patch here:
https://reviews.llvm.org/D153383

Alternatives to this approach are possible. For example, one could make the initialization of the bytecode dialect interface optional, and expose a way for a client to provide such an interface for any dialect. This would be less intrusive in the bytecode reader/writer, but it seems cumbersome to support the case where a client dialect uses its own versioning for the encoding of types and attributes borrowed from upstream.

Looking forward to hearing any comments or suggestions.

Best regards,
Matteo