[RFC] IR Versioning

Right now there isn’t any built-in way for a dialect to manage evolution of the IR (adding/renaming attributes, etc.).
Motivated by the LLVM scheme, which supports loading bitcode produced by an older version of the compiler by upgrading the IR as it loads it, or after loading completes (but before running the verifier), I propose the same scheme for MLIR.

A dialect opts into having a version by implementing the getDialectVersion method of the OpAsmDialectInterface, which returns an Attribute representing the version. There is no restriction on how a dialect represents its version scheme; the framework will not introspect the attribute.

A new optional top-level directive is introduced: dialect_versions. This directive is expected as the very first keyword in the input IR and is followed by a dictionary attribute. The keys are dialect names, and the values are the dialect versions corresponding to the IR at the time it is produced.
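For illustration, a file involving two versioned dialects could then start with something like this (the dialect names and version attributes below are purely hypothetical):

dialect_versions { mydialect = 3 : i32, other = "1.2.0" }
module {
  "mydialect.op"() {other.note = "hello"} : () -> ()
}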

The version is available through the AsmParser during parsing, which makes it possible to write custom parsers that are backward compatible. I expect though that the most common case will be to use the generic IR format together with a new hook, upgradeFromVersion(), defined on OpAsmDialectInterface. This hook is invoked right after parsing on every loaded dialect that has a version defined, and allows a dialect to inspect the producer version and traverse the IR to upgrade it to the current version.
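As a rough sketch of what such a hook could look like (the signature below is an assumption based on this description rather than the final API, and test.old_name/test.new_name are made-up attribute names), an implementation could walk the freshly parsed IR and rewrite deprecated constructs:

  LogicalResult upgradeFromVersion(Operation *topLevelOp,
                                   Attribute producerVersion) const final {
    // The test dialect models its version as an integer attribute.
    auto version = producerVersion.dyn_cast_or_null<IntegerAttr>();
    if (!version || version.getInt() >= 42)
      return success(); // Produced by a current (or unknown) version: nothing to do.
    // Example upgrade: rename a deprecated attribute on every operation.
    topLevelOp->walk([](Operation *op) {
      if (Attribute attr = op->getAttr("test.old_name")) {
        op->setAttr("test.new_name", attr);
        op->removeAttr("test.old_name");
      }
    });
    return success();
  }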

D117761 (Implement IR versioning through post-parsing upgrade through OpAsmDialectInterface) implements the proposal and provides an example on the Test dialect.


So every custom attribute parser could query the version? (I think that is probably true even in the generic IR case, as there isn’t a generic format for these.)

Can you expand on this? Today most folks serialize using the pretty format; to be able to parse at all, one needs the AsmParser to understand the different versions. Are you proposing that the generic IR format be such that the parsing is not version aware and only the upgrading needs to be? So basically, does this versioning work if the generic IR textual format changes, and is the expectation that folks mostly use the generic format for any versioned serializing needs?

Perhaps I’m missing an implicit statement: here is a way we can support versioning as long as folks make backwards compatible changes to their textual IR representations, and the generic IR will be changed in backwards compatible ways only. (Which is a less strong claim than stable but a very interesting one still :slight_smile: )

This is a big difference with this one, isn’t it?

Is it possible to also have a printer hook that lets one populate the requested producer version here? The strictly-upgradeable development mode is certainly the easiest to maintain and the one I expect most dialects that need some stability to opt in to. However, I could also see dialects deciding that they serve a role (and are willing to pay the cost) of being able to target a window of older versions. I would like such dialects to have the choice to implement more flexibility if desired. (Completely +1 on doing this incrementally and solving the parser issue first, but I would like to choose an approach that can generalize.)

I know that we sometimes use generic form for serialization between tools because, in practice, it is less variable today vs the custom form. However, there is a viewpoint that the custom form, for dialects that care about compatibility, should be the more stable interface point: the generic form leaks internal details that are not strictly part of what an op is, and a properly designed custom form can make an op be precisely what it is supposed to be.

I don’t think any of the public dialects are using custom forms in a way that makes them fit for this purpose, but the ones who need compatibility probably should be.

Yes.

I don’t anticipate the pretty format to be backward compatible in most cases, but it is possible to implement a backward compatible custom parser; look for ParseResult parseVersionedOp in the revision:

static ParseResult parseVersionedOp(OpAsmParser &parser,
                                    OperationState &state) {
  // Query the version of the "test" dialect recorded in the input IR, if any.
  if (auto version =
          parser.getDialectVersion("test").dyn_cast_or_null<IntegerAttr>()) {
    // Older producers used a different keyword for this op.
    if (version.getInt() < 42)
      return parser.parseKeyword("deprecated_syntax");
  }
  return parser.parseKeyword("current_version");
}

And the test shows that both of these IRs are parsed. The current one:

dialect_versions { test = 42 }
test.versionedB  current_version

And the old one:

dialect_versions { test = 41 }
test.versionedB  deprecated_syntax

But what’s more interesting is the post-parsing hook: this will enable use cases involving the generic printer to have an upgrade path when the dialect definition changes. This is more interesting than the syntax, in my opinion.

No: that is a limitation; even if you version your dialect, you are limited by the underlying serialization. The generic format hasn’t changed in a while, but that isn’t a guarantee that it won’t. Also, even in the generic format, we’re dependent on the attribute/type parsers.
Also, someone interested in versioning may have to be careful about which other dialects they serialize, as those may not be versioned with the same guarantees.

How so? I’m not sure I perceive the fundamental difference between ASCII and binary.
The bitcode format in LLVM has advantages in terms of size and speed sometimes, but otherwise the main advantage is the backward compatibility. But LLVM does not have a generic printer…

One thing with a “bitcode” is that because it isn’t human-readable, it is much easier to upgrade it incrementally: you can add a new “code” for a new version of an entry, etc.
With a “pretty” printer, readability is important and you can’t really change the generic printer while keeping it backward compatible all the time.

I’m not sure I understand the question? The way it works right now is that dialects opt in with a hook on OpAsmDialectInterface (I reformulated this in the original post in case it wasn’t clear).
So the TestDialect for example implements:

  Attribute getProducerVersion() const final {
    return Builder(getDialect()->getContext()).getI32IntegerAttr(42);
  }

And any IR that has entities from the test dialect will be emitted with this at the beginning: dialect_versions { test = 42 : i32 }.
Note that the test dialect uses an IntegerAttr, but a dialect can use any attribute (including a custom one) to model its version scheme.
Dialects that don’t implement this interface method won’t trigger an entry in the dialect_versions dictionary.

Makes sense. I suspect that with this proposal you’ll already be able to implement what you want, even though we could add some core APIs to make it easier.
If you want to implement it yourself (after this proposal lands in some form), you can:

  • Have a field targetVersionDeployment in your dialect.
  • Before emission you can: getContext()->getLoadedDialect<MyDialect>()->setTargetVersion(deploymentVersion).
  • In the printer for this dialect, you have access to targetVersionDeployment (see the sketch below).
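A minimal sketch of that last step, assuming a hypothetical MyDialect that exposes the targetVersionDeployment field through a getTargetVersion() accessor (none of these names are an existing API):

  void MyVersionedOp::print(OpAsmPrinter &p) {
    auto *dialect = getContext()->getLoadedDialect<MyDialect>();
    // Emit the older keyword when targeting a deployment that predates version 42.
    if (dialect->getTargetVersion() < 42)
      p << " deprecated_syntax";
    else
      p << " current_version";
  }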

That said, I suspect that most transformations could be done on the IR itself: before emission you may have to “downgrade” your IR internally (renaming ops/attributes). That may be a problem in terms of the verifier though: if you want to support multiple IR versions in memory and have the verifier be aware of this, we’re getting into deeper concerns than what I looked at here, which is just the serialization boundary.

This is a good point, someone really motivated could design their assembly with this in mind.

At some point, to get these constraints, I suspect you have to get into a “closed ecosystem” for your dialects (not reuse things from upstream like Affine, Memref, etc.), and you may be better off with a custom serialization entirely (proto-based, flatbuffer-based, YAML, XML, whatever…).

ah, thanks - I didn’t read closely enough. my bad :slight_smile:

I’m more interested in a final state of being able to serialize and distribute. This could be a part of that, but at the moment this feels like a feature that is on an island to me.

Yes, but doesn’t that backwards compatibility include the format? So the textual format is more readable but not backwards compatible, while the binary one is a format one of whose goals is to provide upgrade behavior and be backwards compatible parsing-wise. It isn’t about text or binary; it’s that the underlying format allows one to layer updates on top of it (hence the question of whether this is changing the promises on generic). It feels like having that enables the second part with actual guarantees (well, the second part, which you are describing here, could still be limited based on the dialects involved, since this is opt-in to enable upgrade).

That is a very valid point, and I think one that is difficult to handle generically. Basically they are creating specific guarantees in the generated output, and indeed there are some injected attributes which are internal implementation details.

Combining the above, it seems that this is mostly useful if you’ve already built a stable serialization format for the (interop) dialects used, and then you can layer it on top. So the generic format would be less useful here (it may change less frequently, but it need not be parsable).

Again: only because LLVM does not have a generic format for operations, and it is a pain to keep a custom textual format backward compatible. There is little value in increasing the complexity of the parser this way.
The MLIR textual format is different in that, even when your dialect changes, you can get the in-memory representation by parsing the generic form (even though it’ll be invalid).

I’m not proposing to change the promises on generic, because there can be deeper changes in MLIR that could require changing it. This is more a question of maturity in the project, really.

But let’s look at it practically: when was the last breaking change to the generic format? 2 years ago?
While not promising that we won’t break it in some way in the future, it has been extremely stable in practice I believe, and that is probably “good enough” to enable many use-cases.

This is underspecified: do you want to serialize and distribute files that you will be able to reload 10 years from now?
This goes beyond the syntax; I don’t see this as a “text vs binary” thing, instead it gets into what kind of breaking changes we could make that would go deep enough in the system to affect our generic format.
Realistically, even changing the way we attach attributes to operations would be implemented with an “upgrade” path in the generic format parser.

Due to the nature of MLIR, the most difficult part is to emit IR that does not depend on any dialect other than yours; we can do it in the TensorFlow Graph Dialect for example, but otherwise most use cases I’ve seen end up depending on the builtin.func op.

I agree with this assessment. In addition, even though I can’t imagine them, major breaking changes to the format would presumably be benefit based and not decrease the representational capacity. I can’t really imagine a situation where, at need, we couldn’t have a parsed format translator. The world is full of these kinds of cases, and I think we can solve for it when the time comes (if ever).

Most of those cases I am aware of would rather not depend on builtin.func (I get this feedback a lot), but it is a very recent development that this has been generalized enough to be usable in practice (with a reasonable level of effort and not too many sharp edges). I think this falls into the project maturity bucket.


I think in general I’m in favor of starting to build something out for this, and I have in the past basically considered the same approach as what you have proposed. I do have various open questions though:

Having any attribute represent the versioning scheme is nice, but I am a bit apprehensive about the guarantees here: more specifically, what happens if a dialect attribute is used as a version modifier but is also versioned itself? Not that this will happen, but we should at least understand what would happen if it does (and include some guidelines in the docs).

What scope of modifications are allowed during upgrade? Can you mutate (including deletion) any reachable IR? i.e. can you go into parent operations?

How do we handle inter-dialect interactions? I think the most obvious case of this is dialect prefixed attributes. If two dialects are updating the same IR, we should have some guarantees on who goes first to prevent weird situations where some IR is upgraded and some isn’t, but you don’t know what situation you are in.

I think this brings up a distinctive difference on what this versioning capability is intended to support (as far as I can tell so far): this does not provide versioning capabilities for any serialization format, but for the in-memory form of the IR itself. In that sense, how you get the IR to that in-memory form is mostly unrelated (whether you come from a textual or binary form doesn’t matter, just that you end up at the same in-memory destination). That being said, I would expect versioning for the textual (or some binary) format to be closely tied to the same versioning scheme as the in-memory form, but that can just be sugar built into whatever is doing the parsing.

+1 here. I’m not ready at this point to start committing anything to the stability of the generic format. It hasn’t really changed much in recent times, but we also haven’t really looked at changing it (for better or worse).

– River

I consider this up to the dialect policy itself, that is: the framework does not have an opinion on it.
(happy to add a sentence for this in the doc)

The interface takes the top-level operation, so there is not really a question of “can you go into parent operations?” I think.
Otherwise I wouldn’t put any restriction from the framework point of view.

One more interesting challenge would be the case of multiple dialects: one dialect’s upgrade path running before the other can’t make assumptions about the state of the other dialect’s entities. That de facto limits what you can do here.

That said, take an out-of-tree project: let’s say they pick up a new MLIR every 3 months. They could bump their dialect version at that point (even if nothing changes in their dialect) and use the hook to upgrade the few upstream MLIR entities that have changed in the last 3 months!
(I think that also addresses your “How do we handle inter-dialect interactions?” question.)

There are two hooks: what you’re saying (if I read you correctly) applies to the “post-parsing” one, but the AsmParser exposing the version allows folks to use it in their custom assembly format, effectively providing the ability to version the serialization format if someone really wanted to.

This is a really interesting thread from the TOSA perspective. I’ve only taken a quick pass over all the discussion but we do care about versioning for TOSA. In fact, the original RFC contained code trying to do a very primitive form of it, but we took it out as it wasn’t tenable.

TOSA is a dialect that is spec-backed and the spec is versioned. It has a serializer and deserializer, which constitute important parts of a conformance test infrastructure that consists of both MLIR-based components and pieces that aren’t, e.g. the reference model which consumes TOSA serialized (flatbuffers) form.

So far we’ve been working towards the first major version (v1.0). Up to this point, new minor versions supersede prior designs without pressure on versioning. However, this will change once v1.0 is set and we continue to build towards newer versions while needing to support IP and its software built upon v1.0.

Without considering anything about whether the RFC covers it, some things that a versioning system would encounter from our experience:

  • Parsing changes to ops and the op set, as well as a potentially different set of data types (which may or may not be a superset). E.g. we have older networks with quantized uint8 content that gets rescaled into the quantized int8 domain.

  • Accommodating passes that implement backward compatibility of the IR by converting from an older to a newer IR version. It’s also not impossible that the reverse may be desired, to the extent possible: backporting newer-version IR constructs to an older one on a case-by-case basis.

In a production setting it isn’t an infrequent situation to be asked to run a new network on an existing installed base of IP supported by an older software stack, if at all possible. In this case it may not be possible to parse the network because the older flow doesn’t accommodate the frontend epoch the model is crafted with; backporting the TOSA IR might accomplish it, however.

  • Managing versioning of legalizations. While the version number offers a simple predicate to wrap blocks of code within, there are probably additional optimizations and use cases, e.g. not all legalizations may be impacted, and it should be possible to carry over such legalizations to/from either old or new versions of the IR.

Now I’ll go back to reading the conversation into the deep dive mechanicals here once more.

I suspect this kind of use case goes much beyond what I am implementing here: this proposal only allows a slight decoupling of the serialization from the in-memory dialect representation.
It is very likely that for something more advanced you may want instead to manage versioning of the dialect inside the dialect itself (possibly through interfaces and other mechanisms). There was a proposal to integrate versioning in ODS itself: Graduate op versioning mechanism to OpBase.td, but it requires more consideration in itself.
