[RFC] MLIR Bytecode: a stable serialization format

We have a very simple proposal here: we won’t break the bytecode format, preserving the ability to both deploy to and consume from version 0 (and any intermediate versions) moving forward. We’d like to document this on the website.

A year ago, we introduced a bytecode serialization format for MLIR. Since then we have improved the serialization with a few features (such as IR/dialect versioning), and a couple of months ago we discussed the case for MLIR bytecode format stability and backward compatibility. Following that discussion, multiple community members and various companies (including Apple, Google, Modular, and Nvidia) stepped up to help implement the remaining features we wanted before the bytecode format could be considered stable. That work is now complete.

In the process, we bumped the version of the bytecode 5 times, and we managed to preserve the ability to target any of the previous versions of the bytecode. We are now quite confident that we can preserve the stability of the format moving forward and we want to document this.

That said, it is important to realize that the promises of the bytecode format assume immutable dialects: the format allows backward and forward compatibility, but only when nothing in a dialect changes (operation, type, or attribute definitions). This is the level of guarantee we’re introducing here.

No dialect in upstream MLIR promises any guarantee at the moment, including the builtin dialect: if you are using tensor, memref, or affine constructs, forward and/or backward compatibility may still break.

Dialect versioning is orthogonal to the versioning of the bytecode format itself: the feature introduced in IR/dialect versioning allows dialects to individually record their own versioning scheme in the bytecode and manage their evolution.

We will carefully consider versioning schemes for some in-tree dialects as they mature and as requirements and needs are clarified. It is particularly important to get feedback from both active contributors and users/products.

In the meantime, if you have your own dialect for which you need to provide guarantees, you may look into BytecodeDialectInterface to control your own dialect versioning (independent of the bytecode format) and implement backward compatibility schemes (back-deployment should also be possible, pending an extra API). Alternatively, the StableHLO project uses a different approach: the StableHLO dialect is cloned entirely to a new version to “freeze” it, and the dialect conversion framework is then used to manage forward and backward compatibility of the StableHLO dialect with respect to these “frozen” versions.

If we’re aligned on it, this is a huge milestone, and I know that it has taken a large amount of work over the last year and multiple stakeholders to get to the point where we are comfortable making this statement.

+1 from me as a bystander to the work.

Just to be clear, you are proposing that version 0 is something of a long-term-supported version specifically (and are not making a statement about subsequent versions)?

I’ll rephrase in the original post: every individual version ([0-5] today) can be both targeted and loaded, and at the moment we think we can continue that way.

That’s definitely great and appreciated for users. I was expecting a more conservative, long-term supported version approach but am not going to argue for such a conservative stance.

Yes, my initial approach was to list a set of features to get to version 1.0 and call that stable. Being able to preserve bi-directional compatibility while adding all the features over the last few months has been an interesting learning experience! I didn’t anticipate this result :)

This is great! We’ve seen very positive results thus far using the bytecode format, both in serialization speed and size, very happy it has reached a maturity where stability is possible!

For testing: we have been keeping StableHLO at the latest bytecode version, and have had success detecting forward/backward incompatibilities statically using a simple diff test that sets a consistent producer string, strips debug info, and compares the payload to a known-good bytecode file. We would be happy to contribute something like this to MLIR, though we may need some help/suggestions for crafting the contents of the test file.
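To make the comparison step concrete, here is a minimal sketch of the golden-file check in Python. It assumes the bytecode under test has already been emitted with a fixed producer string and with debug info stripped, so that a byte-for-byte comparison is meaningful. The helper names `first_mismatch` and `check_against_golden` are hypothetical and are not part of StableHLO or MLIR.

```python
from typing import Optional


def first_mismatch(actual: bytes, golden: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, g) in enumerate(zip(actual, golden)):
        if a != g:
            return offset
    if len(actual) != len(golden):
        # One payload is a strict prefix of the other, so the first
        # difference is where the shorter one ends.
        return min(len(actual), len(golden))
    return None


def check_against_golden(actual: bytes, golden: bytes) -> str:
    """Produce a short verdict suitable for a test log."""
    offset = first_mismatch(actual, golden)
    if offset is None:
        return "match"
    return (
        f"mismatch at byte {offset} "
        f"(actual {len(actual)} bytes, golden {len(golden)} bytes)"
    )
```

In a real test the two byte strings would be read from the freshly emitted file and from the checked-in known-good file. Reporting the offset rather than a bare pass/fail makes it easier to tell which section of the file changed.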

Sounds good! It has been clear from the start that bytecode stability and dialect stability are separate.

As a short-term goal, there should be some way for a dialect author to create a stable bytecode file. The complication is the parts of the builtin dialect that aren’t separable from Operation. In our attempt to remove the dependency on all of builtin, we are still left with DictionaryAttr (resolved by properties, I believe) and Location. Admittedly, I haven’t tested whether dialect-specific locations are possible, but if they are, then a fully stable bytecode artifact should already be possible.

This is great! So, as I understand it, version 5 of the bytecode reader can read any of the previous versions. However, can version 5 of the bytecode writer also write out any of the previous versions? Is there an API to specify which bytecode version to write out? This would be useful if Tool A, built with a newer MLIR version, produces bytecode that needs to be consumed by Tool B, built with an older version of MLIR. Assuming, of course, that the dialects involved are stable.

Yes

Yes, it was added in D146555, “[mlir][bytecode] Allow client to specify an older version of the bytecode to emit”.
See the API here: https://github.com/llvm/llvm-project/blob/378f1885e3536ddf93e780f25a84ad493140ff42/mlir/include/mlir/Bytecode/BytecodeWriter.h#L43-L46C26

Back-deployment can fail (for example, a dialect’s serialization may in the future use a construct that didn’t exist in an older bytecode format), so writeBytecodeToFile now returns a LogicalResult.

Thanks! I suppose it is up to the client to check the minimum bytecode version required for a given dialect version. I assume that writeBytecodeToFile is guaranteed to fail gracefully.

Yes, that’s the goal.
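
For the Tool A / Tool B scenario above, the consuming side can also inspect a file’s header before attempting to load it. The sketch below is in Python and rests on assumptions taken from my reading of the MLIR bytecode documentation, not from this thread: that a file starts with the magic bytes `ML\xefR`, followed by the version as a prefix-encoded varint, followed by a null-terminated producer string. Please verify these details against the bytecode specification before relying on them. The names `read_varint`, `read_header`, and `can_consume` are hypothetical helpers, not MLIR APIs.

```python
from typing import Tuple

# Assumption: magic number as described in the MLIR bytecode documentation.
MAGIC = b"ML\xefR"


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one prefix-encoded varint at pos; return (value, next_pos).

    Assumed layout: the number of trailing zero bits in the first byte
    gives the number of extra bytes; a first byte of 0 means a full
    8-byte little-endian value follows.
    """
    if pos >= len(data):
        raise ValueError("truncated varint")
    first = data[pos]
    if first & 1:
        # Single-byte form: the upper 7 bits hold the value.
        return first >> 1, pos + 1
    if first == 0:
        end = pos + 9
        if end > len(data):
            raise ValueError("truncated varint")
        return int.from_bytes(data[pos + 1:end], "little"), end
    extra = (first & -first).bit_length() - 1  # count of trailing zero bits
    end = pos + 1 + extra
    if end > len(data):
        raise ValueError("truncated varint")
    raw = int.from_bytes(data[pos:end], "little")
    return raw >> (extra + 1), end


def read_header(data: bytes) -> Tuple[int, str]:
    """Return (bytecode_version, producer) from the start of a file."""
    if not data.startswith(MAGIC):
        raise ValueError("not an MLIR bytecode file")
    version, pos = read_varint(data, len(MAGIC))
    end = data.find(b"\x00", pos)
    if end < 0:
        raise ValueError("unterminated producer string")
    return version, data[pos:end].decode("utf-8", errors="replace")


def can_consume(data: bytes, max_supported_version: int) -> bool:
    """True if the file's bytecode version is not newer than the reader's."""
    version, _ = read_header(data)
    return version <= max_supported_version
```

This only checks the version of the bytecode format itself. It says nothing about dialect versions, which are recorded and managed separately by each dialect.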