[RFC] IR Versioning

Hi everyone! Recently, I’ve been working on binary serialization and compatibility guarantees for StableHLO and CHLO.

In the StableHLO Compatibility Spec Proposal (Pull Request #115 in openxla/stablehlo, by GleasonK), I put together a proposal for providing backward and forward compatibility for StableHLO/CHLO, with an analysis of several common scenarios of dialect evolution and compatibility protocols that could be used to address them responsibly.

To implement this proposal, StableHLO would greatly benefit from upgrade/downgrade hooks along the lines of Mehdi’s original versioning proposal or something similar.

I understand that there may already be an ongoing discussion about this, and I wanted to ask about its status. Are there any current thoughts on what a design for these hooks might look like in the bytecode infrastructure?

Perhaps it would be appropriate to make a proposal for these hooks? E.g. one could imagine extending BytecodeDialectInterface with methods like readOperation and writeOperation, or, closer to the original proposal, upgradeFromVersion / downgradeToVersion. If so, I could work out the details of what this could look like and share them here.
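To make the idea concrete, here is a minimal toy model of such hooks. It is written in Python purely for illustration and is not MLIR's C++ API: the `Op` record, the `ToyDialect` class, and the `dim` to `dimension` rename are all hypothetical, and only the hook names mirror upgradeFromVersion / downgradeToVersion.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical stand-in for an operation: a name plus named attributes."""
    name: str
    attrs: dict = field(default_factory=dict)


class ToyDialect:
    """Hypothetical dialect whose version 2 renamed `dim` to `dimension`."""
    CURRENT_VERSION = 2

    def upgrade_from_version(self, op: Op, version: int) -> Op:
        # Rewrite an op produced by an older writer into the current form.
        attrs = dict(op.attrs)
        if version < 2 and "dim" in attrs:
            attrs["dimension"] = attrs.pop("dim")
        return Op(op.name, attrs)

    def downgrade_to_version(self, op: Op, version: int) -> Op:
        # Rewrite a current op so that an older reader can consume it.
        attrs = dict(op.attrs)
        if version < 2 and "dimension" in attrs:
            attrs["dim"] = attrs.pop("dimension")
        return Op(op.name, attrs)


dialect = ToyDialect()
old_op = Op("toy.concat", {"dim": 0})
new_op = dialect.upgrade_from_version(old_op, version=1)
round_trip = dialect.downgrade_to_version(new_op, version=1)
```

In this sketch an old op upgraded and then downgraded again comes back unchanged, which is the kind of round-trip property a compatibility guarantee would presumably want to test.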

Hi @mehdi_amini,

Is there a timeline you can share for when IR versioning will land? We’d actually like to adopt it and ship with it.

Hi all,

I have been working with @saksenadhruv on the IR versioning and we have a working patch that extends the work of @mehdi_amini to the new bytecode format.

The approach we are proposing is to add an optional section to the bytecode that contains references to the dialect version attributes. The bytecode reader and writer are extended accordingly, and the dialect upgrade is done on the fly once IR parsing completes, before the optional module verification.
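The flow described above (parse, then upgrade, then optionally verify) can be sketched roughly as follows. This is a toy model, not the patch: JSON stands in for the bytecode container, and `dialect_versions`, `upgrade_toy`, and the rule that a dialect with no recorded version is treated as current are all assumptions made for the example.

```python
import json

CURRENT_VERSIONS = {"toy": 2}  # hypothetical: versions this reader implements


def upgrade_toy(op, version):
    # Hypothetical upgrade: version 2 renamed the `dim` attribute.
    if version < 2 and "dim" in op["attrs"]:
        op["attrs"]["dimension"] = op["attrs"].pop("dim")


UPGRADERS = {"toy": upgrade_toy}


def verify(ops):
    # Stand-in for module verification: current ops must not use `dim`.
    return all("dim" not in op["attrs"] for op in ops)


def write_module(ops, versions=None):
    sections = {"ops": ops}
    if versions:  # the versions section is optional
        sections["dialect_versions"] = versions
    return json.dumps(sections).encode()


def read_module(data, run_verifier=True):
    sections = json.loads(data.decode())
    ops = sections["ops"]  # 1. parse the IR as written
    versions = sections.get("dialect_versions", {})
    for op in ops:  # 2. upgrade once parsing is complete
        dialect = op["name"].split(".")[0]
        # Assumption: a dialect with no recorded version is already current.
        version = versions.get(dialect, CURRENT_VERSIONS.get(dialect))
        if dialect in UPGRADERS and version is not None:
            UPGRADERS[dialect](op, version)
    if run_verifier and not verify(ops):  # 3. optional verification
        raise ValueError("module failed verification")
    return ops


old_file = write_module([{"name": "toy.concat", "attrs": {"dim": 0}}], {"toy": 1})
ops = read_module(old_file)
```

Because the upgrade runs before verification, the verifier only ever needs to know the current form of each op.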

I am attaching a patch with the proposed changes in the hope of receiving feedback on the approach.

Implements_dialect_versioning_capability_to_mlir.patch (23.8 KB)

Could you put the patch on Phabricator? (maybe as a draft)

We will be putting it up in a few days; we are getting through some of the processes around it at Apple.

Thanks! I’ll wait for that; viewing it on Phabricator is much easier and more accessible for me than a patch file.

– River

Nice patch, this is a good adaptation of the approach I proposed to the byte code format, thanks!

Something that made me pause on the previous approach is that it’s not clear to me that it’ll be enough for what we’d want in the bytecode format. In particular, while parsing dialect types/attributes we may want to have already parsed the version, so that the bytecode parsing can adjust immediately rather than in a later pass.

The reason is that the current approach requires the ability to keep building the in-memory Attribute/Type as it existed in the previous version, whereas we were thinking about providing a mechanism so that Attribute and Type can evolve independently of the serialization format.
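A rough sketch of the alternative being described, where the version is known while an attribute is being decoded, so old encodings map straight onto the current in-memory form. `ShapeAttr` and both wire layouts are hypothetical and chosen only to keep the example small.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapeAttr:
    """Hypothetical attribute; only the current in-memory form exists."""
    rows: int
    cols: int


def read_shape_attr(payload: bytes, version: int) -> ShapeAttr:
    # The version is known before decoding, so the reader picks the matching
    # wire layout and builds the current in-memory form directly.
    if version == 1:
        # Hypothetical v1 layout: one little-endian uint32 for a square shape.
        (side,) = struct.unpack("<I", payload)
        return ShapeAttr(side, side)
    if version == 2:
        # Hypothetical v2 layout: two little-endian uint32 values.
        rows, cols = struct.unpack("<II", payload)
        return ShapeAttr(rows, cols)
    raise ValueError(f"unsupported encoding version {version}")


from_v1 = read_shape_attr(struct.pack("<I", 4), version=1)
from_v2 = read_shape_attr(struct.pack("<II", 4, 4), version=2)
```

Nothing in this sketch ever constructs a "version 1" attribute in memory, which is the property the after-the-fact upgrade approach cannot offer.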

Late to the thread, but I have a generic comment: IIUC, if the version is an integer, we’d need to track which version adds/changes/deletes stuff to know what is backward compatible or not.

Compare that with semantic versioning, where you can just ignore the new version if it’s only a patch update, but may have to upgrade the IR if it’s a minor update, or emit an error if it’s a major one.
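As a minimal sketch of that policy, assuming versions are `(major, minor, patch)` tuples: the `Action` names are hypothetical, and rejecting a file with a newer minor version than the tool is an extra assumption, since only the upgrade direction is discussed here.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"
    UPGRADE = "upgrade"
    REJECT = "reject"


def classify(file_version, tool_version):
    """Compare (major, minor, patch) tuples and decide how to treat the file."""
    file_major, file_minor, _ = file_version
    tool_major, tool_minor, _ = tool_version
    if file_major != tool_major:
        return Action.REJECT   # major change: emit an error
    if file_minor < tool_minor:
        return Action.UPGRADE  # older minor version: the IR may need upgrading
    if file_minor > tool_minor:
        return Action.REJECT   # assumption: newer files are not handled here
    return Action.ACCEPT       # at most a patch difference: ignore it
```

The point of the scheme is that the decision needs only the two version tuples and no per-version history.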

Taking a linear approach to this, especially with so many dialects and inter-dialect interactions, we’d need a multi-dimensional table of every version against every other version of every dialect to know what can and cannot be used together or upgraded.

For instance, suppose a file containing dialects Av1 and Bv2 is read by a tool that implements Av3 and Bv5. Av1 is incompatible with Av3 (so it can’t be upgraded directly), but it can be upgraded to Av2, which in turn is incompatible with Bv5, etc.
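To illustrate how much bookkeeping that implies, here is a toy search over such a table. The upgrade edges and the incompatible pair are made-up data loosely following the Av1/Bv2 example, and the breadth-first search is just one possible way to look for a valid upgrade order.

```python
from collections import deque

# Hypothetical data. Allowed single-step upgrades per dialect:
UPGRADES = {
    "A": {1: [2], 2: [3]},  # A v1 cannot jump straight to v3
    "B": {2: [5]},
}
# Version pairs (A, B) that cannot coexist in one file:
INCOMPATIBLE = {(2, 5)}


def find_upgrade_path(start, target):
    """Breadth-first search over (A version, B version) states."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        (a, b), path = queue.popleft()
        if (a, b) == target:
            return path
        next_states = [(na, b) for na in UPGRADES["A"].get(a, [])]
        next_states += [(a, nb) for nb in UPGRADES["B"].get(b, [])]
        for state in next_states:
            if state not in seen and state not in INCOMPATIBLE:
                seen.add(state)
                queue.append((state, path + [state]))
    return None  # no valid sequence of upgrades exists


path = find_upgrade_path(start=(1, 2), target=(3, 5))
```

With this data the only valid order is to bring A all the way to v3 before touching B; upgrading B first would strand the file at a state from which A cannot move. Every extra dialect adds a dimension to the state space.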

Now, of course, semantic versioning doesn’t fix everything. We’d have to keep all upstream dialects in some kind of lock-step to make the comparison trivial, which is OK for upstream dialects, while downstream projects just do what they need for the dialects they care about.

Or am I missing the point?

The diff is uploaded to Phabricator:
https://reviews.llvm.org/D143647

I would be happy to address questions/concerns and revise as necessary. In particular, it seems to me that if we want to pursue the idea of a bytecode version that can grow independently of Attributes/Types, we may need to agree on a custom versioning id (for example, one or more integers).

Why is that? I may be missing something, but versioning is entirely private to a dialect, so a dialect can write any blob in the bytecode; whether it is an integer or a complete protobuf message shouldn’t matter?
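A small sketch of that "opaque blob" view: the container stores whatever bytes each dialect hands it and only the owning dialect interprets them. Both dialect classes and their method names are hypothetical, and JSON stands in for a structured message such as protobuf.

```python
import json
import struct


class IntVersionedDialect:
    """Hypothetical dialect that stores its version as a single integer."""

    def write_version(self) -> bytes:
        return struct.pack("<I", 3)

    def read_version(self, blob: bytes):
        return struct.unpack("<I", blob)[0]


class StructuredVersionedDialect:
    """Hypothetical dialect that stores a structured record instead."""

    def write_version(self) -> bytes:
        return json.dumps({"major": 1, "minor": 4}).encode()

    def read_version(self, blob: bytes):
        return json.loads(blob.decode())


DIALECTS = {"a": IntVersionedDialect(), "b": StructuredVersionedDialect()}

# The container only maps dialect names to opaque bytes; it never looks inside.
version_section = {name: d.write_version() for name, d in DIALECTS.items()}
decoded = {name: DIALECTS[name].read_version(blob)
           for name, blob in version_section.items()}
```

The trade-off is that a reader without the owning dialect loaded cannot say anything about the blob.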

I had in mind something related to the previous comment (inter-dialect dependencies). In that scenario you may need additional information that probably cannot be encoded independently in each dialect: something like an encoding that tells you what kind of information you will find, similar to the encoding of the operations. Using an attribute has the advantage that you will always be able to parse it correctly, since it’s a known entity, and you may be able to handle changes without necessarily breaking compatibility (and requiring a new version of the bytecode format). But maybe this is beyond scope.

Thanks for your feedback – I tried to address the comments and posted an updated diff. Please take a look!