Hi all, I’d like to propose the addition of a binary format for MLIR (i.e. an MLIR equivalent of LLVM bitcode). This is something that we have discussed many times in the past, but there was never a strong enough justification/need to have one (plus the general volatility of MLIR made it hard to commit to). That being said, I think the time is right for us to start building this out. We are starting to hit areas where we really want the benefits that a binary format brings to the table; namely serialization speed and size, mmap capabilities, more easily enabled versioning, etc. From my perspective (both from my work at Modular and as a maintainer of various other things), at least two recent-ish things have prompted sending this RFC now (could dig up more on request):
- At Modular we have large swathes of data (e.g. ML weights) that want/need to live alongside the IR (e.g. see the previous discussion on adding “resources” to MLIR), which is extremely difficult/expensive to do with a textual format: firstly, the cost of transcoding from hex/base64 is significant (this data can be several MBs/GBs/etc.), and secondly, given that we can’t mmap from the text file, we have to allocate space for this data (which, as before, is huge).
- For PDLL, we generate PDL (an MLIR dialect) that gets parsed/compiled/etc. at runtime. We currently embed this in .cpp source files using the textual MLIR format, but this ends up being slower to load, creates larger source files, etc.
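To make the first point a bit more concrete, here is a minimal sketch of the two costs mentioned above (transcoding and allocation). It is not MLIR code: the file names and the small `payload` stand-in for real weights are made up for illustration, and Python's `mmap` is used only to show the idea.

```python
import mmap
import os
import tempfile

# Stand-in for a large resource such as ML weights (kept small here).
payload = bytes(range(256)) * 64  # 16 KiB

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "weights.txt")  # hypothetical file names
    bin_path = os.path.join(tmp, "weights.bin")

    # Textual route: the blob is stored as hex, two characters per byte.
    with open(text_path, "w") as f:
        f.write(payload.hex())
    # Binary route: the blob is stored as-is.
    with open(bin_path, "wb") as f:
        f.write(payload)

    text_size = os.path.getsize(text_path)
    bin_size = os.path.getsize(bin_path)

    # Reading the text form means decoding into a freshly allocated buffer.
    with open(text_path) as f:
        decoded = bytes.fromhex(f.read())

    # Reading the binary form can map the file and view it without copying.
    with open(bin_path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            view = memoryview(mapped)
            same = view == payload  # compares contents through the mapping
            view.release()  # release the view before the mapping is closed

print(text_size, bin_size, same)
```

The hex file is twice the size of the raw one and has to be decoded into new memory, while the raw file can be consumed through the mapping directly.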
Given the very generic structure of MLIR, the binary format (from a high level) is actually fairly simple. The format really boils down to a few sections (don’t @ me too much here, there will of course likely be some additional things during the actual implementation, but these are the bigger ones):
- Dialect name section: contains the referenced dialect/operation names.
- Attribute/type section: a table containing each of the referenced attributes/types within the input IR. By default, attributes/types can be encoded using their string form (which enables all user attributes/types to be supported out-of-the-box), but dialects can define explicit/optimal binary encodings if desired.
- IR section: this section (which doesn’t necessarily need to be one section in the actual implementation) contains the actual IR: operations/blocks/regions/etc. The IR encodings are extremely simple given that all operations have the same underlying generic structure; think of the “generic” operation form in the textual format as an indicator of how simple this can be.
- Resource section: this section holds any dialect/external resources.
That’s… kind of it (from a high level of course). There may be some other things like a string section to collate common strings, etc., but the general structure is just as above. Compared to other IR representations, the generality of MLIR makes the format on its own fairly simple to conceptualize.
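As a rough illustration of the section layout described above, here is a toy container. Everything in it is an assumption made for the sketch: the section ids, the `[id][length][payload]` framing, the string-table layout, and the helper names are invented and are not the proposed format.

```python
import struct

# Hypothetical section ids, invented for this sketch.
DIALECT_NAMES, ATTR_TYPE, IR, RESOURCE = range(4)

def write_sections(sections):
    """Serialize {section_id: payload} as [id: u8][length: u32 LE][payload]..."""
    out = bytearray()
    for section_id, payload in sections.items():
        out += struct.pack("<BI", section_id, len(payload))
        out += payload
    return bytes(out)

def read_sections(data):
    """Inverse of write_sections; returns zero-copy views into `data`."""
    view = memoryview(data)
    sections, offset = {}, 0
    while offset < len(view):
        section_id, length = struct.unpack_from("<BI", view, offset)
        offset += struct.calcsize("<BI")
        sections[section_id] = view[offset:offset + length]
        offset += length
    return sections

def encode_strings(strings):
    """Length-prefixed UTF-8 strings, used for names and fallback attribute text."""
    out = bytearray()
    for s in strings:
        raw = s.encode("utf-8")
        out += struct.pack("<I", len(raw)) + raw
    return bytes(out)

module = write_sections({
    DIALECT_NAMES: encode_strings(["arith", "arith.constant"]),
    # Fallback encoding: attributes/types kept in their textual form.
    ATTR_TYPE: encode_strings(["42 : i32", "i32"]),
    # One operation: (name index 1, attribute index 0, result type index 1).
    IR: struct.pack("<III", 1, 0, 1),
    RESOURCE: b"\x00\x01\x02\x03",
})
parsed = read_sections(module)
print(sorted(parsed), struct.unpack("<III", parsed[IR]))
```

The point of the sketch is only that the IR section can refer to names, attributes, and types by index into the other tables, and that resources can sit in the file as raw bytes.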
The encoding is, for me, the most interesting aspect, and I think it likely merits the most discussion and scrutiny. From my perspective, our main options for encoding effectively boil down to either bitcode (using LLVM’s bitstream) or a custom bytecode. I’ve got working prototypes of both that I’ve used to develop some preliminary opinions, but they aren’t of a quality to share at this point (and I also want to make sure we agree on a direction before expending a bunch more effort cleaning up a specific path).
Bitcode is likely the thing most people would expect from an MLIR binary format, given its use for LLVM’s bitcode. The benefit of using a Bitcode/Bitstream encoding is that many encoding complexities are already taken care of (e.g. endian conversions, VBR values, records/blocks, etc.), the underlying infrastructure is well tuned and tested, and there is precedent within LLVM for using it. Given that precedent, Bitstream seems like a fairly solid choice for an encoding.
While it’s always good practice to reuse (or at least try to reuse) known-good formats when possible, given all of the complexities/subtleties that inevitably arise, we should also consider what would be best for our particular use case. In my experience so far, Bitcode/Bitstream isn’t as good of a fit for MLIR as it is for LLVM, which creates some interesting complexities when trying to map MLIR to it. For example, the concept of a “Block” in bitstream seems appealing for modeling things like regions, but “Blocks” are heavy enough that IR with lots of regions quickly bloats in both write/read speed and size. Another semi-problematic area (at least in my testing for MLIR) was the use of “bits” as the boundary for the encoding. Every value emit/read requires mangling around with bit boundaries. From prototyping VBR bytecode encodings, I wasn’t actually able to get the bitstream encoding to be smaller or faster than the bytecode encoding (even with trying various tricks/tracking sizes of indices/using VBR/etc). Finally, and anecdotally (i.e. not something I would use strongly to lean either way), it takes a bit of time to understand bitstream “abbreviations”. It may be just a matter of me not having looked at them in forever, but if we use bitstream we will be requiring users to understand abbreviations as well (if they want more optimal attribute/type encodings that is). This could create friction with users, but we could combat this with effective examples and documentation.
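For readers who haven’t looked at these encodings in a while, the bit-versus-byte boundary point can be sketched as follows. This is not the prototype’s encoding: the byte-aligned varint below is an assumed LEB128-style scheme picked for illustration, and `vbr_bits` only counts the bits a bitstream-style VBR field of a given width would occupy (each chunk carries width-1 data bits plus a continuation bit).

```python
def encode_varint(value):
    """Byte-aligned varint (LEB128-style): 7 data bits per byte, high bit = more."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data, offset=0):
    """Return (value, next_offset); works on whole bytes, no bit cursor needed."""
    value, shift = 0, 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7

def vbr_bits(value, width):
    """Bit count for a bitstream-style VBR field with `width`-bit chunks."""
    chunks = 1
    value >>= width - 1
    while value:
        chunks += 1
        value >>= width - 1
    return chunks * width

values = [0, 5, 127, 128, 300, 70000]
encoded = b"".join(encode_varint(v) for v in values)

# Decode everything back using plain byte offsets.
decoded, offset = [], 0
while offset < len(encoded):
    value, offset = decode_varint(encoded, offset)
    decoded.append(value)

print(decoded == values, len(encoded), [vbr_bits(v, 6) for v in values])
```

With the byte-aligned form, the reader only ever advances a byte offset; with the bit-level form, field sizes are generally not multiples of 8, so a reader has to track a bit cursor and shift/mask across byte boundaries on every value.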
There are pros and cons to both, but my slight leaning would be to use a bytecode encoding instead of reusing bitstream. The code for the prototype there ended up being faster, with a smaller resultant encoding (though there is likely a way bitcode could be better with more magic?), and easier to understand/extend both from the internal implementation and dialect interface point-of-view. This is my anecdotal experience leading up to this RFC, so it would be nice for others to provide their perspective (I’m sure there are plenty with preference to either side).
As for whether this means MLIR is becoming stable: well, no. This RFC is explicitly not about establishing/enforcing stability guarantees within MLIR. A binary format definitely makes some aspects easier, and whatever we land on will be set up to work with some stability guarantees, but this is an explicit non-goal from the start. Stability in MLIR requires a much broader discussion, mostly because the more important thing there is not really the format encoding, but the agreement from dialect owners that they want to support stability and all that entails (it doesn’t matter how stable the encoding itself is if it goes out of date with no upgrade path). Stabilizing MLIR would also be a good time to think of “breaking” changes to representational constructs that we want to push through before committing to anything. Either way, I don’t foresee it being difficult to build stability guarantees (e.g. upgrade on import) on top of the format; it’s an inevitable goal, just not an initial one. Not stabilizing initially also gives the binary format room to stabilize itself, as we evolve it and figure out more optimal encodings/tune it to be useful in the general case.
Would love everyone’s thoughts here. This is a significant addition to the infra and will have long-term, wide-reaching impact.