Disclaimer: I have no plan to break 3.x bitcode compatibility in any way.
I’m experimenting with ideas to improve the current bitcode serialization and deserialization in general.
I’m interested in improving the speed and the flexibility of the reader/writer, as well as making the code easier to deal with.
My current griefs against our bitcode format are mainly the lack of built-in random access (which prevents efficient lazy loading), and the implementation which does not help layering the high-level structure of the serialization from what is emitted (and the compression part). On the speed side, the bit level management of record has also non-negligible overhead right now.
The two key issues with the current format that prevents random access are:
- Abbrev are lazily emitted in the stream and can be interleaved with record. You have to scan the whole block to get the context for a record.
- Because records have variable size, you need to parse then in sequence to find the “interesting” one.
I just finished a quick prototype using Google Flatbuffers ( FlatBuffers: FlatBuffers ). This is a solution that provide “zero-copy” serialization/deserialization, by avoiding the use of variable length integer and aligning appropriately every field. The structure of the serialization is described using a DSL, and the serialization/deserialization code is generated from this description.
My prototype is a naive description of the IR, but complete enough to handle all the .ll files present in the LLVM validation, and rountrip a 800MB file from the LTO link of Clang itself.
The results is that the extra flexibility given by the random access is helping beyond the lazy loading: for instance loading metadata is an issue because of forward reference usually, while the random access allows to “recurse” on the operand when loading a metadata. This simplifies the algorithms both on the reader and the writer.
On the drawback, Flatbuffers seems to target small use cases (not a lot of different type of records or large buffers). The implementation is not totally optimized, for example 40% of the writer time is spend on “vtables” (equivalent of our abbrev) deduplication on some of my test cases.
Also multiple possible obvious size optimizations are not possible with the current Flatbuffers, currently my prototype emit files that are ~2 times larger that our bitcode format. By working on the schema I probably can get this to down 50%, but that may still be too high for us (especially considering that some of these limits are inherent to Flatbuffer implementation itself).
So my current thoughts on the topic is that we could still learn from other recently developed format and bring part of the ideas into our format where it makes sense. For example we don’t need random access in the instruction array, but it’d be appreciable for the metadata though. We'd want zero-copy for a symbol table (see https://llvm.org/bugs/show_bug.cgi?id=27551 ) but not for the instruction stream.
I’m also interested in what are other people’s stakes on the bitcode format (other than backward compatibility), what tradeoff are you ready to make on the size vs speed/flexibility for example?