We are looking to deprecate the Compact Binary format for sample profiles used by FDO builds.
Related discussion: D76255 [SampleFDO] Port MD5 name table support to extbinary format
Implementation here: D149400 [llvm-profdata] Deprecate Compact Binary Sample Profile Format
Currently there are four profile formats: Text, Binary, Compact Binary, and Extensible Binary. The profile reader classes are poorly maintained, with a lot of code repetition and intertwined function calls. Extensible Binary is currently the most commonly used format because of its forward compatibility, and it can represent any profile stored in the other three formats (but not necessarily the other way around). Compared to Extensible Binary, Compact Binary has several major disadvantages that cannot be fixed:
- The lower 64 bits of the MD5 of each function name are stored as variable-length integers (ULEB128), which take from 1 to 10 bytes each. On average it takes 9.49 bytes to store a random uint64 (half of all values need the full 10 bytes), which is worse than storing it unencoded (8 bytes), and the reader has to decode every value before it can read the function offset table.
- Unable to store function metadata.
- Not forward compatible. Unlike Extensible Binary format, there is no easy way to add a new (or customized) section with additional profiling information.
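The size argument in the first point can be checked with a short sketch. This is not LLVM code: `uleb128_encode` and `expected_uleb128_size` are hypothetical helper names, and the calculation assumes the truncated MD5 values behave like uniformly random 64-bit integers.

```python
from fractions import Fraction


def uleb128_encode(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128 (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("ULEB128 encodes unsigned values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)  # last byte: continuation bit clear
            return bytes(out)


def expected_uleb128_size(bits: int = 64) -> Fraction:
    """Exact mean encoded size of a uniformly random `bits`-bit unsigned value.

    Assumption: values are uniform over [0, 2**bits), which is how a
    truncated MD5 hash is modeled here.
    """
    max_bytes = -(-bits // 7)  # ceil(bits / 7)
    total = Fraction(0)
    prev = Fraction(0)
    for k in range(1, max_bytes + 1):
        # Probability that the value fits in k bytes, i.e. value < 2**(7k).
        p_fits = min(Fraction(1), Fraction(2 ** (7 * k), 2 ** bits))
        total += k * (p_fits - prev)
        prev = p_fits
    return total


MEAN_SIZE = float(expected_uleb128_size(64))
```

Under that uniformity assumption, `MEAN_SIZE` comes out just under 9.5 bytes, and `len(uleb128_encode(2**64 - 1))` is 10, compared with a fixed 8 bytes for an unencoded uint64.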
Furthermore, I am planning a series of refactorings to significantly speed up profile loading for industrial usage. The refactoring affects the implementation of all profile reader classes, so having one fewer format to support will significantly reduce the maintenance workload.
Migrating from Compact Binary to Extensible Binary
[to be added]
Please comment if you are aware of any LLVM user still using Compact Binary sample profiles on non-trivial projects.