This question is about the design intent and actual use cases of the extensible binary format for sample profiles used by FDO. Since I am planning to propose a functional addition to this profile format, I would like some clarification so that my design does not conflict with existing use cases.
In this format, the name table section stores the function names (or their MD5 hashes, depending on the section flag) referred to by the sample profiles. The current implementation of SampleProfileReader does not seem to consider the case where more than one name table section is present. Note that such input cannot be generated by llvm-profdata or any other known open-source tool. The reader does not reject such input either, but it definitely mishandles it.
For example, in llvm/lib/ProfileData/SampleProfReader.cpp:1083 (SampleProfileReaderExtBinaryBase::readMD5NameTable), a fixed-length MD5 name table is not read until a name is referenced by a profile for the first time (lazy loading). However, if another fixed-length MD5 name table is read before the sample profiles, the base address of the first MD5 name table, MD5NameMemStart, is overwritten by the new table, and a read through it using a profile’s context index can cause an out-of-bounds crash. (I can construct a test case if needed.)
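To make the failure mode concrete, here is a simplified C++ model of the state involved. It is a hypothetical sketch, not the actual SampleProfileReaderExtBinaryBase code: it assumes that reading a fixed-length MD5 name table grows the shared NameTable (so the index bounds check uses the combined size of all tables) but remembers only one base pointer, named after MD5NameMemStart. Where the real reader would read out of bounds, the model returns std::nullopt instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified model of lazy fixed-length MD5 name table loading.
// Not the real SampleProfileReaderExtBinaryBase: it only mirrors the state
// described above (one shared NameTable, one base pointer).
struct LazyMD5Reader {
  std::vector<std::optional<uint64_t>> NameTable; // slots filled on first use
  const uint64_t *MD5NameMemStart = nullptr;      // base of the most recent table
  size_t LastTableSize = 0;                       // entries behind that base

  // Reading a section only records where its entries live (lazy loading).
  void readMD5NameTable(const std::vector<uint64_t> &Section) {
    // Growing the table suggests concatenation...
    NameTable.resize(NameTable.size() + Section.size());
    // ...but the base pointer of the previous table is overwritten.
    MD5NameMemStart = Section.data();
    LastTableSize = Section.size();
  }

  // The bounds check uses the combined size, while the hash is fetched
  // relative to the most recent base pointer. Returns std::nullopt where,
  // per the description above, the real reader would read out of bounds.
  std::optional<uint64_t> readStringIndex(size_t Idx) {
    if (Idx >= NameTable.size())
      return std::nullopt; // rejected by the bounds check
    if (!NameTable[Idx]) {
      if (Idx >= LastTableSize)
        return std::nullopt; // passes the check, but lies past the last table
      assert(MD5NameMemStart != nullptr);
      NameTable[Idx] = MD5NameMemStart[Idx];
    }
    return NameTable[Idx];
  }
};
```

In this model, with a three-entry table followed by a one-entry table, index 0 silently resolves to the second table’s first hash, and index 2 passes the bounds check even though it lies past the end of the second table.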
However, on line 1072 (SampleProfileReaderBinary::readNameTable), the call NameTable.reserve(*Size + NameTable.size()) suggests an intention to allow reading multiple name tables.
I don’t know what the original design intention was. Should such a case (a) be rejected as an invalid profile, (b) be allowed, with all name tables concatenated in the order they appear, or (c) be allowed, with only the most recently read name table used whenever the current section refers to a name table?
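The three options can be contrasted with a small hypothetical registry. NameTableSet and MultiTablePolicy are illustrative names only, not part of the LLVM API, and the sketch assumes every name table has already been decoded into a vector of 64-bit MD5 values. Under (a) a second table makes addTable fail, under (b) indices keep counting across tables, and under (c) each new table restarts the index space at 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical policies for profiles that contain several name tables.
enum class MultiTablePolicy {
  Reject,      // (a) a second name table makes the profile invalid
  Concatenate, // (b) tables are appended in the order they appear
  MostRecent   // (c) each new table replaces the previous one
};

class NameTableSet {
public:
  explicit NameTableSet(MultiTablePolicy P) : Policy(P) {}

  // Returns false if the table is refused (only possible under Reject).
  bool addTable(const std::vector<uint64_t> &Table) {
    if (Policy == MultiTablePolicy::Reject && NumTables > 0)
      return false;
    if (Policy == MultiTablePolicy::MostRecent)
      Names.clear(); // indices restart at 0 for the new table
    Names.insert(Names.end(), Table.begin(), Table.end());
    ++NumTables;
    return true;
  }

  // Resolves a name index with a bounds check against the active index space.
  std::optional<uint64_t> lookup(size_t Idx) const {
    if (Idx >= Names.size())
      return std::nullopt;
    return Names[Idx];
  }

private:
  MultiTablePolicy Policy;
  std::vector<uint64_t> Names;
  size_t NumTables = 0;
};
```

Whichever option is chosen, the key property is that the bounds check and the actual lookup use the same index space, which is exactly what the lazy MD5 path currently fails to guarantee.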
If anyone knows of real-world projects with such a use case, please mention them.
There is also a related issue: one name table can use MD5 while another does not, which leaves the profile in an inconsistent state. I am inclined to reject such profiles, at least until all function names in sample profiles are represented as MD5 (which I am working on).
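If such profiles are rejected, the check itself is cheap. The sketch below assumes a hypothetical per-section summary, NameTableSectionInfo, whose UsesMD5 field would be derived from the section’s MD5 flag; the reader would report a malformed profile whenever the function returns false.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of one name table section header.
struct NameTableSectionInfo {
  bool UsesMD5; // would come from the section's MD5 flag
};

// Returns true if every name table section uses the same representation
// (all MD5 or all plain strings). An empty list is trivially consistent.
bool hasConsistentNameRepresentation(
    const std::vector<NameTableSectionInfo> &Sections) {
  for (const NameTableSectionInfo &S : Sections)
    if (S.UsesMD5 != Sections.front().UsesMD5)
      return false;
  return true;
}
```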