TLDR: GSYM doesn’t support overlapping (ICF’ed) functions, so we propose adding FunctionInfo.MergedFunctionsInfo to the GSYM format to represent all of the overlapping functions at a given address.
Background:
The current GSYM debug format cannot represent functions that the linker has merged through Identical Code Folding (ICF). This limitation can lead to incomplete or inaccurate debug information for optimized binaries where ICF has been applied. Currently, information about overlapping functions is not included in dSYM files, so the GSYM format was not designed with this use case in mind. When presented with overlapping functions in a dSYM, the GSYM conversion essentially keeps the first one and ignores the rest.
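To make the current behavior concrete, here is a simplified model of the "keep the first one, ignore the rest" step described above. It is a sketch, not the actual LLVM GsymCreator logic: the Func type and the keepFirstPerRange function are hypothetical, and the input is assumed to be pre-sorted by start address so that ICF'ed functions with identical ranges end up adjacent.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a GSYM FunctionInfo: only an
// address range and a name (the real struct also carries line tables,
// inline info, etc.).
struct Func {
  uint64_t Start;
  uint64_t End;
  std::string Name;
};

// Models the described behavior: given functions sorted by start address,
// keep the first entry for each address range and drop any later entry
// with an identical range (an ICF'ed duplicate).
std::vector<Func> keepFirstPerRange(const std::vector<Func> &Sorted) {
  std::vector<Func> Out;
  for (const Func &F : Sorted) {
    if (!Out.empty() && Out.back().Start == F.Start && Out.back().End == F.End)
      continue; // Overlapping duplicate: silently discarded.
    Out.push_back(F);
  }
  return Out;
}
```

With foo and bar folded to the same range, only foo survives, so any address inside that range symbolicates to foo even when bar was the function actually called.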
Proposal:
We propose extending the GSYM format to support ICF-merged functions while maintaining the non-overlapping primary function table. The key ideas are:
- Introduce a new MergedFunctionsInfo structure to hold information about functions merged by ICF. This is essentially a vector of FunctionInfo entries sharing the same address range.
- Modify the FunctionInfo structure to include an optional MergedFunctionsInfo field. There will still be a “master” FunctionInfo (just as today), and any additional functions sharing the same address range will be stored in FunctionInfo.MergedFunctionsInfo, which is essentially an array of FunctionInfo entries. There is nothing special about the master FunctionInfo; it exists for backward compatibility with the current representation, in which one (arbitrarily selected) FunctionInfo is kept for a given address.
- Update the GSYM reader to handle and display information about ICF-merged functions.
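The proposed layout can be sketched in C++ roughly as follows. This is a minimal sketch under stated assumptions: the field names and the groupMerged and namesAt helpers are hypothetical, and the real FunctionInfo carries much more (line tables, inline info, etc.). It shows the first function seen at an address range becoming the master entry, the remaining functions with the same range going into its optional MergedFunctionsInfo, and a reader listing every name at that address.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct FunctionInfo; // Forward declaration for the recursive layout.

// Hypothetical: all other functions folded onto the master's range.
struct MergedFunctionsInfo {
  std::vector<FunctionInfo> MergedFunctions;
};

// Hypothetical, heavily simplified FunctionInfo.
struct FunctionInfo {
  uint64_t Start = 0;
  uint64_t End = 0;
  std::string Name;
  std::optional<MergedFunctionsInfo> MergedFunctions; // Absent if no ICF.
};

// Groups functions with identical [Start, End) ranges: the first one seen
// becomes the master, the rest are appended to its MergedFunctions.
std::vector<FunctionInfo> groupMerged(const std::vector<FunctionInfo> &Funcs) {
  std::map<std::pair<uint64_t, uint64_t>, FunctionInfo> ByRange;
  for (const FunctionInfo &F : Funcs) {
    auto Key = std::make_pair(F.Start, F.End);
    auto It = ByRange.find(Key);
    if (It == ByRange.end()) {
      ByRange.emplace(Key, F); // First function at this range: the master.
      continue;
    }
    FunctionInfo &Master = It->second;
    if (!Master.MergedFunctions)
      Master.MergedFunctions.emplace(); // Create the optional field lazily.
    Master.MergedFunctions->MergedFunctions.push_back(F);
  }
  std::vector<FunctionInfo> Out; // Still one entry per range, sorted.
  for (auto &KV : ByRange)
    Out.push_back(std::move(KV.second));
  return Out;
}

// What an updated reader might display: the master name, then merged names.
std::vector<std::string> namesAt(const FunctionInfo &FI) {
  std::vector<std::string> Names{FI.Name};
  if (FI.MergedFunctions)
    for (const FunctionInfo &M : FI.MergedFunctions->MergedFunctions)
      Names.push_back(M.Name);
  return Names;
}
```

Because the primary table still holds one entry per address range, lookups are unchanged; the merged functions only become visible to code that explicitly inspects the optional field.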
Note that, initially, the sub-functions in MergedFunctionsInfo will not be actively used in symbolication. However, this format change lays the groundwork for future work that leverages this information for more accurate symbolication. The plan is to use call stack (and other) information to disambiguate which MergedFunctionsInfo entry to use during symbolication. This will also require additional information in the GSYM file, but that is outside the scope of this RFC.
Current Limitations:
This proposal currently works only for functions merged across different compile units. Merging functions within the same compile unit will result in inaccurate debug info. This limitation stems from an issue in the source dSYM generation.
A separate proposal is being discussed to address this limitation at the DWARF level: [RFC] New DWARF attribute for symbolication of merged functions
Once that DWARF-level fix is implemented and adopted, this GSYM proposal will be able to handle all ICF-merged functions correctly.
Compatibility:
This change would not require a version change for the GSYM format. Older readers would still function normally: they would see only the master FunctionInfo and ignore any MergedFunctionsInfo present in it. This is essentially the same behavior we see today for ICF’ed functions.
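The compatibility argument relies on readers skipping info chunks they do not recognize. The sketch below models this, assuming each info chunk inside a FunctionInfo is encoded as a 32-bit type, a 32-bit length, and a payload, and that an old reader skips unknown types by their length. The tag value 3 for MergedFunctionsInfo and all helper names are assumptions for illustration, not the actual GSYM specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical info-type tags. The MergedFunctionsInfo value is an
// assumption for illustration only.
enum InfoType : uint32_t {
  EndOfList = 0,
  LineTableInfo = 1,
  InlineInfo = 2,
  MergedFunctionsInfo = 3, // Unknown to an "old" reader.
};

// Appends a little-endian uint32 to Buf.
void putU32(std::vector<uint8_t> &Buf, uint32_t V) {
  for (int I = 0; I < 4; ++I)
    Buf.push_back(static_cast<uint8_t>(V >> (8 * I)));
}

// Reads a little-endian uint32 from Buf at Off.
uint32_t getU32(const std::vector<uint8_t> &Buf, size_t Off) {
  uint32_t V = 0;
  for (int I = 0; I < 4; ++I)
    V |= static_cast<uint32_t>(Buf[Off + I]) << (8 * I);
  return V;
}

// Appends one (type, length, payload) chunk.
void putChunk(std::vector<uint8_t> &Buf, uint32_t Type,
              const std::vector<uint8_t> &Payload) {
  putU32(Buf, Type);
  putU32(Buf, static_cast<uint32_t>(Payload.size()));
  Buf.insert(Buf.end(), Payload.begin(), Payload.end());
}

// Hypothetical "old" reader: it understands only LineTableInfo and
// InlineInfo and skips every other chunk using its length prefix.
// Returns the chunk types it actually consumed.
std::vector<uint32_t> oldReaderKnownChunks(const std::vector<uint8_t> &Buf) {
  std::vector<uint32_t> Seen;
  size_t Off = 0;
  while (Off + 8 <= Buf.size()) {
    uint32_t Type = getU32(Buf, Off);
    uint32_t Len = getU32(Buf, Off + 4);
    Off += 8;
    if (Type == EndOfList)
      break;
    if (Type == LineTableInfo || Type == InlineInfo)
      Seen.push_back(Type); // A real reader would decode the payload here.
    Off += Len; // Unknown types such as MergedFunctionsInfo are skipped.
  }
  return Seen;
}
```

Under these assumptions, a chunk written by a newer producer is invisible to the old reader, which still decodes the master's line table and inline info exactly as before.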
Please share your thoughts, concerns, and ideas for improvement.