[RFC] Supporting ICF-Merged Functions in GSYM Debug Format

TLDR: GSYM doesn’t support overlapping (ICF’ed) functions, so add FunctionInfo.MergedFunctionsInfo to the GSYM format to represent all overlapping functions at a certain address.

Background:
The current GSYM debug format cannot represent functions that the linker has merged through Identical Code Folding (ICF). This limitation can lead to incomplete or inaccurate debug information for optimized binaries where ICF has been applied. dSYM files currently do not include information about overlapping functions, so the GSYM format was not designed with this use case in mind. If it is presented with overlapping functions in the dSYM, it simply keeps the first one and ignores the rest.

Proposal:
We propose extending the GSYM format to support ICF-merged functions while maintaining the non-overlapping primary function table. The key ideas are:

  • Introduce a new MergedFunctionsInfo structure to hold information about functions merged by ICF. It is essentially a vector of FunctionInfo entries sharing the same address range.
  • Modify the FunctionInfo structure to include an optional MergedFunctionsInfo field. There is still a “master” FunctionInfo, as today, and any additional functions sharing its address range are stored in FunctionInfo.MergedFunctionsInfo as an array of FunctionInfo entries. There is nothing special about the master FunctionInfo: it exists for backward compatibility with the current representation, where one (arbitrarily selected) FunctionInfo is kept for a given address.
  • Update the GSYM reader to handle and display information about ICF-merged functions.
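
As a rough illustration of the layout described in the bullets above, here is a minimal C++ sketch. It is not the actual LLVM code: the field names (Start, Size, Name, Merged) and the helpers makeFunction, buildFunctionTable, lookup and namesAt are hypothetical, and the optional MergedFunctionsInfo field is modeled as a possibly empty vector. It also assumes that merged functions have exactly identical address ranges and that no other ranges overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Simplified stand-in for GSYM's FunctionInfo (hypothetical field names).
struct FunctionInfo {
  uint64_t Start;
  uint64_t Size;
  std::string Name;
  // Stand-in for the optional MergedFunctionsInfo: other functions that ICF
  // folded onto this address range. Empty when nothing was merged.
  std::vector<FunctionInfo> Merged;
};

FunctionInfo makeFunction(uint64_t Start, uint64_t Size, std::string Name) {
  return FunctionInfo{Start, Size, std::move(Name), {}};
}

// Build the non-overlapping primary table. For each distinct address range,
// the first function seen becomes the "master" entry and every later function
// with the same range is attached to it.
std::vector<FunctionInfo> buildFunctionTable(std::vector<FunctionInfo> Funcs) {
  // stable_sort keeps the input order among functions with the same range.
  std::stable_sort(Funcs.begin(), Funcs.end(),
                   [](const FunctionInfo &A, const FunctionInfo &B) {
                     return std::tie(A.Start, A.Size) <
                            std::tie(B.Start, B.Size);
                   });
  std::vector<FunctionInfo> Table;
  for (FunctionInfo &F : Funcs) {
    if (!Table.empty() && Table.back().Start == F.Start &&
        Table.back().Size == F.Size)
      Table.back().Merged.push_back(std::move(F));
    else
      Table.push_back(std::move(F));
  }
  return Table;
}

// Find the master entry whose range contains Addr, or nullptr if none does.
const FunctionInfo *lookup(const std::vector<FunctionInfo> &Table,
                           uint64_t Addr) {
  auto It = std::upper_bound(
      Table.begin(), Table.end(), Addr,
      [](uint64_t A, const FunctionInfo &F) { return A < F.Start; });
  if (It == Table.begin())
    return nullptr;
  --It;
  return (Addr - It->Start < It->Size) ? &*It : nullptr;
}

// A reader that understands merged functions can report every candidate name:
// the master first, then the merged entries.
std::vector<std::string> namesAt(const std::vector<FunctionInfo> &Table,
                                 uint64_t Addr) {
  std::vector<std::string> Names;
  if (const FunctionInfo *FI = lookup(Table, Addr)) {
    Names.push_back(FI->Name);
    for (const FunctionInfo &M : FI->Merged) {
      // Invariant: merged entries share the master's address range.
      assert(M.Start == FI->Start && M.Size == FI->Size);
      Names.push_back(M.Name);
    }
  }
  return Names;
}
```

In this sketch a reader that only looks at the master entry (lookup) behaves like today's reader, while namesAt shows what a merged-aware reader could display.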

Note that initially the sub-functions in MergedFunctionsInfo will not be actively used in symbolication. However, this format change lays the groundwork for future work that uses the information for more accurate symbolication. The plan is to use call stack (and other) information to disambiguate which MergedFunctionsInfo entry to use during symbolication. For this to work, additional information will also be needed in the GSYM, but that is outside the scope of this RFC.

Current Limitations:
This proposal currently works only for functions merged across different compile units. Merging functions within the same compile unit will result in inaccurate debug info. This limitation stems from an issue in the source dSYM generation.
A separate proposal is being discussed to address this limitation at the DWARF level: [RFC] New DWARF attribute for symbolication of merged functions
Once that DWARF-level fix is implemented and adopted, this GSYM proposal will be able to handle all ICF-merged functions correctly.

Compatibility:
This change would not require a version change for the GSYM format. Older readers would continue to function normally: they would see only the one master FunctionInfo and ignore any MergedFunctionsInfo present in it. This is the same behavior we currently see for ICF’ed functions.
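
To illustrate why an older reader can ignore the new data, here is a minimal C++ sketch. It assumes the extra information attached to a FunctionInfo is stored as a list of tagged, length-prefixed records, so a reader can step over any tag it does not recognize. The tag values, the little-endian layout and the helper names (appendRecord, decodeWithOldReader) are assumptions made for this example, not the actual GSYM encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record tags; the numeric values are illustrative only.
enum InfoType : uint32_t {
  EndOfList = 0,
  LineTable = 1,
  InlineInfo = 2,
  MergedFunctions = 3, // the new record an old reader does not know about
};

// Append a 32-bit value in little-endian byte order.
void appendU32(std::vector<uint8_t> &Buf, uint32_t V) {
  for (int I = 0; I < 4; ++I)
    Buf.push_back(static_cast<uint8_t>(V >> (8 * I)));
}

// Read a little-endian 32-bit value; returns false if fewer than 4 bytes remain.
bool readU32(const std::vector<uint8_t> &Buf, size_t &Off, uint32_t &V) {
  if (Buf.size() - Off < 4)
    return false;
  V = 0;
  for (int I = 0; I < 4; ++I)
    V |= static_cast<uint32_t>(Buf[Off + I]) << (8 * I);
  Off += 4;
  return true;
}

// Write one record: tag, payload length, payload bytes.
void appendRecord(std::vector<uint8_t> &Buf, uint32_t Type,
                  const std::string &Payload) {
  assert(Payload.size() <= UINT32_MAX);
  appendU32(Buf, Type);
  appendU32(Buf, static_cast<uint32_t>(Payload.size()));
  Buf.insert(Buf.end(), Payload.begin(), Payload.end());
}

// A reader written before MergedFunctions existed. It handles the tags it
// knows and uses the length field to skip everything else.
std::vector<uint32_t> decodeWithOldReader(const std::vector<uint8_t> &Buf) {
  std::vector<uint32_t> Understood;
  size_t Off = 0;
  uint32_t Type = 0, Len = 0;
  while (readU32(Buf, Off, Type) && readU32(Buf, Off, Len)) {
    if (Type == EndOfList)
      break;
    if (Len > Buf.size() - Off)
      break; // truncated record, stop decoding
    if (Type == LineTable || Type == InlineInfo)
      Understood.push_back(Type);
    Off += Len; // known or unknown, advance past the payload
  }
  return Understood;
}
```

Under these assumptions, a buffer containing LineTable, MergedFunctions and InlineInfo records decodes in the old reader as just LineTable and InlineInfo, which is the same view it has of an ICF'ed function today.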

Please share your thoughts, concerns, and ideas for improvement.

Looks good. My main request is:
1 - We still emit one valid function info with a line table and inline info
2 - We also emit the new MergedFunctionsInfo

This keeps existing clients able to parse and get symbolication, and allows clients that know about the new MergedFunctionsInfo to take advantage of it.


PR Here: [gSYM] Add support merged functions in gSYM format by alx32 · Pull Request #101604 · llvm/llvm-project · GitHub

Follow up RFC: