A Section-Metadata-Based Approach for Mapping Binary Addresses to Machine Basic Blocks

Greetings,

TL;DR: We propose emitting a new section in the binary that contains the information required to map executable PC addresses to basic blocks. The proposal decouples the BB information from the symbol table, which allows more flexibility (the section can be stripped, and new BB information fields can be added). It also cuts the binary size overhead by ~60% compared to the current implementation, which passes this information through special BB symbols.

Background

Today Propeller (proposed in https://lists.llvm.org/pipermail/llvm-dev/2019-September/135393.html) uses the -fbasicblock-sections=labels option to create a symbol for every basic block, so that instruction addresses can be mapped to individual basic blocks. To limit the .strtab bloat, these symbols are encoded using a unary naming scheme which allows for string compression. For instance, a function “foo” with four basic blocks will have three basic block labels (the entry block doesn’t need a BB label since it can be found using the function symbol):

    foo:
        je ra.BB.foo
    a.BB.foo:
        je rra.BB.foo
    ra.BB.foo:
        ret
    rra.BB.foo:
        ret
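
As a rough illustration of the naming in the listing above, the following Python sketch generates the labels for the non-entry blocks of a function. It assumes that each label prepends one character for the current block ('a' for a normal block, 'r' for a return block) to the prefix of the previous label, which matches the example but is not spelled out here. The helper name bb_labels is hypothetical.

```python
def bb_labels(func_name, block_kinds):
    """Return the labels for the non-entry blocks of func_name.

    block_kinds holds one character per non-entry block, in layout order:
    'a' for a normal block, 'r' for a return block.
    """
    labels = []
    prefix = ""
    for kind in block_kinds:
        # Assumption: the new prefix is this block's kind character
        # followed by the previous block's prefix (unary growth).
        prefix = kind + prefix
        labels.append(f"{prefix}.BB.{func_name}")
    return labels


# The four-block "foo" from the listing: one normal block, two return blocks.
print(bb_labels("foo", "arr"))  # ['a.BB.foo', 'ra.BB.foo', 'rra.BB.foo']
```

Every label is a suffix of the next one, which is what makes the names cheap to store once the string table is compressed.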

While this scheme does map addresses to basic blocks, it poses several challenges:

  1. Unary symbol names are great for compression, but require changes to the demangler for readability.

  2. Stripping BB symbols from the binary requires special linker support as they are placed in the symbol table along with other symbols.

  3. Profilers and debuggers need to be changed to accommodate these symbols. For instance, the profiles from all BBs of a function must be aggregated and the debugger must be able to map to the function symbol, rather than the BB symbol, to show a function-level backtrace.

  4. Relying on the ELF symbol format leaves little room to extend the BB information with other properties of a basic block (e.g., whether it is a return block or an exception handling block). Today Propeller encodes this information using special characters in the symbol name (‘r’ for return blocks as in the above example).

The BB Info Section Design

To solve these problems, we propose emitting the BB information in a separate “.bb_info” section. Each function will have its own BB info table emitted into this section, structured as follows. The table begins with a header which includes the address of the function and the number of basic blocks in that function.
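
To make the header concrete, here is a minimal Python sketch of how a tool might write and read it. The field widths and byte order are assumptions made for illustration only (a little-endian 8-byte function address followed by a 4-byte block count); the text above does not fix them. The names HEADER, pack_header and unpack_header are hypothetical.

```python
import struct

# Assumed layout, for illustration only: little-endian,
# 8-byte function address, then 4-byte basic block count.
HEADER = struct.Struct("<QI")


def pack_header(func_addr, num_blocks):
    """Serialize the per-function table header."""
    return HEADER.pack(func_addr, num_blocks)


def unpack_header(data, offset=0):
    """Read a header at offset; return (func_addr, num_blocks, next_offset)."""
    func_addr, num_blocks = HEADER.unpack_from(data, offset)
    return func_addr, num_blocks, offset + HEADER.size


raw = pack_header(0x401000, 4)
print(unpack_header(raw))  # (4198400, 4, 12)
```

Because the header carries the block count, a reader can walk the section one function table at a time without consulting the symbol table.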