Summary
I discuss the limitations of the current ELF object file format for representing metadata that applies to a given symbol. Encoding symbol metadata in a .symtab_meta section is presented as an alternative mechanism that is more powerful and flexible for the task at hand. The .symtab_meta section is currently used to implement the location, noinit, and persistent attributes, as well as several other attributes, in various target compiler toolchains supported by the Texas Instruments (TI) compiler tools group. The existing TI implementation is ELF-specific but could easily be adapted to work with other object formats.
I then summarize a proposal to upstream support for the .symtab_meta extension to the LLVM source base.
Motivation
In an ELF object file, the information associated with a given symbol is represented in a standard ELF symbol table entry. An Elf32_Sym record currently stores information about a symbol’s
- name (st_name) - an index into the string table
- value (st_value) - its actual integer value (if symbol is absolute) or the address where the symbol is defined (if symbol is weak or global)
- size (st_size) - size in bytes based on the type of the symbol
- type and binding (st_info) - binding in most significant 4-bits and type in least significant 4-bits, where type refers to a data object, function, section, or source file
- 8 extra bits (st_other) - the original ELF specification indicates that this field is normally filled with 0 and has no defined meaning; the current generic ABI uses its low 2 bits for symbol visibility
- section (st_shndx) - index into the section header table pointing to the section where the symbol is defined
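For readers who want to see the layout concretely, here is a minimal sketch of the 32-bit symbol table entry and the usual macros for splitting st_info. The struct and macros follow the generic ELF ABI; the helper make_global_object is a hypothetical name introduced only for this illustration. The point is that all 16 bytes are already spoken for, so there is no spare field that could hold an attribute value.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout of a 32-bit ELF symbol table entry (generic ELF ABI). */
typedef struct {
    uint32_t st_name;   /* index into the string table */
    uint32_t st_value;  /* absolute value or address of the definition */
    uint32_t st_size;   /* size in bytes */
    uint8_t  st_info;   /* binding in the high 4 bits, type in the low 4 bits */
    uint8_t  st_other;  /* low 2 bits hold visibility in the current gABI */
    uint16_t st_shndx;  /* index of the section that defines the symbol */
} Elf32_Sym;

#define ELF32_ST_BIND(i)    ((i) >> 4)
#define ELF32_ST_TYPE(i)    ((i) & 0xf)
#define ELF32_ST_INFO(b, t) (((b) << 4) + ((t) & 0xf))

enum { STB_GLOBAL = 1, STT_OBJECT = 1 };

/* Hypothetical helper: build the entry for a global data object. */
static Elf32_Sym make_global_object(uint32_t name, uint32_t addr,
                                    uint32_t size, uint16_t shndx) {
    Elf32_Sym s = { name, addr, size,
                    ELF32_ST_INFO(STB_GLOBAL, STT_OBJECT), 0, shndx };
    assert(ELF32_ST_BIND(s.st_info) == STB_GLOBAL);
    return s;
}
```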
Compiler-generated DWARF debug information provides additional information about a given symbol. For example, the DWARF information entry for the symbol will be annotated with both a DW_AT_location and a DW_AT_type attribute, which describe the runtime location of the symbol’s definition and the symbol’s data type, respectively.
There is additional information that can be associated with a symbol to assist the linker with the proper handling of a symbol. For example, the clang compiler supports the retain attribute that instructs the linker to keep the definition of a symbol in a link even if it isn’t referenced elsewhere in the application. In the compiler-generated object file, this attribute is propagated to the section in which the symbol is defined. It is encoded as the SHF_GNU_RETAIN section flag in the sh_flags field of an ELF section header record (Elf32_Shdr or Elf64_Shdr).
The sh_flags field of an ELF section header is interpreted as a bitmask, which means that the general and processor-specific semantic information that can be attached to a given section is limited by the number of bits in the sh_flags field.
A further limitation of representing semantic information about sections or symbols in a bitmask is that a single bit in the sh_flags field cannot carry a symbol attribute that has an associated value.
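The bitmask limitation can be made concrete with a short sketch. The flag values below are the standard ones (SHF_GNU_RETAIN is bit 21 in the GNU extensions); the function names are hypothetical and exist only for this illustration. A flag test can only answer yes or no, which is why an attribute argument such as an address has nowhere to go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SHF_WRITE      0x1u
#define SHF_ALLOC      0x2u
#define SHF_GNU_RETAIN 0x200000u  /* bit 21, GNU extension */

/* sh_flags is 32 bits wide in Elf32_Shdr (64 bits in Elf64_Shdr).
   Each bit is an independent yes/no property of the whole section. */
static bool section_is_retained(uint32_t sh_flags) {
    return (sh_flags & SHF_GNU_RETAIN) != 0;
}

/* Usage: a writable, allocated data section with and without retain. */
static void retain_example(void) {
    uint32_t flags = SHF_WRITE | SHF_ALLOC | SHF_GNU_RETAIN;
    assert(section_is_retained(flags));
    assert(!section_is_retained(SHF_WRITE | SHF_ALLOC));
}
```

Note that the flag also applies to the section as a whole, not to one symbol within it.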
An Alternative Representation of Symbol Metadata Information
There is a more expressive and flexible way of representing symbol metadata information in an ELF object file.
Consider an embedded application that defines a data object that must reside at a specific location at run time. The TI compiler toolchains support a location attribute that can be specified as follows:
__attribute__((location(0x12345678))) int my_located_var = 10;
The TI compiler will generate a symbol metadata record in a special section named .symtab_meta. The record contains:
- a symbol table index pointing to the symbol that this metadata applies to
- the kind of symbol metadata; an integer representation instead of a bitmask
- the value associated with the metadata; for a location kind of metadata, the value would be an address
There are ways to get around some of the above-mentioned limitations. For example, the Arm Ltd. compiler supports an at attribute that is identical to the above location attribute in intent:
__attribute__((at(0x12345678))) int my_located_var = 10;
Instead of encoding the value of the at attribute in a flag or an extension to the symbol table, the Arm Ltd. compiler places the definition of the variable in a section whose name is annotated with the specified address argument.
However, there are additional benefits to the proposed .symtab_meta approach to representing extra semantic information about symbols. Not only does representing the metadata kind as an integer vastly increase the number of different kinds of symbol metadata information that can be represented in an object file, but also:
- It is not limited in the number of different kinds of symbol metadata information that can be applied to the same symbol
- It is not limited in the kinds of values that can be associated with a given piece of symbol metadata information; a value field could be:
  - an integer value (as in the above example)
  - a string table index (e.g. a string encoding of format specifiers associated with printf-like function calls)
  - a symbol table index (e.g. indicating a symbol-to-symbol alias mapping)
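As a rough illustration of how a consumer might read the value field differently depending on the kind, consider the sketch below. Every identifier in it (the SMK_* kind names and their numbers, classify_value, value_as_string) is hypothetical; no numeric kind assignments are given here, so the numbers are placeholders.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical kind numbers, for illustration only. */
enum sym_meta_kind {
    SMK_LOCATION = 1,  /* value is an integer (an address) */
    SMK_FORMAT   = 2,  /* value is a string table index */
    SMK_ALIAS    = 3   /* value is a symbol table index */
};

enum value_class {
    VAL_INTEGER, VAL_STRTAB_INDEX, VAL_SYMTAB_INDEX, VAL_UNKNOWN
};

/* Map a metadata kind to the way its value field should be read. */
static enum value_class classify_value(uint32_t kind) {
    switch (kind) {
    case SMK_LOCATION: return VAL_INTEGER;
    case SMK_FORMAT:   return VAL_STRTAB_INDEX;
    case SMK_ALIAS:    return VAL_SYMTAB_INDEX;
    default:           return VAL_UNKNOWN;  /* unrecognized kind */
    }
}

/* Resolve a string-valued record against a string table. */
static const char *value_as_string(const char *strtab, uint32_t strtab_size,
                                   uint32_t value) {
    assert(value < strtab_size);  /* index must lie inside the table */
    return strtab + value;
}
```

Because the kind is a plain integer, several records with different kinds can refer to the same symbol without competing for bits.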
Encoding symbol metadata as an extension to the symbol table enables many capabilities that are particularly useful for embedded applications, such as:
- Communication of placement-related information from compiler to linker
  - profile-based placement or explicit user-directed placement - expressing the preferred memory type in which to place a symbol (e.g. TCM, on-chip, off-chip, etc)
  - specific placement - Arm Ltd’s at attribute / TI’s location attribute
- Communication of special initialization semantics
  - TI’s noinit and persistent attributes
- Link-time function specialization
  - Boot routine
  - Run-time initialization
  - memset, memcpy specialization
  - printf specialization
The majority of the above capabilities have already been implemented and are used today in the TI compiler toolchains.
Proposal
I propose to upstream support for this symbol metadata extension to the symbol table to the LLVM source base. This entails:
- Providing a mechanism to opt-in/opt-out of including this support in a given toolchain
- Encoding symbol attributes and semantic information, that is not otherwise already represented, into compiler generated object files, specifically in a .symtab_meta section consisting of an array of fixed-length symbol metadata information records
- Adding support for symbol metadata assembly directives that encode symbol metadata information into an object file that is generated from the assembler
- Adding support for generating symbol metadata assembly directives when compiling to assembly
- Adding support to edit the .symtab_meta section in conjunction with edits to the symbol table in llvm-objcopy
- Reading, processing, applying the contents of a .symtab_meta section in the lld linker
A specification of the proposed .symtab_meta section and fixed-length symbol metadata records follows.
.symtab_meta Section and Symbol Metadata Records
.symtab_meta Section
The section header table entry for the .symtab_meta section will contain the following values:
- sh_name = index into string table pointing to “.symtab_meta” string
- sh_type = SHT_SYMTAB_META (a new section header type)
- sh_addr = 0 (.symtab_meta section is not loaded into target memory)
- sh_link = index of .symtab section in the section header table
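The field assignments above could be sketched as follows for a 32-bit object. The Elf32_Shdr layout is the standard one. Everything else is an assumption for illustration: the numeric value behind SHT_SYMTAB_META_PLACEHOLDER (no number is assigned here), the helper name make_symtab_meta_shdr, and the choice to set sh_size and sh_entsize from an 8-byte record size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard 32-bit ELF section header: ten 32-bit words. */
typedef struct {
    uint32_t sh_name, sh_type, sh_flags, sh_addr, sh_offset,
             sh_size, sh_link, sh_info, sh_addralign, sh_entsize;
} Elf32_Shdr;

/* Placeholder only: no numeric value for SHT_SYMTAB_META is given. */
#define SHT_SYMTAB_META_PLACEHOLDER 0x60000001u

/* Hypothetical helper that fills in the header fields listed above. */
static Elf32_Shdr make_symtab_meta_shdr(uint32_t name_off,
                                        uint32_t symtab_shndx,
                                        uint32_t file_off,
                                        uint32_t nrecords) {
    Elf32_Shdr sh;
    memset(&sh, 0, sizeof sh);     /* sh_flags stays 0: SHF_ALLOC not set */
    assert(symtab_shndx != 0);     /* index 0 is the reserved null header */
    sh.sh_name    = name_off;      /* offset of ".symtab_meta" in the string table */
    sh.sh_type    = SHT_SYMTAB_META_PLACEHOLDER;
    sh.sh_addr    = 0;             /* not loaded into target memory */
    sh.sh_link    = symtab_shndx;  /* section header index of .symtab */
    sh.sh_offset  = file_off;
    sh.sh_size    = nrecords * 8u; /* assumption: 8-byte Elf32 records */
    sh.sh_entsize = 8u;            /* assumption: fixed-length record size */
    return sh;
}
```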
Symbol Metadata Records
Elf32_SymMeta
typedef struct {
  Elf32_Word sm_info;
  Elf32_Word sm_value;
} Elf32_SymMeta;
where:
- Index of symbol associated with metadata:
index = ((Elf32_Word)sm_info >> 8);
- Symbol metadata kind:
kind = (sm_info & 0xff);
- A sub-range of kind identifiers will be reserved for processor-/toolchain-specific use
- Interpretation of sm_value field depends on kind
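A minimal sketch of packing and unpacking the 32-bit record, following the shift and mask given above. The macro names ELF32_SM_INDEX, ELF32_SM_KIND, and ELF32_SM_INFO are hypothetical, modelled on the existing ELF32_R_SYM / ELF32_R_TYPE / ELF32_R_INFO relocation macros, which use the same 24-bit/8-bit split. The helper make_sym_meta is likewise only illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Elf32_Word;

typedef struct {
    Elf32_Word sm_info;
    Elf32_Word sm_value;
} Elf32_SymMeta;

/* Hypothetical macro names; the bit layout is the one specified above. */
#define ELF32_SM_INDEX(info)       ((Elf32_Word)(info) >> 8)
#define ELF32_SM_KIND(info)        ((info) & 0xff)
#define ELF32_SM_INFO(index, kind) \
    (((Elf32_Word)(index) << 8) | ((kind) & 0xff))

/* Hypothetical helper: build one record, checking that both parts fit. */
static Elf32_SymMeta make_sym_meta(Elf32_Word index, Elf32_Word kind,
                                   Elf32_Word value) {
    assert(index < (1u << 24));  /* symbol index gets the upper 24 bits */
    assert(kind <= 0xff);        /* kind gets the lower 8 bits */
    Elf32_SymMeta m = { ELF32_SM_INFO(index, kind), value };
    return m;
}
```

For example, a record for symbol index 42 with a placeholder kind of 1 and the address 0x12345678 would hold sm_info = 0x00002A01 and sm_value = 0x12345678.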
Elf64_SymMeta
typedef struct {
  Elf64_Xword sm_info;
  Elf64_Xword sm_value;
} Elf64_SymMeta;
where:
- Index of symbol associated with metadata:
index = ((Elf64_Xword)sm_info >> 16);
- Symbol metadata kind:
kind = (sm_info & 0xffff);
- A sub-range of kind identifiers will be reserved for processor-/toolchain-specific use
- Interpretation of sm_value field depends on kind
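To show how a consumer such as a linker might use an array of these records, here is a sketch of a linear search for the record of a given kind attached to a given symbol. It uses the 64-bit layout above (16-bit kind, 48-bit index). The macro names and find_sym_meta are hypothetical, and a real implementation might well sort or index the records rather than scan them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Elf64_Xword;

typedef struct {
    Elf64_Xword sm_info;
    Elf64_Xword sm_value;
} Elf64_SymMeta;

/* Hypothetical macro names; the bit layout is the one specified above. */
#define ELF64_SM_INDEX(info) ((Elf64_Xword)(info) >> 16)
#define ELF64_SM_KIND(info)  ((info) & 0xffff)

/* Return the first record of `kind` for symbol `sym_index`, or NULL. */
static const Elf64_SymMeta *find_sym_meta(const Elf64_SymMeta *recs,
                                          size_t count,
                                          Elf64_Xword sym_index,
                                          Elf64_Xword kind) {
    assert(recs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (ELF64_SM_INDEX(recs[i].sm_info) == sym_index &&
            ELF64_SM_KIND(recs[i].sm_info) == kind)
            return &recs[i];
    }
    return NULL;
}
```

Since one symbol may carry several records of different kinds, the search matches on both the index and the kind.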
History and Rationale
In April of 2019, I submitted an RFC for an earlier and less sophisticated method of encoding symbol metadata directly into a symbol table entry (“RFC - a proposal to support additional symbol metadata in ELF object files in the ARM compiler”, LLVM Dev List Archives, LLVM Discussion Forums). Feedback from that RFC thread was incorporated into what has evolved into this RFC.
An earlier version of this .symtab_meta specification was proposed for inclusion in the upstream GCC source base, but did not get adequate support for approval at the time. The specification has since been refined and the usage of .symtab_meta in TI compiler tools has expanded into areas such as support for user-directed placement attributes in source code.
Extending the symbol table with the .symtab_meta section is a piece of infrastructure that has significant value, especially for embedded toolchains and applications.