Modeling ELF linker with lld/Chunks.

Hi All,

I have a design question about how well your linker is suited to modeling ELF semantics.

The ELF linker needs to be able to read relocations ahead of symbol resolution for the following use cases:

  • Add linker-defined symbols if there is a relocation to the symbol (examples: defsym, PROVIDE)

  • Don't halt the link if there are undefined symbols that are not reachable from the root set (do garbage collection first, then report whether the symbols are really undefined)

  • A reference to a symbol inside a group, from outside the group, needs to go through an undefined symbol

  • For string merging, relocations are needed in advance before they can be merged.

  • For identical code folding, relocations are needed in advance before they can be merged.
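
To illustrate the first use case, here is a minimal sketch of PROVIDE-style behaviour: the linker defines a candidate symbol only when something references it and no input file defines it. This is illustrative Python, not lld code; `provide_symbols` and its arguments are hypothetical names.

```python
def provide_symbols(defined, referenced, provided):
    """Return the linker-provided symbols that should actually be defined.

    defined:    set of symbol names defined by the input files
    referenced: set of symbol names that some relocation refers to
    provided:   dict of name -> value for PROVIDE-style candidates
    """
    result = {}
    for name, value in provided.items():
        # Define the symbol only if it is referenced and nobody else defines it.
        if name in referenced and name not in defined:
            result[name] = value
    return result


defined = {"main", "etext"}
referenced = {"etext", "edata", "printf"}
provided = {"etext": 0x1000, "edata": 0x2000, "end": 0x3000}
chosen = provide_symbols(defined, referenced, provided)
```

Here only `edata` is chosen: `etext` is already defined by an input, and `end` is never referenced. The `referenced` set is exactly the information that has to come from somewhere before resolution finishes.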

There are also use cases that involve a section rather than a symbol, for example:

  • sections that contain mergeable strings (.rodata)

  • sections that contain EH frame information, where FDEs are discarded for functions that are garbage collected.
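
As a rough sketch of the mergeable-strings case: references into such a section are by section and offset, not by symbol, so merging has to produce a map from input offsets to output offsets that the relocations can be rewritten with. The code below is illustrative Python with hypothetical names (`merge_strings`, the section labels), and it assumes every string is NUL-terminated.

```python
def merge_strings(sections):
    """Merge NUL-terminated strings from several input sections.

    sections: dict of section label -> bytes (assumed well-formed)
    Returns (merged bytes, map of (section label, input offset) -> output offset).
    """
    merged = bytearray()
    seen = {}        # string bytes -> offset in the merged output
    offset_map = {}
    for label, data in sections.items():
        pos = 0
        while pos < len(data):
            end = data.index(b"\0", pos) + 1  # include the terminator
            piece = data[pos:end]
            if piece not in seen:
                seen[piece] = len(merged)
                merged += piece
            offset_map[(label, pos)] = seen[piece]
            pos = end
    return bytes(merged), offset_map


sections = {"a.o:.rodata": b"hi\0yo\0", "b.o:.rodata": b"yo\0ok\0"}
merged, offset_map = merge_strings(sections)
```

A relocation that pointed at offset 0 of `b.o:.rodata` would then be redirected to `offset_map[("b.o:.rodata", 0)]` in the merged section.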

So I was trying to figure out how the Chunks and relocations would be related in the Reader, which means that it would be very similar to what we have with the Atom model.

Thoughts / opinions ?

Shankar Easwaran

Hi All,

I have a design question about how well your linker is suited to modeling
ELF semantics.

The ELF linker needs to be able to read relocations ahead of
symbol resolution for the following use cases:

- Add linker-defined symbols if there is a relocation to the symbol
(examples: defsym, PROVIDE)

The symbol table contains both undefined and defined symbols, so we know
which symbols need to be resolved to link that file correctly without
reading the relocation table.

- Don't halt the link if there are undefined symbols that are not
reachable from the root set (do garbage collection first, then report
whether the symbols are really undefined)

Dead-stripping is done after eliminating duplicate COMDAT symbols.
Unreferenced symbols are naturally ignored.

- A reference to a symbol inside a group, from outside the group, needs to
go through an undefined symbol

I don't get the meaning of the question.

- For string merging, relocations are needed in advance before they can be
merged.
- For identical code folding, relocations are needed in advance before
they can be merged.

These happen after symbol resolution, and you need to read the relocation table.
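
To show why identical code folding depends on relocations, here is a simplified single-pass sketch: two sections are folded only if both their bytes and their relocation targets match. This is illustrative Python with hypothetical names (`fold_identical`, the section labels); a real implementation would typically iterate, since the targets themselves may get folded.

```python
def fold_identical(sections):
    """Map each section to a representative with identical content.

    sections: dict of label -> (content bytes, tuple of (offset, target) relocations)
    Returns dict of label -> label of the section it is folded into.
    """
    canonical = {}   # (content, relocations) -> first section seen with that key
    folded = {}
    for label, (content, relocs) in sections.items():
        key = (content, tuple(relocs))
        # The first section with a given key becomes the representative.
        folded[label] = canonical.setdefault(key, label)
    return folded


sections = {
    "f": (b"\x55\xc3", ((1, "g"),)),
    "h": (b"\x55\xc3", ((1, "g"),)),      # same bytes, same target: folds into f
    "k": (b"\x55\xc3", ((1, "other"),)),  # same bytes, different target: kept
}
folded = fold_identical(sections)
```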

Thanks for the reply.

If foo is in a group and bar, which is outside the group, calls foo, then bar cannot refer to foo directly; it has to use an undefined symbol to refer to it.

Global symbols that did not make it into the final symbol table (because of coalescing or group COMDAT) are easy to discard.

But Darwin supports “dead code stripping”. The way it works is you start with the atoms (symbols) that must be preserved (for an executable program, that would be “main”), mark them live, then recursively mark live the atoms they reference. In order to do this, you must parse the relocations and figure out: 1) which function/data each relocation applies to, and 2) what function/data each relocation references.
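
The marking step described above can be sketched as a simple worklist traversal. This is illustrative Python, not the actual implementation; `mark_live` and the `references` dictionary (atom name to the atoms its relocations reference) are hypothetical.

```python
def mark_live(roots, references):
    """Return the set of atoms reachable from the roots.

    roots:      iterable of atom names that must be preserved
    references: dict of atom name -> list of atom names it references
                (derived from parsing the relocations)
    """
    live = set()
    worklist = list(roots)
    while worklist:
        atom = worklist.pop()
        if atom in live:
            continue          # already visited
        live.add(atom)
        worklist.extend(references.get(atom, ()))
    return live


references = {
    "main": ["helper", "printf"],
    "helper": ["table"],
    "unused": ["missing_fn"],
}
live = mark_live(["main"], references)
```

Note that `references` is keyed by atom, not by global symbol name, so static and anonymous functions/data take part in the traversal as well.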

A couple interesting points:
a) The master symbol table does not help with dead code stripping because often functions/data are static or anonymous and thus are not in the master symbol table.
b) It is ok for dead code to reference undefined symbols, since the dead code will be stripped away. The resolver phase normally ends when there are no undefined symbols remaining or with an error about undefined symbols. But with dead stripping, it is not an error to end with undefined symbols.
c) Once the dead code is identified, any symbols the dead code added to the master symbol table need to be removed.
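
Points (b) and (c) could look roughly like this once the live set is known: undefined symbols are reported only if a live atom references them, and symbol table entries owned by dead atoms are dropped. This is a sketch in Python with hypothetical names and data structures, not lld code.

```python
def undefined_after_stripping(live, references, defined):
    """Undefined symbols that are still referenced by live atoms."""
    missing = set()
    for atom in live:
        for target in references.get(atom, ()):
            if target not in defined:
                missing.add(target)
    return missing


def prune_symbol_table(symbol_table, live):
    """Drop symbol table entries whose defining atom is dead."""
    return {name: atom for name, atom in symbol_table.items() if atom in live}


live = {"main", "helper"}
references = {"main": ["helper"], "unused": ["missing_fn"]}
defined = {"main", "helper", "unused"}
symbol_table = {"main": "main", "helper": "helper", "unused": "unused"}

missing = undefined_after_stripping(live, references, defined)
pruned = prune_symbol_table(symbol_table, live)
```

In this example `missing_fn` is referenced only by the dead atom `unused`, so nothing is reported as undefined and `unused` is removed from the table.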

-Nick