[RFC] Debug sections for hot-patching LLD's ELF output

Hi All,

Sony maintains a downstream patchset to optionally emit additional
informational sections into the ELF output file created by LLD. These
sections describe LLD’s output and the transformations applied during
linking. They are used together with the static symbol
table (.symtab) to facilitate the operation of hot-patching tools.

Our preferences are that:

  • The information required for hot-patching is stored in the ELF
    output file as ELF sections, as opposed to being emitted into
    auxiliary files. Otherwise, customers have to adjust their processes
    to keep the ELF output file and auxiliary files together when
    packing/moving the ELF output file and ensure they are correctly
    matched.

  • These metadata sections are created by LLD, rather than derived via
    a post-link procedure. Performance is important, as customers want
    to be able to enable the emission of hot-patching metadata by
    default, and having LLD directly emit the required sections is more
    efficient and a simpler work-flow.

The contents of these sections could be seen as debugging information
for the linking process. Certainly, we would want to handle these
sections with the same rules that apply to debugging sections when
manipulating a linked ELF with binary utility tools. For that reason
the sections are all named .debug_lld_* e.g. .debug_lld_linkmap.

Currently, Sony would like to emit the following sections and we
believe that they are generally useful:

  • A linkmap section that contains a subset of the information contained
    in a linker -Map file. This section specifies the linked address for
    each input section.

  • A section which specifies the list of wrapped symbols.

  • A section that describes the GOT. This provides:
    – A category for each entry, examples: GOT entry, PLTGOT entry, TLS GD
    entry, LD TLS tls_index structure entry etc…
    – A slot index at which the entry starts.
    – A size for the entry, as GOT entries may take more than one GOT
    slot (e.g. a TLS GD entry takes two slots).
    – An optional static symbol index to which the GOT entry is associated
    (some entries e.g. the LD TLS tls_index structure are not associated
    with a particular symbol).

  • A section describing the PLT. This section needs to be somewhat
    flexible to deal with the many different PLTs that exist in ELF
    toolchains. However, for a PLT with fixed-size entries the section
    will supply:
    – Which range of bytes comprises the PLT header.
    – The size of a PLT entry.
    – For each PLT entry, the GOT slot index of the associated GOT entry.
    Combined with the information on GOT entries from the GOT description
    section this allows for the association of a PLT entry with a symbol.

Similar to DWARF sections these are non-alloc sections. They are encoded
as sequences of ULEB128 values. As these are debugging sections, not core
ELF sections, a compact representation is justifiable, even if the encoding
is more complex.
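
For readers unfamiliar with the encoding, here is a minimal sketch of ULEB128 as defined for DWARF. The function names are illustrative only and are not taken from the patch under review.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128 (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("ULEB128 encodes unsigned values only")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of what is left
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # last byte has the high bit clear
            return bytes(out)


def decode_uleb128(data: bytes, offset: int = 0) -> tuple:
    """Decode one ULEB128 value; return (value, offset of the next byte)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated ULEB128 value")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


encoded = encode_uleb128(624485)     # the worked example from the DWARF spec
decoded, next_offset = decode_uleb128(encoded)
```

Small values, such as a version of 0 or a slot count of 1 or 2, occupy a single byte each, which is where the compactness comes from.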

In order to anchor this discussion I have created https://reviews.llvm.org/D109804
which contains a prototype implementation of the linkmap section referenced
above.

I would like to ascertain whether the LLVM community would be
supportive of adding the ability to generate such sections to LLD?

Thanks.

Thanks for sending this out.

My initial reaction is that this would be most useful for post linking tools. For human readable output only I expect that we’d be comfortable with existing map file output and a disassembly.

I have a small concern of upstream maintainability without the binary patching tools themselves. For example it may be that all we have is the llvm-readobj/llvm-objdump to textually dump the output. It is possible that we could make modifications with corresponding changes to the text dumps that could break assumptions the binary patching tools are making. I think this is likely to be rare, but I couldn’t rule it out.

While I wouldn’t object as I think the extra debug output is not likely to need a lot of maintenance I think it would be good to get someone actively interested in binary patching or some other post-link tool to comment.

Peter

Thanks Peter,

Thanks for sending this out.

My initial reaction is that this would be most useful for post linking tools. For human readable output only I expect that we’d be comfortable with existing map file output and a disassembly.

Emitting a linkmap section is much more efficient than generating a -Map file. Testing with a Chromium package on my Windows development box, using a build of main with https://reviews.llvm.org/D109804 applied, I got the following results: base link = 3.524 s, with -Map = 8.441 s, with --debug-sections (for the linkmap section) = 3.910 s. Also, with the linkmap section the information is built into the ELF, so you don’t have the problem of tracking which -Map file goes with which ELF. Given those advantages, the linkmap section is a superior solution for some use cases, even when only humans will want to view the information.
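
To put those timings in relative terms, here is a small sketch that computes the overhead of each option against the base link. It uses only the three numbers quoted above; the variable and function names are mine.

```python
# Link times in seconds, as reported above.
base_link = 3.524
with_map_file = 8.441
with_debug_sections = 3.910


def overhead(seconds, baseline=base_link):
    """Return (extra seconds, extra time as a percentage of the baseline)."""
    extra = seconds - baseline
    return extra, 100.0 * extra / baseline


map_extra, map_pct = overhead(with_map_file)            # about 4.9 s, roughly 140%
debug_extra, debug_pct = overhead(with_debug_sections)  # about 0.39 s, roughly 11%

print(f"-Map:             +{map_extra:.3f} s ({map_pct:.1f}%)")
print(f"--debug-sections: +{debug_extra:.3f} s ({debug_pct:.1f}%)")
```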

I have a small concern of upstream maintainability without the binary patching tools themselves. For example it may be that all we have is the llvm-readobj/llvm-objdump to textually dump the output. It is possible that we could make modifications with corresponding changes to the text dumps that could break assumptions the binary patching tools are making. I think this is likely to be rare, but I couldn’t rule it out.

While I wouldn’t object as I think the extra debug output is not likely to need a lot of maintenance I think it would be good to get someone actively interested in binary patching or some other post-link tool to comment.

Unfortunately, hot-patching seems to be more of a proprietary technology than an open-source one (although it would be great to be wrong, and I’m hoping that someone may reply and correct me).

Sony and partners are interested in these sections for our downstream toolchain and we will commit to monitoring them and ensuring the format doesn’t change without agreement. We could also add upstream tests that check a hex dump of the section contents for these debugging sections and which have strongly worded comments in the test noting that the format should not change without agreement from stakeholders (we actually successfully use a similar strategy in our downstream tests to flag up upstream changes for review).

I envision that these linker debug sections will have a version field (as DWARF sections do) which we can use to manage changes to the format of these sections.

As mentioned Sony would like LLD to optionally emit sections that describe the GOT and PLT.

The proposed binary format of these sections is as follows:

.debug_lld_got

(minor quibble: I’d probably avoid using the “.debug_*” namespace for things that seem pretty separate from/not a clear extension to DWARF - but maybe there’s precedent for this? Not sure)

Related to naming, is there a chance that other linkers might adopt this feature as well? If so, maybe we should avoid including “lld” in the name and use a more generic name like .debug_linker_got and .debug_linker_plt?

Yeah, mixed feelings - using lld/llvm/something ensures we don’t collide with someone else’s ideas, but may reduce the possibility of uptake elsewhere. I’d usually err on a non-colliding name at first, and generalize if there’s interest, but it’s possible the non-colliding name just encourages other people to go make their own thing before anyone has a chance to generalize it.

As mentioned Sony would like LLD to optionally emit sections that describe
the GOT and PLT.

The proposed binary format of these sections is as follows:

.debug_lld_got

The .debug_lld_got section contains a GOT description. The GOT description
begins with a header composed of the following fields:

length (uleb)
- The length in bytes of the GOT description not including the length field
itself.
- This allows for padding to be added to the section, useful for purposes
such as slop for incremental linking.

I am dubious whether people will find incremental linking useful :)
https://news.ycombinator.com/item?id=26233244 from Rui Ueyama
and
https://sourceware.org/pipermail/binutils/2021-September/117828.html
from Cary Coutant:
"Do you think you'd ever want incremental linking on powerpc? Frankly,
the effort for just the one target platform was pretty high, the
maintenance on it is burdensome, and I'm tempted to deprecate it and
rip it out at some point in the future."

- The value cannot exceed Elf_Off.

version (uleb)
- The version of the description information.
- Currently, 0.
- The value cannot exceed Elf_Word.

The header is then followed by a list of entry descriptions.
Each entry description describes the GOT entry with the same index.
Each entry description starts with three ulebs:

- The first uleb gives the number of ulebs used by this description (so
that the description can be skipped if the category isn't understood). The
value cannot exceed Elf_Word.
- The second uleb gives the number of GOT slots* used by this GOT entry.
The value cannot exceed Elf_Word.
- The third uleb encodes the category of the GOT entry. The value cannot
exceed Elf_Word.

* Except for GOT_CAT_PADDING entries where this field gives the number of
bytes of padding (the value cannot exceed Elf_Off) not the number of GOT
slots.

A category encoding can specify multiple associated arguments. Argument
interpretation is specified by the encoding. If an encoding requires
arguments, the bytes for those follow the bytes for the second uleb in the
entry description.

Categories are:

Encoding                            Argument      Size (slots) *      Notes
GOT_CAT_UNKNOWN                     none          1                   Unknown area of the GOT.
GOT_CAT_PADDING                     none          <variable>          Padding between GOT regions. The size field gives the
                                                                      padding size in bytes, not the number of GOT slots.
GOT_CAT_GOTPLT_HEADER               none          <target dependent>  The .got.plt header. x86_64 size = 3 slots.
GOT_CAT_GOT                         symbol index  1                   Normal entry for a symbol.
GOT_CAT_PLTGOT                      symbol index  1                   .got.plt entry for a PLT reference to a symbol.
GOT_CAT_IGOTPLT                     symbol index  1                   .igot.plt entry for an ifunc.
GOT_CAT_IGOTCANONICAL               symbol index  1                   GOT entry for canonical PLT entry for non-preemptible ifunc case.
GOT_CAT_TLSDESC                     symbol index  2                   GOT entry for a TLSDESC slot.
GOT_CAT_TLS_GD                      symbol index  2                   GOT entry for a GD TLS reference.
GOT_CAT_TLS_LD                      none          2                   GOT entry for tls_index structure for an LD TLS reference.
GOT_CAT_TLS_IE                      symbol index  1                   GOT entry for an IE TLS reference.
GOT_CAT_PPC64_V2_ABI_TLSLD_GOT_OFF  symbol index  1                   PPC64 specific TLSLD GOT slot.
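
As a reading aid, here is a minimal sketch of how a consumer might walk a .debug_lld_got section laid out as described above. Several things in it are assumptions rather than part of the proposal: the numeric category values (the thread does not give them, so the constants are placeholders in table order), the reading of the first uleb as counting the ulebs that follow it (size, category and any arguments), the placement of arguments after the category, and any padding lying beyond the bytes covered by the length field.

```python
from dataclasses import dataclass

# Placeholder category values: the thread does not define the numbering.
GOT_CAT_GOTPLT_HEADER = 2
GOT_CAT_PLTGOT = 4
GOT_CAT_TLS_GD = 8


def read_uleb(data, pos):
    """Decode one ULEB128 value at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def write_uleb(value):
    """Encode a non-negative integer as ULEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        out.append(byte | (0x80 if value else 0))
        if not value:
            return bytes(out)


@dataclass
class GotEntry:
    size: int      # GOT slots (bytes for padding entries)
    category: int  # GOT_CAT_* value
    args: list     # category-specific arguments, e.g. a symbol index


def parse_got_description(section):
    length, pos = read_uleb(section, 0)
    end = pos + length  # bytes past this point are treated as padding
    version, pos = read_uleb(section, pos)
    if version != 0:
        raise ValueError(f"unsupported version {version}")
    entries = []
    while pos < end:
        count, pos = read_uleb(section, pos)  # assumed: ulebs after this one
        if count < 2:
            raise ValueError("entry too short for size and category")
        size, pos = read_uleb(section, pos)
        category, pos = read_uleb(section, pos)
        args = []
        for _ in range(count - 2):  # unknown categories are still skippable
            arg, pos = read_uleb(section, pos)
            args.append(arg)
        entries.append(GotEntry(size, category, args))
    return version, entries


# Hand-built example: a .got.plt header, a PLT GOT entry for symbol 7 and
# a two-slot TLS GD entry for symbol 300, followed by two bytes of slop.
body = write_uleb(0)
body += write_uleb(2) + write_uleb(3) + write_uleb(GOT_CAT_GOTPLT_HEADER)
body += write_uleb(3) + write_uleb(1) + write_uleb(GOT_CAT_PLTGOT) + write_uleb(7)
body += write_uleb(3) + write_uleb(2) + write_uleb(GOT_CAT_TLS_GD) + write_uleb(300)
section = write_uleb(len(body)) + body + b"\x00\x00"

version, entries = parse_got_description(section)
```

Because every entry carries its own uleb count, a consumer that does not recognise a category can still step over it, which is the property the first field is there to provide.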

.debug_lld_plt

The .debug_lld_plt section contains a PLT description. A PLT description
begins with a generic header composed of the following 3 ulebs:

length (uleb)
- The length in bytes of this PLT description not including the length
field itself.
- This allows for padding to be added to the section, useful for purposes
such as slop for incremental linking.
- The value cannot exceed Elf_Off.

version (uleb)
- The version of this description information. Currently, 0. The value
cannot exceed Elf_Word.

type (uleb)
- The type of the PLT being described.
- This affects the interpretation of the remaining description.
- Currently, only PLT_FIXSZ_ENT(value = 0) is defined for describing PLT
sections composed of a header and N fixed size entries.
- The value cannot exceed Elf_Word; although, currently as there is only
one value specified a smaller representation is sufficient.

PLT_FIXSZ_ENT interpretation
Following the generic header is the PLT_FIXSZ_ENT description header which
is composed of the following 2 ulebs:

PLT header size (uleb)
- The size of the PLT header in bytes.
- The value cannot exceed Elf_Off.

PLT entry size (uleb)
- The size of a PLT entry.
- The value cannot exceed Elf_Word.

The PLT header size and PLT entry size are hard coded depending on the
architecture and a few security related options like -z retpolineplt,
ibt, bti. Is a generic description scheme useful?

If the new format is to describe dynamic relocations in a compact way, I
am wondering whether this is over-engineered and whether it can achieve
the design goal.
A program doesn't typically have many GLOB_DAT, TLSDESC, and TLS GD/LD/IE relocations.

MIPS folks invented DT_MIPS_LOCAL_GOTNO and
DT_MIPS_SYMTABNO-DT_MIPS_GOTSYM, but the scheme rarely saves much space
and turns out to cause more problems with .gnu.hash
https://sourceware.org/pipermail/binutils/2019-December/109330.html

The header is then followed by a list of entry descriptions.
- Each entry description is a single uleb and describes the PLT entry with
the same index.
- The value of the uleb gives the index of the associated GOT entry.
- The value cannot exceed Elf_Off.
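
To make the PLT_FIXSZ_ENT layout concrete, here is a minimal parsing sketch. It assumes the fields appear in exactly the order listed above and that padding, if any, lies after the bytes covered by the length field. The class and function names are illustrative, and the 16-byte header and entry sizes in the example are the usual x86-64 lazy PLT sizes, chosen only for illustration.

```python
from dataclasses import dataclass

PLT_FIXSZ_ENT = 0  # the only PLT type defined so far


def read_uleb(data, pos):
    """Decode one ULEB128 value at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def write_uleb(value):
    """Encode a non-negative integer as ULEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        out.append(byte | (0x80 if value else 0))
        if not value:
            return bytes(out)


@dataclass
class PltDescription:
    version: int
    header_size: int   # bytes occupied by the PLT header
    entry_size: int    # bytes per PLT entry
    got_indices: list  # got_indices[i] = GOT entry index for PLT entry i


def parse_plt_description(section):
    length, pos = read_uleb(section, 0)
    end = pos + length
    version, pos = read_uleb(section, pos)
    plt_type, pos = read_uleb(section, pos)
    if version != 0 or plt_type != PLT_FIXSZ_ENT:
        raise ValueError("unsupported PLT description")
    header_size, pos = read_uleb(section, pos)
    entry_size, pos = read_uleb(section, pos)
    got_indices = []
    while pos < end:  # one uleb per PLT entry, in PLT order
        index, pos = read_uleb(section, pos)
        got_indices.append(index)
    return PltDescription(version, header_size, entry_size, got_indices)


# Hand-built example: 16-byte header, 16-byte entries, and three PLT
# entries that use GOT entries 1, 2 and 3.
body = write_uleb(0) + write_uleb(PLT_FIXSZ_ENT) + write_uleb(16) + write_uleb(16)
body += write_uleb(1) + write_uleb(2) + write_uleb(3)
plt = parse_plt_description(write_uleb(len(body)) + body)
```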

Is disassembling .plt more convenient? The linker uses a predictable way
to generate it so its content is not that hard to parse.
It can be quick because the shape of a PLT entry is known and many bytes
can be skipped.
With this in mind, this information is just easy to infer from
R_*_JUMP_SLOT relocations.

In addition to allowing hot-patching tools to work with the GOT and PLT the
information in these sections is of use to any tool that needs to display
information on the GOT and PLT sections. For example, debuggers and binary
tools synthesize labels of the form <symbol>@plt to label the PLT sections.
The information in these sections could be used to simplify such tasks.
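
A rough sketch of that labelling task, under stated assumptions: the PLT and GOT descriptions have already been decoded into plain Python values, PLT entry i starts at the PLT base address plus the header size plus i times the entry size, and each relevant GOT entry carries a static symbol index as its argument. All names and addresses here are hypothetical.

```python
def plt_labels(plt_base, header_size, entry_size, plt_to_got,
               got_symbol_index, symbol_names):
    """Map each PLT entry address to a '<symbol>@plt' label.

    plt_to_got[i] is the GOT entry index recorded for PLT entry i.
    got_symbol_index maps a GOT entry index to a .symtab index.
    symbol_names maps a .symtab index to the symbol's name.
    """
    labels = {}
    for i, got_index in enumerate(plt_to_got):
        sym_index = got_symbol_index.get(got_index)
        if sym_index is None:
            continue  # GOT entry has no associated symbol
        address = plt_base + header_size + i * entry_size
        labels[address] = f"{symbol_names[sym_index]}@plt"
    return labels


labels = plt_labels(
    plt_base=0x1020,
    header_size=16,
    entry_size=16,
    plt_to_got=[1, 2],
    got_symbol_index={1: 5, 2: 9},
    symbol_names={5: "puts", 9: "malloc"},
)
for address, label in sorted(labels.items()):
    print(f"{address:#x} <{label}>")
```

The point of the sketch is that no disassembly and no knowledge of the PLT stub's instruction sequence is needed; only the sizes and indices recorded by the linker are used.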

How is this format more suitable than existing Elf64_Rel/Elf64_Rela for
hot-patching? The GOT and PLT information can be inferred from .rela.plt
and .rela.dyn easily. The scheme appears to be more complex than the
relocation format.

Thanks for looking at this proposal.

As mentioned Sony would like LLD to optionally emit sections that describe
the GOT and PLT.

The proposed binary format of these sections is as follows:

.debug_lld_got

The .debug_lld_got section contains a GOT description. The GOT description
begins with a header composed of the following fields:

length (uleb)

  • The length in bytes of the GOT description not including the length field
    itself.
  • This allows for padding to be added to the section, useful for purposes
    such as slop for incremental linking.

I am dubious whether people will find incremental linking useful :)
https://news.ycombinator.com/item?id=26233244 from Rui Ueyama
and
https://sourceware.org/pipermail/binutils/2021-September/117828.html
from Cary Coutant:
“Do you think you’d ever want incremental linking on powerpc? Frankly,
the effort for just the one target platform was pretty high, the
maintenance on it is burdensome, and I’m tempted to deprecate it and
rip it out at some point in the future.”

I generally tend to agree w.r.t incremental linking. However, supporting the ability to include extra space in a section could have many uses and therefore I think that it is something that section formats should support as long as it is cheap to do so. Having said that we don’t actually have a need right now for this so I’m happy to drop it from the specification.

  • The value cannot exceed Elf_Off.

version (uleb)

  • The version of the description information.
  • Currently, 0.
  • The value cannot exceed Elf_Word.

The header is then followed by a list of entry descriptions.
Each entry description describes the GOT entry with the same index.
Each entry description starts with three ulebs:

  • The first uleb gives the number of ulebs used by this description (so
    that the description can be skipped if the category isn’t understood). The
    value cannot exceed Elf_Word.
  • The second uleb gives the number of GOT slots* used by this GOT entry.
    The value cannot exceed Elf_Word.
  • The third uleb encodes the category of the GOT entry. The value cannot
    exceed Elf_Word.

  * Except for GOT_CAT_PADDING entries, where this field gives the number of
    bytes of padding (the value cannot exceed Elf_Off), not the number of GOT
    slots.

A category encoding can specify multiple associated arguments. Argument
interpretation is specified by the encoding. If an encoding requires
arguments, the bytes for those follow the bytes for the second uleb in the
entry description.

Categories are:

Encoding                            Argument      Size (slots) *      Notes
GOT_CAT_UNKNOWN                     none          1                   Unknown area of the GOT.
GOT_CAT_PADDING                     none          <variable>          Padding between GOT regions. The size field gives the
                                                                      padding size in bytes, not the number of GOT slots.
GOT_CAT_GOTPLT_HEADER               none          <target dependent>  The .got.plt header. x86_64 size = 3 slots.
GOT_CAT_GOT                         symbol index  1                   Normal entry for a symbol.
GOT_CAT_PLTGOT                      symbol index  1                   .got.plt entry for a PLT reference to a symbol.
GOT_CAT_IGOTPLT                     symbol index  1                   .igot.plt entry for an ifunc.
GOT_CAT_IGOTCANONICAL               symbol index  1                   GOT entry for canonical PLT entry for non-preemptible ifunc case.
GOT_CAT_TLSDESC                     symbol index  2                   GOT entry for a TLSDESC slot.
GOT_CAT_TLS_GD                      symbol index  2                   GOT entry for a GD TLS reference.
GOT_CAT_TLS_LD                      none          2                   GOT entry for tls_index structure for an LD TLS reference.
GOT_CAT_TLS_IE                      symbol index  1                   GOT entry for an IE TLS reference.
GOT_CAT_PPC64_V2_ABI_TLSLD_GOT_OFF  symbol index  1                   PPC64 specific TLSLD GOT slot.

.debug_lld_plt

The .debug_lld_plt section contains a PLT description. A PLT description
begins with a generic header composed of the following 3 ulebs:

length (uleb)

  • The length in bytes of this PLT description not including the length
    field itself.
  • This allows for padding to be added to the section, useful for purposes
    such as slop for incremental linking.
  • The value cannot exceed Elf_Off.

version (uleb)

  • The version of this description information. Currently, 0. The value
    cannot exceed Elf_Word.

type (uleb)

  • The type of the PLT being described.
  • This affects the interpretation of the remaining description.
  • Currently, only PLT_FIXSZ_ENT(value = 0) is defined for describing PLT
    sections composed of a header and N fixed size entries.
  • The value cannot exceed Elf_Word; although, currently as there is only
    one value specified a smaller representation is sufficient.

PLT_FIXSZ_ENT interpretation
Following the generic header is the PLT_FIXSZ_ENT description header which
is composed of the following 2 ulebs:

PLT header size (uleb)

  • The size of the PLT header in bytes.
  • The value cannot exceed Elf_Off.

PLT entry size (uleb)

  • The size of a PLT entry.
  • The value cannot exceed Elf_Word.

The PLT header size and PLT entry size are hard coded depending on the
architecture and a few security related options like -z retpolineplt,
ibt, bti. Is a generic description scheme useful?

It’s useful because the description is emitted by the linker rather than requiring the consuming tools to be adapted to the linker’s output. For example, llvm-objdump can generate @plt labels for PLT entries when disassembling but this doesn’t work if -z retpolineplt is used as the code doesn’t support that newer type of PLT (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp#L494).

If the new format is to describe dynamic relocations in a compact way, I
am wondering whether this is over-engineered and whether it can achieve
the design goal.
A program doesn’t typically have many GLOB_DAT, TLSDESC, and TLS GD/LD/IE relocations.

The purpose is to describe the GOT/PLT in a consistent and simple manner for consuming tools. Over the years there have been a number of changes to how the GOT is optimised. GOT entries can be patched statically, patched with relocations that don’t reference dynamic symbols, or patched with relocations that reference a dynamic symbol, etc. Using this section allows each GOT entry to be consistently described. If we can design a more compact format for the same information that would be great.

MIPS folks invented DT_MIPS_LOCAL_GOTNO and
DT_MIPS_SYMTABNO-DT_MIPS_GOTSYM, but the scheme rarely saves much space
and turns out to cause more problems with .gnu.hash
https://sourceware.org/pipermail/binutils/2019-December/109330.html

The .debug_lld_got section doesn’t currently handle the MIPS GOT as it is much more complicated than other GOTs and there already seemed to be code in place to be able to parse and dump it.

The header is then followed by a list of entry descriptions.

  • Each entry description is a single uleb and describes the PLT entry with
    the same index.
  • The value of the uleb gives the index of the associated GOT entry.
  • The value cannot exceed Elf_Off.

Is disassembling .plt more convenient? The linker uses a predictable way
to generate it so its content is not that hard to parse.
It can be quick because the shape of a PLT entry is known and many bytes
can be skipped.
With this in mind, this information is just easy to infer from
R_*_JUMP_SLOT relocations.

In addition to allowing hot-patching tools to work with the GOT and PLT the
information in these sections is of use to any tool that needs to display
information on the GOT and PLT sections. For example, debuggers and binary
tools synthesize labels of the form <symbol>@plt to label the PLT sections.
The information in these sections could be used to simplify such tasks.

How is this format more suitable than existing Elf64_Rel/Elf64_Rela for
hot-patching? The GOT and PLT information can be inferred from .rela.plt
and .rela.dyn easily. The scheme appears to be more complex than the
relocation format.

It’s a scheme that describes the GOT and PLT without the consumer needing knowledge of other aspects of the dynamic file format such as the dynamic relocations and symbols. Referencing static symbols directly avoids any ambiguity as to which references caused a GOT entry to be created (matching by address may find multiple aliases).
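
A toy illustration of the alias ambiguity mentioned above. The symbol names and addresses are made up: two static symbols share one address, so a lookup by address cannot say which of them a GOT entry was created for, whereas a recorded symbol table index can.

```python
# Hypothetical static symbol table: (name, address). Index 0 is the
# null symbol, as in a real ELF .symtab.
symtab = [
    ("", 0x0),
    ("memcpy_impl", 0x401000),
    ("memcpy", 0x401000),  # alias of memcpy_impl at the same address
    ("puts", 0x401040),
]


def symbols_at(address):
    """All symbol names whose value equals the given address."""
    return [name for name, value in symtab if value == address]


by_address = symbols_at(0x401000)  # two candidates, so ambiguous
by_index = symtab[2][0]            # index stored in the GOT description

print(by_address, by_index)
```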

Thanks for commenting.

Related to naming, is there a chance that other linkers might adopt this feature as well? If so, maybe we should avoid including “lld” in the name and use a more generic name like .debug_linker_got and .debug_linker_plt?

Yeah, mixed feelings - using lld/llvm/something ensures we don’t collide with someone else’s ideas, but may reduce the possibility of uptake elsewhere. I’d usually err on a non-colliding name at first, and generalize if there’s interest, but it’s possible the non-colliding name just encourages other people to go make their own thing before anyone has a chance to generalize it.

I would rather keep it focused on LLD until there is interest from outside. This allows us complete control over the specification without having to consider other toolchains’ requirements. However, the sections have been designed to be useful for other linkers to emit. If the concept is generalised and picked up by other toolchains then that’s great - and we could look at moving to the generalised scheme in the future.

(minor quibble: I’d probably avoid using the “.debug_*” namespace for things that seem pretty separate from/not a clear extension to DWARF - but maybe there’s precedent for this? Not sure)

Using a .debug_ prefix is useful because, although these sections are not DWARF, we want the binary tools to handle them with the same rules that they would apply to DWARF sections. Having said that, the binary tools can be modified to use the section types instead, so we can drop the .debug_ prefix if you object.

Interestingly, I do think that there is merit in extending DWARF to include sections emitted by the linker rather than the compiler. For example, it would be great if the DWARF standard included sections for describing the --gc-sections and --icf optimisations that the linker may apply.

Thanks for looking at this proposal.

>As mentioned Sony would like LLD to optionally emit sections that describe
>the GOT and PLT.
>
>The proposed binary format of these sections is as follows:
>
>.debug_lld_got
>==============
>
>The .debug_lld_got section contains a GOT description. The GOT description
>begins with a header composed of the following fields:
>
>length (uleb)
>- The length in bytes of the GOT description not including the length field
>itself.
>- This allows for padding to be added to the section, useful for purposes
>such as slop for incremental linking.

I am dubious whether people will find incremental linking useful :)
https://news.ycombinator.com/item?id=26233244 from Rui Ueyama
and
https://sourceware.org/pipermail/binutils/2021-September/117828.html
from Cary Coutant:
"Do you think you'd ever want incremental linking on powerpc? Frankly,
the effort for just the one target platform was pretty high, the
maintenance on it is burdensome, and I'm tempted to deprecate it and
rip it out at some point in the future."

I generally tend to agree w.r.t incremental linking. However, supporting
the ability to include extra space in a section could have many uses and
therefore I think that it is something that section formats should support
as long as it is cheap to do so. Having said that we don't actually have a
need right now for this so I'm happy to drop it from the specification.

>- The value cannot exceed Elf_Off.
>
>version (uleb)
>- The version of the description information.
>- Currently, 0.
>- The value cannot exceed Elf_Word.
>
>The header is then followed by list of entry descriptions.
>Each entry description describes the GOT entry with the same index.
>Each entry description starts with three ulebs:
>
>- The first uleb gives the number of ulebs used by this description (so
>that the description can be skipped if the category isn't understood). The
>value cannot exceed Elf_Word.
>- The second uleb gives the number of GOT slots* used by this GOT entry.
>The value cannot exceed Elf_Word.
>- The third uleb encodes the category of the GOT entry. The value cannot
>exceed Elf_Word.
>
>* Except for GOT_CAT_PADDING entries where this field gives the number of
>bytes of padding (the value cannot exceed Elf_Off) not the number of GOT
>slots.
>
>A category encoding can specify multiple associated arguments. Argument
>interpretation is specified by the encoding. If an encoding requires
>arguments, the bytes for those follow the bytes for the second uleb in the
>entry description.
>
>Categories are:
>
>Encoding                            Argument      Size (slots) *      Notes
>GOT_CAT_UNKNOWN                     none          1                   Unknown area of the GOT.
>GOT_CAT_PADDING                     none          <variable>          Padding between GOT regions. The size field gives the
>                                                                      padding size in bytes, not the number of GOT slots.
>GOT_CAT_GOTPLT_HEADER               none          <target dependent>  The .got.plt header. x86_64 size = 3 slots.
>GOT_CAT_GOT                         symbol index  1                   Normal entry for a symbol.
>GOT_CAT_PLTGOT                      symbol index  1                   .got.plt entry for a PLT reference to a symbol.
>GOT_CAT_IGOTPLT                     symbol index  1                   .igot.plt entry for an ifunc.
>GOT_CAT_IGOTCANONICAL               symbol index  1                   GOT entry for canonical PLT entry for non-preemptible ifunc case.
>GOT_CAT_TLSDESC                     symbol index  2                   GOT entry for a TLSDESC slot.
>GOT_CAT_TLS_GD                      symbol index  2                   GOT entry for a GD TLS reference.
>GOT_CAT_TLS_LD                      none          2                   GOT entry for tls_index structure for an LD TLS reference.
>GOT_CAT_TLS_IE                      symbol index  1                   GOT entry for an IE TLS reference.
>GOT_CAT_PPC64_V2_ABI_TLSLD_GOT_OFF  symbol index  1                   PPC64 specific TLSLD GOT slot.
>
>.debug_lld_plt
>==============
>
>The .debug_lld_plt section contains a PLT description. A PLT description
>begins with a generic header composed of the following 3 ulebs:
>
>length (uleb)
>- The length in bytes of this PLT description not including the length
>field itself.
>- This allows for padding to be added to the section, useful for purposes
>such as slop for incremental linking.
>- The value cannot exceed Elf_Off.
>
>version (uleb)
>- The version of this description information. Currently, 0. The value
>cannot exceed Elf_Word.
>
>type (uleb)
>- The type of the PLT being described.
>- This affects the interpretation of the remaining description.
>- Currently, only PLT_FIXSZ_ENT(value = 0) is defined for describing PLT
>sections composed of a header and N fixed size entries.
>- The value cannot exceed Elf_Word; although, currently as there is only
>one value specified a smaller representation is sufficient.
>
>PLT_FIXSZ_ENT interpretation
>Following the generic header is the PLT_FIXSZ_ENT description header which
>is composed of the following 2 ulebs:
>
>PLT header size (uleb)
>- The size of the PLT header in bytes.
>- The value cannot exceed Elf_Off.

>PLT entry size (uleb)
>- The size of a PLT entry.
>- The value cannot exceed Elf_Word.

The PLT header size and PLT entry size are hard coded depending on the
architecture and a few security related options like -z retpolineplt,
ibt, bti. Is a generic description scheme useful?

It's useful because the description is emitted by the linker rather than
requiring the consuming tools to be adapted to the linker's output. For
example, llvm-objdump can generate <symbol>@plt labels for PLT entries when
disassembling but this doesn't work if -z retpolineplt is used as the code
doesn't support that newer type of PLT
(https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp#L494).

I am concerned that this would add significant complexity to LLD.

Except for canonical PLT entries (normal function and STT_GNU_IFUNC
converted STT_FUNC), PLT entries have insignificant addresses and the
linker can generate multiple instances.
For example, the PowerPC64 port PLT is coupled with range extension
thunks and there can be multiple instances.
Each architecture's PLT may have a different shape.
I am not sure how a generic format can describe a stub.
Some architectures can do micro optimization like: if we know the hi
part of a pair of hi/lo values is zero, we may save one instruction.
Such choice is easy to represent in code but difficult to describe
in a serialized format.

AArch64's BTI PLT is also interesting: some PLT entries may have a
leading `bti c` while some don't.

x86-64's IBT PLT is worse: there are two sections: .plt and .plt.sec .
How to describe it?
(Multiple folks were against .plt.sec ; I subscribed to x86-64-abi after
this event in case I missed such over-engineering designs in the
future.)

Describing the PLT/GOT feels to me like supporting a GNU ld --verbose style
linker script dump (51309 – `--verbose` should generate and dump out a linker script based on built-in rules).
Yes, it can make some applications happy but the implementation complexity
would be huge.

Perhaps something I really want to ask is whether we ran into an XY
problem (https://xyproblem.info/). What did the hot-patching feature
actually need? FWIW such a feature is also implemented in the Linux
kernel, called live-patching, which is related to dynamic ftrace.
So far we haven't heard that they need anything from the linker side.

Well, a GNU contributor added -z unique-symbol very quickly while the
needs appear to have disappeared :)
https://bugs.llvm.org/show_bug.cgi?id=50745
I am sold that this option is misdesigned :)
(Explain GNU style linker options | MaskRay)

If the new format is to describe dynamic relocations in a compact way, I
am wondering whether this is over-engineered and whether it can achieve
the design goal.
A program doesn't typically have many GLOB_DAT, TLSDESC, and TLS GD/LD/IE
relocations.

The purpose is to describe the GOT/PLT in a consistent and simple manner
for consuming tools. Over the years there have been a number of changes to
how the GOT is optimised. GOT entries can be patched statically, patched
with relocations that don't reference dynamic symbols, or patched with
relocations that reference a dynamic symbol, etc. Using this section allows
each GOT entry to be consistently described. If we can design a more
compact format for the same information that would be great.

Does --emit-relocs help here?

It is very important to resist features that add needless complexity :)

I will reply to the (great) points you have raised tomorrow. The hot-patching feature is proprietary technology and I need to check how much I can disclose about it - sorry! I will also put up a prototype implementation so that the complexity of the implementation can be judged. I have not attempted to describe all GOT/PLTs, only the ones that are structured “normally”. x86-64’s IBT PLT would need an extension to the binary format to describe. I am not convinced we need to describe every variation to add value. If the binary format can describe the commonly used GOT/PLT structures then I believe that is sufficient. We can design the binary format to be flexible so that it can be extended in the future if support is required for a GOT/PLT structure that cannot currently be described.

Do you have an opinion on the other sections? In particular the linkmap section? That section is the most important information for our hot-patching implementation and it also has clear benefits over the current -Map file option.

Thanks.

It is very important to resist features that add needless complexity :)

Thanks for understanding :)

But sorry for some pushback below.

For "A section which specifies the list of wrapped symbols.", I believe
the symbol table should be the source of truth for --wrap results. It is
just easier to inspect the symbol table than to parse a serialized file
encoding some properties of the wrapped symbols. For --wrap=foo, we
have foo, __wrap_foo, __real_foo; how does the serialized format describe
their properties better than the symbol table itself?
For this one, I understand that the information can probably make the
proprietary technology easier, but it can be difficult to maintain, and
by upstreaming you may end up incurring more overhead if other
contributors need to alter the format for legitimate reasons.

For "A linkmap section that contains a subset of the information
contained in a linker -Map file.", that looks like duplicated
information. Can your users parse the -Map output and synthesize the
needed section?

If the feedback is that "parsing -Map output is fragile", then you'd
need justifying reasons for a new format.
For example, the --dependency-file feature, while potentially useful, is
not sufficiently orthogonal, so it got lots of pushback
(https://reviews.llvm.org/D82437). In the end it was accepted because
sufficient usefulness was demonstrated (it supports the venerable
make and ninja, which have very wide adoption) and as a bonus: GNU
ld/gold added it as well. Well, the GNU linkers' feature parity is
certainly not a necessity: but their efforts increased accessibility and
may reach out to more users. As a comparison, I am concerned that some
sections you may mention may be restricted to benefit the few who are
using the proprietary technology.

Many of the sections really need to be discussed case by case. As
another example, I added --why-extract a few days ago. It is something
GNU ld's -Map already describes. It is sufficiently useful and LLD's
existing features did not cover it, so I think a dedicated option is
more appropriate.

GNU ld -Map has other output like "Merging program properties" which may
be less useful. But if the value of supporting them is sufficient, we can
add them as well.

Rather than discussing our proprietary technology I found the following documents explaining ELF hot-patching implementations:

https://www.cs.dartmouth.edu/~sws/pubs/rbls10.pdf

https://github.com/cloudlinux/libcare/blob/master/docs/internals.rst

From those descriptions it should be evident why a mapping from input sections to address regions and a description of the GOT and PLT are useful for a hot-patching framework.

For “A section which specifies the list of wrapped symbols.”, I believe
the symbol table should be the source of truth for --wrap results. It is
just easier to inspect the symbol table than to parse a serialized file
encoding some properties of the wrapped symbols. For --wrap=foo, we
have foo, __wrap_foo, __real_foo; how does the serialized format describe
their properties better than the symbol table itself?
For this one, I understand that the information can probably make the
proprietary technology easier, but it can be difficult to maintain, and
by upstreaming you may end up incurring more overhead if other
contributors need to alter the format for legitimate reasons.

There is nothing recorded in the output symbol table for symbols that weren’t wrapped because there was no definition. In other words, the wrap section records which symbols the linker was directed to wrap, whereas the symtab records which symbols were actually wrapped. You make a good point that the overhead may be lower if we keep the section downstream. I will evaluate that.
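
The distinction can be sketched as a simple set comparison. In the sketch below, `requested` stands in for the contents of the wrap section and `symtab_names` for the names found in .symtab; treating the presence of `__wrap_<name>` in the symbol table as evidence that a wrap took effect is a simplifying assumption for illustration, not a description of how LLD or our tools decide this.

```python
def wraps_without_effect(requested, symtab_names):
    """Return the requested --wrap symbols that left no trace in the symbol table.

    Assumption: a wrap is considered to have taken effect if the name
    "__wrap_<symbol>" appears in the symbol table.
    """
    return sorted(s for s in requested if f"__wrap_{s}" not in symtab_names)


# Hypothetical inputs: the linker was asked to wrap malloc and foo,
# but only malloc has a __wrap_ counterpart in the output.
requested = {"malloc", "foo"}
symtab_names = {"main", "malloc", "__wrap_malloc"}
missing = wraps_without_effect(requested, symtab_names)
```

Without the list of requested names, a tool looking only at the symbol table has no way to learn that `foo` was ever passed to --wrap.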

For “A linkmap section that contains a subset of the information
contained in a linker -Map file.”, that looks like duplicated
information. Can your users parse the -Map output and synthesize the
needed section?

If the feedback is that “parsing -Map output is fragile”, then you’d
need justifying reasons for a new format.
For example, the --dependency-file feature, while potentially useful, is
not sufficiently orthogonal, so it got lots of pushback
(https://reviews.llvm.org/D82437). In the end it was accepted because
sufficient usefulness was demonstrated (it supports the venerable
make and ninja, which have very wide adoption) and as a bonus: GNU
ld/gold added it as well. Well, the GNU linkers’ feature parity is
certainly not a necessity: but their efforts increased accessibility and
may reach out to more users. As a comparison, I am concerned that some
sections you may mention may be restricted to benefit the few who are
using the proprietary technology.

Parsing the -Map file is not a substitute for the linkmap section. The linkmap section is much faster to generate and process than generating and then parsing the -Map file would be. When linking chrome, just generating the -Map file took 8.441 s vs. only 3.910 s to generate the linkmap section. Performance is important because users who need hot-patching want to be able to hot-patch any ELF; they don’t want to have to re-link to generate an ELF that can be hot-patched. The speed and workflow improvements of the linkmap section over the -Map file mean that it is generally useful and not just useful for hot-patching tools (I detailed these advantages upthread).
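
As a rough illustration of why a structured linkmap is cheap to consume, the sketch below maps an address back to the input section that was placed there using a binary search over already-decoded records. The `LinkmapEntry` fields and the example addresses are hypothetical; the actual section encoding is not shown here.

```python
import bisect
from typing import NamedTuple


class LinkmapEntry(NamedTuple):
    # Hypothetical decoded record: where one input section was placed.
    address: int
    size: int
    input_file: str
    section: str


def find_input_section(entries, addr):
    """Return the entry covering addr, or None. entries must be sorted by address."""
    starts = [e.address for e in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and addr < entries[i].address + entries[i].size:
        return entries[i]
    return None


entries = [
    LinkmapEntry(0x401000, 0x40, "a.o", ".text"),
    LinkmapEntry(0x401040, 0x20, "b.o", ".text"),
    LinkmapEntry(0x402000, 0x10, "a.o", ".rodata"),
]
hit = find_input_section(entries, 0x401050)
```

No text parsing is involved, and the data travels inside the ELF itself rather than in a separate file that has to be kept matched with it.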

Sorry for double posting - I have now established what I can disclose about our hot-patching implementation.

Our hot-patching implementation recompiles a file and then replaces the relevant parts of the running image with the new compilation while performing relocations in the same way as the linker.

This approach has some important advantages vs. relinking the whole image and patching in the differences.

  • Linking the whole image can take a long time and certainly longer than a single file compilation.
  • Calculating the difference between images is non-trivial. We have to consider debug data as well as executable sections, rodata, etc.
  • It is simpler to control what gets applied when multiple source files have changed and not all changes are required.

The proposed sections provide the information necessary to understand how a given object file should be mapped into memory and to apply relocations.
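
As a sketch of what "performing relocations in the same way as the linker" involves, the example below applies a single PC-relative 32-bit relocation to an in-memory image, using the standard x86-64 R_X86_64_PC32 calculation S + A - P. The function name, the `bytearray` image and the addresses are assumptions for illustration; this is not our implementation.

```python
import struct


def apply_pc32(image, image_base, place, symbol, addend):
    """Write S + A - P as a signed 32-bit little-endian value at address `place`.

    image      -- bytearray holding the mapped bytes (hypothetical stand-in)
    image_base -- address that image[0] is mapped at
    place      -- P, the address of the field being relocated
    symbol     -- S, the address of the referenced symbol
    addend     -- A, the relocation addend
    """
    value = symbol + addend - place
    if not -(1 << 31) <= value < (1 << 31):
        raise OverflowError("relocation does not fit in 32 bits")
    struct.pack_into("<i", image, place - image_base, value)


image = bytearray(16)
apply_pc32(image, image_base=0x1000, place=0x1004, symbol=0x2000, addend=-4)
patched = struct.unpack_from("<i", image, 4)[0]
```

To evaluate S for a freshly compiled object, the patching tool needs to know where each input section and each GOT/PLT entry ended up in the running image, which is the information the proposed sections carry.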