[RFC] Compress arbitrary sections with ld.lld --compress-sections

MaskRay · June 29, 2023, 3:08am

The ELF linkers GNU ld, gold, and ld.lld provide --compress-debug-sections=[zlib|zstd] to compress .debug_* sections.
This functionality can be extended to arbitrary sections. I have developed a prototype for lld
([RFC][ELF] Add --compress-ections by MaskRay · Pull Request #1 · MaskRay/llvm-project · GitHub) with ~40 lines of code, and filed a GNU ld feature request back in 2021 (27452 – ld: Support compressing arbitrary sections (generalized --compress-debug-sections=)).

ld.lld --compress-sections <sections-glob>=[zlib|zstd]

In recent years, metadata sections have gained more uses.
Some designers may want to implement compression within the format.
However, if multiple metadata sections adopt this approach, it would lead to duplicated code, considering that we have a generic feature at the object file format level.
For example, if we have --compress-sections, we can straightforwardly support the compression proposal for the code coverage section __llvm_covmap.
See Consider using mergeable section for filenames in coverage mapping · Issue #48499 · llvm/llvm-project · GitHub

One question that naturally arises is what should be done if <sections-glob> matches a SHF_ALLOC section.
In my analysis, I find that the linker can simply ignore the distinction between SHF_ALLOC and non-SHF_ALLOC, and the resulting behavior appears reasonable to me.

Compressed sections that are not SHF_ALLOC, similar to .debug_* sections, do not require any special treatment. Compressed sections with the SHF_ALLOC flag, on the other hand, will be included in a PT_LOAD segment. At load time, the dynamic loader does not need any special handling because it ignores section headers.

When the program is executed, the runtime library responsible for the metadata section will allocate memory and decompress the content of the section. For instance, the section’s content may contain references to .text, but there will be no references from .text to the decompressed section. The uncompressed metadata section may start with 4 zero bytes to be distinguished from the Elf{32,64}_Chdr header with a non-zero ch_type.
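As a rough illustration of that runtime check, here is a minimal Python sketch (not the actual runtime library code). It assumes a 64-bit little-endian target and handles only ELFCOMPRESS_ZLIB; the helper name load_metadata is hypothetical. A blob whose first four bytes are zero is treated as uncompressed; otherwise it is parsed as an Elf64_Chdr followed by the compressed stream.

```python
import struct
import zlib

ELFCOMPRESS_ZLIB = 1
# Elf64_Chdr: ch_type (u32), ch_reserved (u32), ch_size (u64), ch_addralign (u64)
ELF64_CHDR = struct.Struct("<IIQQ")


def load_metadata(blob: bytes) -> bytes:
    """Return the uncompressed metadata for a section image (hypothetical helper)."""
    if len(blob) < 4:
        raise ValueError("section too short to classify")
    (first_word,) = struct.unpack_from("<I", blob, 0)
    if first_word == 0:
        # Convention from the post: uncompressed content starts with 4 zero bytes.
        return blob
    if len(blob) < ELF64_CHDR.size:
        raise ValueError("truncated Elf64_Chdr")
    ch_type, _reserved, ch_size, _addralign = ELF64_CHDR.unpack_from(blob, 0)
    if ch_type != ELFCOMPRESS_ZLIB:
        raise ValueError(f"unsupported ch_type {ch_type:#x} in this sketch")
    data = zlib.decompress(blob[ELF64_CHDR.size:])
    if len(data) != ch_size:
        raise ValueError("decompressed size does not match ch_size")
    return data
```

In a real runtime the blob would be the bytes between the section's encapsulation symbols; here it is just a bytes object.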

However, I have concerns regarding non-compliance with the ELF standard due to the incompatibility of SHF_ALLOC|SHF_COMPRESSED sections, as stated in the current generic ABI documentation (Sections):

SHF_COMPRESSED - This flag identifies a section containing compressed data. SHF_COMPRESSED applies only to non-allocable sections, and cannot be used in conjunction with SHF_ALLOC. In addition, SHF_COMPRESSED cannot be applied to sections of type SHT_NOBITS.

Therefore, I made a generic-abi proposal to remove the SHF_ALLOC incompatibility from the wording: https://groups.google.com/g/generic-abi/c/HUVhliUrTG0

I have some replies to questions raised on the generic-abi thread. Rephrased below:

Q: Is it wasteful to have both the compressed and uncompressed copies in memory at runtime?

The tradeoff between compressed debug sections and using SHF_ALLOC|SHF_COMPRESSED is quite similar.
When a symbolizer or debugger loads the compressed debug information, it needs to allocate a memory chunk to hold the decompressed content instead of memory mapping the content from the disk.

Why do some people accept this tradeoff? Well, they may prioritize file size and consider debugging as an infrequent operation, or they simply accept this inefficiency.

I understand that SHF_ALLOC|SHF_COMPRESSED sections create an additional copy in the memory image, which can be seen as wasteful.
However, this portion is read-only and accessed on-demand. It’s not significantly different from when a program has an internal symbolizer that performs introspection (opening itself, parsing section headers, finding debug sections); sanitizers support such an internal symbolizer.

Q: Why not switch to non-ALLOC SHF_COMPRESSED?

I believe that SHF_ALLOC has two primary use cases:

replace runtime introspection (opening its own file, parsing section headers, parsing section content) with inspecting the content between the encapsulation symbols
prevent strip/llvm-strip from stripping the sections

Q: Is there any restriction for SHF_ALLOC|SHF_COMPRESSED sections?

The runtime library’s decompression and “relocation” of the section impose certain limitations on use cases. For instance, it is not possible to define a symbol relative to an input section if its absolute address is significant, because “relocating” the section invalidates the absolute address. However, label differences within the output section are still permissible.

In my prototype, I try to compute the output section size once, expect that it does not change, and give an error if it does change due to certain linker script constructs.

Q: What should PE/COFF, Mach-O, wasm, XCOFF do?

I wish they had a generic compression feature as well :) Based on my observation, there are ELF users who have exceptionally large executables and prioritize compression. In the long term, I hope that object file format vendors whose users care about compression provide it at the object file format level, rather than adding more compression code to various compiler instrumentation features.

smithp35 · June 29, 2023, 3:02pm

I’ll start with my experience of a toolchain that does support compression of RW sections. This is for an embedded system where the program executes directly from read-only flash, with the initial contents of RW data copied from flash (LMA) to RAM (VMA). When the size of the compressed data + the size of the decompressor is < the uncompressed data size, it can be worth compressing the data.
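The size condition above can be written down as a small Python sketch. This is only an illustration: zlib stands in for whatever algorithm the toolchain actually uses, and worth_compressing and the decompressor size passed to it are hypothetical.

```python
import zlib


def worth_compressing(rw_data: bytes, decompressor_size: int) -> bool:
    """True if compressed data plus the decompressor is smaller than the raw data."""
    compressed_size = len(zlib.compress(rw_data))
    return compressed_size + decompressor_size < len(rw_data)
```

For example, a 4 KiB block of zero-initialized-looking data easily pays for a 512-byte decompressor, while a 64-byte block never can.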

We don’t do this via any section flag, and it only works via collusion between the linker and library to embed all the information for it to do the decompression.

It does add significant complexity to the linker for a number of reasons:

RW data can contain pointers so you can’t do compression until all addresses are fixed.
Load addresses following compression aren’t stable until after compression has run, which can lead to complications when there are load address calculations that depend on the size of the compressed data.
Linker defined symbols that are dependent on the LMA of the compressed data need to be calculated post compression.

We did end up supporting in-place compression, where the compressed data is copied to a buffer and then decompressed over the original data. That is really only useful for benchmarking on a model as I don’t think it saves anything.

Q: Is it wasteful to have both the compressed and uncompressed copies in memory at runtime?
In an embedded system it is required to have compressed data in non-volatile memory (flash/ROM). This isn’t wasteful.

I can see it being wasteful if the compressed data is coming from a different source, for example an ELF file.

Q. Why not switch to non-ALLOC SHF_COMPRESSED?
I’m not sure I understand the question. SHF_ALLOC to me means the section content is a necessary part of the running program, i.e. it is assigned an address and can be accessed by at least a subset of the runtime of the program.

I think using SHF_ALLOC to protect sections from strip is more of a side effect than an intention. Personally, I would prefer that we had options that strip could read to say “don’t remove me!” for non-SHF_ALLOC sections that we don’t want removed.

Q: Is there any restriction for SHF_ALLOC|SHF_COMPRESSED sections?
I think it would be better to state this as “What use cases must be supported?” rather than “these use cases are not supported”. I think the only fundamental restriction is where we have a circular dependency.

For our implementation we chose to detect cases we couldn’t support and turn off compression for that section.

I can certainly see the benefit of keeping things as simple as possible. This functionality exposes the linker to a whole host of corner cases and opportunities for subtle bugs.

FWIW our thinking for an open source toolchain for embedded systems is to do compression as a 2-stage link, very much like the existing Linux kernel loader.

dblaikie · June 29, 2023, 6:50pm

The uncompressed metadata section may start with 4 zero bytes to be distinguished from the Elf{32,64}_Chdr header with a non-zero ch_type.

Ah, that seems like a fairly unfortunate conflict/non-orthogonality. To me that’s sort of enough to question whether SHF_ALLOC|SHF_COMPRESSED is reasonable/valid. If the consumer can’t treat it as fully arbitrary compression, it loses a lot of its value compared to a custom compression scheme, I think? I guess it generalizes the linker support at least so you don’t have to keep adding weird special cases to the linker. But the contents not being able to be arbitrary is a pretty major conflict, imho.

Not sure what to do about that, though - maybe it’s a price worth paying for the generality.

If things can be non-SHF_ALLOC while still, perhaps, being preserved by strip, would that be better? Or if they’re SHF_ALLOC but unusable from the mapping if they’re compressed (so you accept that these sections aren’t actually usable in their mapping - you have to go read them from disk like a non-SHF_ALLOC section) maybe that’s a better tradeoff than “this isn’t really a generic/non-domain-specific compression scheme anymore”?

tschuett · June 29, 2023, 8:35pm

Disclaimer: I have no idea about the internals, but I am happy to use lld. There was chat about a post-linker tool for compression to offload the complexity from lld.

MaskRay · June 29, 2023, 10:32pm

Thank you for sharing the insights!

smithp35:

I’ll start with my experience of a toolchain that does support compression of RW sections. This is for an embedded system where the program executes directly from read-only flash, with the initial contents of RW data copied from flash (LMA) to RAM (VMA). When the size of the compressed data + the size of the decompressor is < the uncompressed data size, it can be worth compressing the data.

We don’t do this via any section flag, and it only works via collusion between the linker and library to embed all the information for it to do the decompression.

It does add significant complexity to the linker for a number of reasons:

RW data can contain pointers so you can’t do compression until all addresses are fixed.

Load addresses following compression aren’t stable until after compression has run, which can lead to complications when there are load address calculations that depend on the size of the compressed data.

Linker defined symbols that are dependent on the LMA of the compressed data need to be calculated post compression.

We did end up supporting in-place compression, where the compressed data is copied to a buffer and then decompressed over the original data. That is really only useful for benchmarking on a model as I don’t think it saves anything.

Curious, are there any ALLOC sections used with the compression feature?

I agree that protection from strip can be seen as a secondary benefit. I think it is valuable in practice, as linking and stripping are separate steps, and the developers may have less control on the strip side.
For example, adding unconditional or conditional --keep-section in the build system can be complex or appear strange.

In the GNU world, I think distributions have some strip option requirements/restrictions. The thread “Using section flags to indicate stripable or persistent sections” even discussed whether we want a section flag to avoid stripping.

I think some SHF_ALLOC metadata sections could be changed to non-SHF_ALLOC, but that would require runtime introspection (opening its own file and parsing section headers), which would be infeasible due to stripping. When dynamic linking is involved, collecting all the shared object dependencies may be non-trivial at runtime as well.

By a circular dependency, do you mean that a SHF_ALLOC section needs to access the SHF_ALLOC|SHF_COMPRESSED section? A cycle forms because the SHF_ALLOC|SHF_COMPRESSED section needs to describe SHF_ALLOC code or data.

Curious what the 2 stage link is.
Perform a relocatable link on the metadata sections, apply compression so that the size is fixed, then link the compressed section into the rest of the program?

MaskRay · June 29, 2023, 10:47pm

Linker compression offers significant benefits by allowing compression to be applied to the entire output section.
Internal compression formats within metadata sections, unknown to the linker, can result in multiple compressed streams without shared state, negatively impacting compression ratios.

Consider a scenario where a program consists of 1000 .o files with small metadata sections that do not benefit from compression individually.
However, when these 1000 files are concatenated, the resulting metadata section may become large enough to benefit from compression.
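A quick way to see this effect is the following Python sketch. It uses zlib from the standard library and made-up record contents, so the exact numbers mean nothing; only the comparison between the two strategies is of interest.

```python
import zlib


def compressed_sizes(sections):
    """Return (sum of per-section compressed sizes, size when compressed as one stream)."""
    separately = sum(len(zlib.compress(s)) for s in sections)
    together = len(zlib.compress(b"".join(sections)))
    return separately, together


# 1000 small, similar-looking metadata records (made-up content).
sections = [f"src/file{i}.c:func{i}:{i}\n".encode() for i in range(1000)]
raw = sum(len(s) for s in sections)
separately, together = compressed_sizes(sections)
print(f"raw={raw} separately={separately} together={together}")
```

Each tiny stream pays its own header and checksum overhead and cannot reuse matches from its neighbours, which is the "no shared state" cost mentioned above.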

In practice, the uncompressed header has a lot of choices, not just a uint32_t 0.
When we added ELFCOMPRESS_ZSTD, https://groups.google.com/g/generic-abi/c/satyPkuMisk/m/xRqMj8M3AwAJ we acknowledged that we did not intend to include a plethora of formats.
Even with the allocation of 4 values for potential extensions, there are still numerous values available for allocation.

While there is a slight risk of collision, it is not a significant cause for concern. The metadata can change its header. Metadata sections generally face fewer backward compatibility restrictions since prebuilt libraries with specific instrumentation are considered awkward and therefore uncommon.

Name    Value
ELFCOMPRESS_ZLIB        1
ELFCOMPRESS_ZSTD        2
ELFCOMPRESS_LOOS        0x60000000
ELFCOMPRESS_HIOS        0x6fffffff
ELFCOMPRESS_LOPROC      0x70000000
ELFCOMPRESS_HIPROC      0x7fffffff
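To make the available space concrete, here is a small Python sketch that classifies a ch_type value using only the constants from the table above. The function name classify_ch_type and the returned labels are hypothetical.

```python
ELFCOMPRESS_ZLIB = 1
ELFCOMPRESS_ZSTD = 2
ELFCOMPRESS_LOOS = 0x60000000
ELFCOMPRESS_HIOS = 0x6FFFFFFF
ELFCOMPRESS_LOPROC = 0x70000000
ELFCOMPRESS_HIPROC = 0x7FFFFFFF


def classify_ch_type(ch_type: int) -> str:
    """Map a 32-bit ch_type value to the range it falls in."""
    if ch_type == ELFCOMPRESS_ZLIB:
        return "zlib"
    if ch_type == ELFCOMPRESS_ZSTD:
        return "zstd"
    if ELFCOMPRESS_LOOS <= ch_type <= ELFCOMPRESS_HIOS:
        return "os-specific"
    if ELFCOMPRESS_LOPROC <= ch_type <= ELFCOMPRESS_HIPROC:
        return "processor-specific"
    # Everything else, including 0, is not assigned to a compression format.
    return "unassigned"
```

A metadata format whose uncompressed header starts with any "unassigned" 32-bit value would not be mistaken for a compressed section under the current assignments.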

I have a previous reply that talks about strip.

smithp35 · June 30, 2023, 8:56am

Curious, are there any ALLOC sections used with the compression feature?

Yes, it implicitly includes all read-write SHF_ALLOC, SHT_PROGBITS sections, although it can be enabled/disabled at the equivalent of the OutputSection level. (Documentation – Arm Developer)

In principle it could work on executable and read-only sections too, but for the embedded use case these sections tend to be executed in place rather than copied/decompressed to RAM.

By a circular dependency, do you mean that a SHF_ALLOC section needs to access the SHF_ALLOC|SHF_COMPRESSED section? A cycle forms because the SHF_ALLOC|SHF_COMPRESSED section needs to describe SHF_ALLOC code or data.

To give a concrete example:

CompressibleSection1
Contents contain the load address of compressible section 2, e.g. LOADADDR(CompressibleSection2)

CompressibleSection2
Contents contain the load address of compressible section 1, e.g. LOADADDR(CompressibleSection1)

The load address is only known after compression. I’m assuming a compression algorithm that may change contents and size depending on the value of LOADADDR(CompressibleSection*).

This is somewhat contrived and I don’t think I’ve seen it happen in practice, but does need guarding against.
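One way to picture the guard is the following Python sketch. It is not how any particular linker does it: zlib stands in for the compressor, section 2 is assumed to be placed directly after section 1, and the function name find_layout and the round limit are hypothetical. It retries the layout until the embedded address and the compressed size agree, and gives up otherwise.

```python
import struct
import zlib


def find_layout(base, payload1, payload2, max_rounds=8):
    """Try to find self-consistent load addresses for two compressed sections."""
    addr1 = base
    addr2 = base  # initial guess, refined on each round
    for _ in range(max_rounds):
        # Each section embeds the other's load address as a little-endian u64.
        c1 = zlib.compress(struct.pack("<Q", addr2) + payload1)
        c2 = zlib.compress(struct.pack("<Q", addr1) + payload2)
        new_addr2 = addr1 + len(c1)  # section 2 follows section 1
        if new_addr2 == addr2:
            return addr1, addr2, c1, c2
        addr2 = new_addr2
    # No fixed point within the limit: fall back to not compressing.
    return None
```

Returning None here corresponds to the "detect and turn off compression for that section" choice described earlier in the thread.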

Curious what the 2 stage link is.
Perform a relocatable link on the metadata sections, apply compression so that the size is fixed, then link the compressed section into the rest of the program?

I think I expressed it poorly. It is more like 2 separate links, although in principle these could be integrated into one tool.

Step 1: Link as normal to produce an ELF file
Step 2: Extract the ELF file program segments and compress the RW segment, wrap in an ELF file as sections.
Step 3: Link a small decompressor/loader along with the binary program segments, including the compressed RW segment.

This is all independent of the linker, the step 3 link treats the compressed data as a binary blob, likely embedded in an ELF section.

I think this would work well for simple embedded programs, would need a lot more work to integrate with a dynamic loader.

MaskRay · July 6, 2023, 7:18pm

Posted ⚙ D154641 [ELF] Add --compress-ections

MaskRay · March 13, 2024, 6:13am

A variant of D154641 that only applies to non-SHF_ALLOC sections landed as [ELF] Add --compress-section to compress matched non-SHF_ALLOC sections by MaskRay · Pull Request #84855 · llvm/llvm-project · GitHub. It can be used to compress .strtab, which is often large ([RFC] Compressed SHT_SYMTAB/SHT_STRTAB for ELF).

SHF_ALLOC compression seems to have some value as well, but it’s difficult to handle certain constructs like range extension thunks: https://reviews.llvm.org/D154641#4481191.

MaskRay · March 27, 2024, 4:27am

I have started a thread, “Llvm-objcopy --compress-sections”, about llvm-objcopy --compress-sections.
