[RFC] Compress arbitrary sections with ld.lld --compress-sections

I’ll start with my experience of a toolchain that does support compression of RW sections. The target is an embedded system where the program executes directly from read-only flash, with the initial contents of RW data copied from flash (LMA) to RAM (VMA). When the size of the compressed data plus the size of the decompressor is less than the size of the uncompressed data, it can be worth compressing the data.
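As a rough illustration of that break-even test, here is a minimal sketch. It assumes zlib as a stand-in for whatever compressor the toolchain really uses, and `decompressor_size` is a hypothetical figure for the code size of the decompression routine in flash.

```python
import zlib

def worth_compressing(rw_data: bytes, decompressor_size: int) -> bool:
    """Return True if compressed data plus decompressor is smaller than the raw data.

    zlib is only a stand-in here; the real compressor is an assumption.
    """
    compressed = zlib.compress(rw_data, 9)
    return len(compressed) + decompressor_size < len(rw_data)
```

A mostly zero-initialised 4 KiB image easily pays for a few hundred bytes of decompressor, whereas a handful of bytes of RW data never will.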

We don’t do this via any section flag; it only works via collusion between the linker and the library, with the linker embedding all the information the library needs to do the decompression.

It does add significant complexity to the linker for a number of reasons:

  • RW data can contain pointers so you can’t do compression until all addresses are fixed.
  • Load addresses of anything placed after the compressed data aren’t stable until compression has run, which complicates any load address calculation that depends on the size of the compressed data.
  • Linker defined symbols that are dependent on the LMA of the compressed data need to be calculated post compression.
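To make the load address point concrete, here is a toy layout pass. It is not how any real linker is structured: the `Section` type, the section names and the use of zlib are all assumptions made for illustration. It only shows that the LMA of whatever follows a compressed section cannot be known until the compressed size is known.

```python
import zlib
from dataclasses import dataclass

@dataclass
class Section:
    name: str            # hypothetical section name
    data: bytes          # contents with all addresses already fixed
    compress: bool = False

def assign_lmas(sections, base):
    """Assign load addresses in order; a compressed section occupies its compressed size."""
    lmas = {}
    addr = base
    for s in sections:
        lmas[s.name] = addr
        size = len(zlib.compress(s.data)) if s.compress else len(s.data)
        addr += size
    return lmas

raw = assign_lmas([Section(".data", bytes(1024)), Section(".after", b"x" * 16)], 0x8000)
packed = assign_lmas([Section(".data", bytes(1024), True), Section(".after", b"x" * 16)], 0x8000)
```

Here `raw[".after"]` is `0x8000 + 1024`, while `packed[".after"]` is lower and depends entirely on how well `.data` compressed. Any linker-defined symbol derived from that address has to be evaluated after compression.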

We did end up supporting in-place compression, where the compressed data is copied to a buffer and then decompressed over the original data. That is really only useful for benchmarking on a model as I don’t think it saves anything.

Q: Is it wasteful to have both the compressed and uncompressed copies in memory at runtime?
In an embedded system it is required to have compressed data in non-volatile memory (flash/ROM). This isn’t wasteful.

I can see it being wasteful if the compressed data is coming from a different source, for example an ELF file.

Q: Why not switch to non-ALLOC SHF_COMPRESSED?
I’m not sure I understand the question. SHF_ALLOC to me means the section content is a necessary part of the running program, i.e. it is assigned an address and can be accessed by at least a subset of the runtime of the program.

I think using SHF_ALLOC to protect sections from being removed by strip is more of a side effect than an intention. Personally, I would prefer options that strip could read to say “don’t remove me!” for non-SHF_ALLOC sections that we don’t want removed.

Q: Is there any restriction for SHF_ALLOC|SHF_COMPRESSED sections?
I think it would be better to state this as “What use cases must be supported?” rather than “these use cases are not supported”. I think the only fundamental restriction is where we have a circular dependency.

For our implementation we chose to detect cases we couldn’t support and turn off compression for that section.
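A minimal sketch of that kind of fallback, under a deliberately simplified model that is my assumption rather than a description of our implementation: each candidate section records which symbols are stored in its contents and which symbols move when its compressed size changes. If the two sets overlap, the contents depend on their own compressed size, so compression is turned off for that section. The `Candidate` type and the symbol names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    name: str
    # Symbols whose values are stored inside this section's contents.
    references: set = field(default_factory=set)
    # Symbols whose values move when this section's compressed size changes.
    size_dependent_symbols: set = field(default_factory=set)

def select_compressible(candidates):
    """Return names of sections whose contents do not depend on their own compressed size."""
    selected = []
    for c in candidates:
        circular = bool(c.references & c.size_dependent_symbols)
        if not circular:
            selected.append(c.name)
    return selected

chosen = select_compressible([
    Candidate(".data", {"__heap_start"}, {"__rom_end"}),
    Candidate(".tbl", {"__rom_end"}, {"__rom_end"}),  # stores a symbol that its own size moves
])
```

In this example only `.data` is chosen; `.tbl` is left uncompressed rather than rejected with an error.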

I can certainly see the benefit of keeping things as simple as possible. This functionality exposes the linker to a whole host of corner cases and opportunities for subtle bugs.

FWIW our thinking for an open source toolchain for embedded systems is to do compression as a two-stage link, very much like the existing Linux kernel loader.