Introduction
To provide a solution for automatic section packing in LLD, we’d like to implement the GNU LD flag --enable-non-contiguous-regions
. This allows input sections that would overflow memory regions to spill to a later matching output section rather than failing the link. This can be used to first-fit pack input sections into memory regions, which is useful for embedded development, since this otherwise needs to be done by hand.
Background
The matching of input sections to output sections occurs fairly early in LLD, and many later steps depend on that matching. Memory regions overflows are detected much later, when addresses are being assigned to sections and symbols. The algorithm that does this assignment is one of the only parts of the linker that uses iteration to fixed point. This is because the linker relaxations and thunk insertion both affect and are affected by precise section addresses. Section spilling would be a similar concern.
Implementation
We propose an approach similar to that in GNU LD. As a design goal, the flag should have negligible effect on linker time/space when disabled. It’s not a goal for the flag to have no behavioral effects on links where no spills actually occur; the reasons for this are discussed later.
When the flag is enabled, during input section assignment to matching input section descriptions, any matches after the first create a stub that refers to the original input section. The sequence of stubs for an input section is recorded in an ancillary map.
Example linker script
MEMORY { m1: ..., m2: ..., m3: ... }
SECTIONS {
.a:{ *(*) }>m1
.b:{ SORT_BY_NAME(*(*)) }>m2
.c:{ *(*) }>m3
}
Sections after matching
.a: foo, bar
.b: foo stub 1, bar stub 1
.c: foo stub 2, bar stub 2
As the link proceeds, the stubs are analyzed and ordered largely as if they were copies of the original input section. This may have permanent effects on the output sections they are tentatively placed into; the semantics are somewhat complex.
Sections after sorting
.a: foo, bar
.b: bar stub 1, foo stub 1
.c: foo stub 2, bar stub 2
When the linker reaches the loop that assigns addresses and creates thunks, memory regions are expanded accordingly. If an input section sequence would cause an LMA or VMA memory region to overflow, the input section is moved to the location of the next stub, and that stub is deleted. The input section’s properties may need to change to those assigned to the stub by earlier parts of the link; if this is not possible, the link fails. If there are no further stubs, the input section fails to fit, so the link fails, unless the section ends up assigned to /DISCARD/
. Stubs are skipped and have no effects on addresses.
Each time through the loop, the address assignment may change due to additional linker relaxations or thunk insertions, which may cause the input section to spill again. To ensure a fixed point is reached, an input section can never move backwards; stubs are never re-added.
First iteration
.a: foo (assigned), bar (assigned)
.b: bar stub 1 (skipped), foo stub 1 (skipped)
.c: foo stub 2 (skipped), bar stub 2 (skipped)
Second iteration (thunk causes bar to overflow)
.a: foo (assigned), thunk,
.b: bar (assigned), foo stub 1 (skipped)
.c: foo stub 2 (skipped), bar stub 2 (skipped)
Once a final assignment has been reached, a final pass removes all the stubs, since they correspond to spills not taken. The link then completes as usual.
Final state
.a: foo, thunk,
.b: bar
.c: <empty>
Semantics
A large number of linker features interact with the flag. Exact behavioral compatibility with GNU LD on these interactions shouldn’t be a goal; the precise behavior of the feature is almost completely undocumented, and its adoption appears very low. It’s more important that the behavior is predictable and usable.
Failing to fit
We should explicitly break from GNU LD’s documented behavior of silently discarding sections when they fail to fit any matching output section, since it’s unnecessarily error prone. It would be better to treat them as orphan sections and better still to fail the link. Without the flag a link fails when a section cannot fit its first match, so the link should also fail when a section cannot fit its last match.
/DISCARD/
It’s clear that input sections that have the output section /DISCARD/
as their first match should never be spilled to later matches. They are explicitly specified to be discarded, nothing specified earlier takes priority over this, and there’s no memory region that could justify the spill.
It’s less obvious whether or not a section should be allowed to spill from an earlier output section to /DISCARD/
. On one hand, it isn’t a real output section, so it seems odd to spill into it. On the other hand, removing sections that fail to fit anywhere earlier seems useful enough to justify the possibility. This slightly complicates implementation, since input sections that match /DISCARD/
could not summarily be removed; instead, they need to be remembered until address assignment.
ONLY_IF_RO
/ ONLY_IF_RW
These constrain an output section to be created only on condition that all of its input sections have the corresponding section flag. Following the stub model, the flag considers all matching input sections, not only those that will actually be emitted in the output section.
Alignment
The stub approach also results in each section being aligned to at least the maximum of any input section that matches it or that might be spilled into it, even if those spills don’t end up occurring. This can waste space, although it can be mitigated somewhat by separating out high-alignment code/data and giving those sections unique matches. A section’s final alignment can be used as part of its own layout, so this disadvantage seems fairly intrinsic.
Section Merging
The stub model doesn’t work well with section merging, since an input section might be merged irreversibly with their neighbors, making it impossible to move later. Fixing this would require deferring merging until the final assignments are known, but the final assignments depend on size information that depends on the result of merging. Accordingly, SHF_MERGE
sections should be made explicitly out of scope for this flag; they should not spill from their first matching output section, and failure to fit there should be a link error. This can be achieved by not inserting stubs for these sections.
Identical Code Folding (ICF)
It’s already the case that two sections are only foldable if their output sections are identical. With the flag, ICF would instead check for an identical sequence of matching output sections. Stubs would be treated as a unit with their corresponding input sections, similar to the existing behavior for SHF_LINK_ORDER
.
SHF_LINK_ORDER
Currently, LLD resolves SHF_LINK_ORDER
by sorting after the first round of address assignment. This only occurs during the first round; if a target section were spilled during later rounds of address assignment, its relative order with other sections might change and violate the ordering constraint.
There’s no good way to recast this guarantee in terms of multiple matches; known use cases (e.g. ARM exception handling) really do depend on the addresses of the final layout. However, this seems resolvable by re-running resolveShfLinkOrder() after each round of address assignment when the flag is enabled.
INSERT BEFORE
/AFTER
GNU LD disallows the including INSERT BEFORE
or INSERT AFTER
in a linker script when the flag is being used. Spilling occurs sequentially following the textual order of the linker script, so a section spilled into an INSERT
may effectively move backward to the inserted location. However, layout is done in a single pass in the post-INSERT
ordering, so when the moved-to location was originally reached, the information about whether or not anything should actually be emitted is unavailable. Accordingly, it seems prudent to follow GNU and disallow this with an explicit error message.
OVERWRITE_SECTIONS
This is much simpler than INSERT
, since its processing occurs before/during the initial assignment of input section to output section. Since the intent is to replace a section command wholesale, the matches within these sections should be ordered within the stub sequences as if they were given in the location of the replaced section, not the override.
Open Question
It’s not completely clear in what circumstances it’s legal to move a section to take the place of one of its stubs. While I’ve been able to verify in a prototype that creating and deleting stubs in the existing LLD test suite doesn’t introduce new failures, implying that pointers to them aren’t stashed anywhere inappropriate, the properties of the stubs may end up being altered as the link proceeds, such that they differ from those of the original section. In some cases it may be possible to alter the section’s properties as it is moved, but this might not always be possible. It might do well to take a conservative tack here, failing the link if any properties differ that cannot be proven safe to alter. We could then relax this on a case by case basis.
Alternatives
Defer/Undo Decision Making
An externally simpler model of the semantics would have all decisions made about the contents of an output section deferred until after the spilling has been finished. This would be after the final addresses of all sections and symbols are known, but a large number of the decisions in scope for this feed into the layout and address assignment process. Alternatively, the effects of these decisions made earlier could perhaps be made tentative, and the results of those decisions undone or altered once address assignment completes. Both alternatives would add considerable complexity to the linker, and it doesn’t seem like the semantics described would be improved enough to warrant this (especially if it required some kind of fixed point iteration of a large portion of the linker).
Early Output Section Sizing
Instead of deferring decision making until spilling is finished, spilling could be moved before the necessary decisions take place, immediately after section matches were determined. Unfortunately, accurate spilling decisions require accurate output section sizes. These depend heavily on the final layout, since linker scripts can refer to and manipulate the current position. Computing size early requires ignoring layout-dependent aspects of the output section description. This would cause wasted space if the size were overestimated and a failed link if underestimated. This would be harder to work around than the limitations of the stub approach, since the user would need to keep in mind a full model of how sizes are estimated to resolve corresponding link failures.
Future Directions
MAX_SIZE(…)
In cases where the common sections are to be split across multiple discontinuous memory regions, an implementation might first assign all of .text
that fits, then all of .rodata
, etc. This gives absolute priority to .text
, which may not be desirable. It should be possible to say that no more than a certain amount of .text
should be placed before moving on to .rodata
.
The user can achieve this by altering by altering the linker script to split each memory region into dedicated regions for each type of section. These regions can then be given fixed sizes, which will cause spilling to move on once that region has been filled. This is a bit clunky, since the offsets of these sections have to be determined via linker script arithmetic.
Peter Smith earlier suggested a size limit syntax for a hypothetical .ANY
input section descriptor. I’d suggest a MAX_SIZE(...)
output section keyword instead, since the semantics and implementation should be quite simple. It would just keep a size counter around for the output section itself, in addition to those maintained for memory regions.
This is something to discuss more broadly with other LD-compatible linkers to come to common terms about the syntax/semantics. This is also non-essential, since it doesn’t offer any powers that cannot be gained through manipulations of the memory regions.