[RFC] LLD --enable-non-contiguous-regions

Introduction

To provide a solution for automatic section packing in LLD, we’d like to implement the GNU LD flag --enable-non-contiguous-regions. This allows input sections that would overflow memory regions to spill to a later matching output section rather than failing the link. This can be used to first-fit pack input sections into memory regions, which is useful for embedded development, since this otherwise needs to be done by hand.

Background

The matching of input sections to output sections occurs fairly early in LLD, and many later steps depend on that matching. Memory region overflows are detected much later, when addresses are being assigned to sections and symbols. The algorithm that does this assignment is one of the only parts of the linker that uses iteration to a fixed point, because linker relaxations and thunk insertion both affect and are affected by precise section addresses. Section spilling would be a similar concern.

Implementation

We propose an approach similar to that in GNU LD. As a design goal, the flag should have negligible effect on linker time/space when disabled. It’s not a goal for the flag to have no behavioral effects on links where no spills actually occur; the reasons for this are discussed later.

When the flag is enabled, during the assignment of input sections to matching input section descriptions, each match after the first creates a stub that refers to the original input section. The sequence of stubs for an input section is recorded in an ancillary map.

Example linker script

  MEMORY { m1: ..., m2: ..., m3: ... }
  SECTIONS {
    .a:{ *(*) }>m1
    .b:{ SORT_BY_NAME(*(*)) }>m2
    .c:{ *(*) }>m3
  }

Sections after matching

.a: foo, bar
.b: foo stub 1, bar stub 1
.c: foo stub 2, bar stub 2

As the link proceeds, the stubs are analyzed and ordered largely as if they were copies of the original input section. This may have permanent effects on the output sections they are tentatively placed into; the resulting semantics are somewhat complex and are discussed below.

Sections after sorting

.a: foo, bar
.b: bar stub 1, foo stub 1
.c: foo stub 2, bar stub 2

When the linker reaches the loop that assigns addresses and creates thunks, memory regions are expanded accordingly. If an input section sequence would cause an LMA or VMA memory region to overflow, the input section is moved to the location of the next stub, and that stub is deleted. The input section’s properties may need to change to those assigned to the stub by earlier parts of the link; if this is not possible, the link fails. If there are no further stubs, the input section fails to fit, so the link fails, unless the section ends up assigned to /DISCARD/. Stubs are skipped and have no effects on addresses.

Each time through the loop, the address assignment may change due to additional linker relaxations or thunk insertions, which may cause the input section to spill again. To ensure a fixed point is reached, an input section can never move backwards; stubs are never re-added.

First iteration

.a: foo (assigned), bar (assigned)
.b: bar stub 1 (skipped), foo stub 1 (skipped)
.c: foo stub 2 (skipped), bar stub 2 (skipped)

Second iteration (thunk causes bar to overflow)

.a: foo (assigned), thunk
.b: bar (assigned), foo stub 1 (skipped)
.c: foo stub 2 (skipped), bar stub 2 (skipped)

Once a final assignment has been reached, a final pass removes all the stubs, since they correspond to spills not taken. The link then completes as usual.

Final state

.a: foo, thunk
.b: bar
.c: <empty>

Semantics

A large number of linker features interact with the flag. Exact behavioral compatibility with GNU LD on these interactions shouldn’t be a goal; the precise behavior of the feature is almost completely undocumented, and its adoption appears very low. It’s more important that the behavior is predictable and usable.

Failing to fit

We should explicitly break from GNU LD’s documented behavior of silently discarding sections that fail to fit any matching output section, since it’s unnecessarily error prone. It would be better to treat them as orphan sections, and better still to fail the link. Without the flag, a link fails when a section cannot fit its first match, so the link should also fail when a section cannot fit its last match.

/DISCARD/

It’s clear that input sections that have the output section /DISCARD/ as their first match should never be spilled to later matches. They are explicitly specified to be discarded, nothing specified earlier takes priority over this, and there’s no memory region that could justify the spill.

It’s less obvious whether a section should be allowed to spill from an earlier output section to /DISCARD/. On one hand, it isn’t a real output section, so it seems odd to spill into it. On the other hand, removing sections that fail to fit anywhere earlier seems useful enough to justify the possibility. This slightly complicates the implementation, since input sections that match /DISCARD/ can no longer be removed summarily; instead, they need to be remembered until address assignment.
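As a purely illustrative sketch (the memory region, its size, and the section names here are invented), spilling into /DISCARD/ would let low-priority data be dropped rather than fail the link:

  MEMORY { ram : ORIGIN = 0x20000000, LENGTH = 64K }
  SECTIONS {
    .data : { *(.data .data.*) } > ram
    /DISCARD/ : { *(.data.optional.*) }
  }

A .data.optional.* input section that fits in ram stays in .data; one that would overflow ram spills to its next match, /DISCARD/, and is removed, which is why such sections need to be remembered until address assignment.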

ONLY_IF_RO / ONLY_IF_RW

These constrain an output section to be created only if all of its input sections are read-only or read-write, respectively. Following the stub model, the flag considers all matching input sections, not only those that will actually be emitted in the output section.
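A hedged sketch of that interaction (the .tab.* names and m1/m2 regions are invented):

  SECTIONS {
    .tab.a :            { *(.tab.*) } > m1
    .tab.b : ONLY_IF_RO { *(.tab.*) } > m2
  }

Under the stub model, whether .tab.b is created depends on the flags of every .tab.* input section in the link, including writable ones that fit in m1 and never actually spill; a single writable .tab.* section anywhere would suppress .tab.b.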

Alignment

The stub approach also results in each output section being aligned to at least the maximum alignment of any input section that matches it or that might be spilled into it, even if those spills don’t end up occurring. This can waste space, although it can be mitigated somewhat by separating out high-alignment code/data and giving those sections unique matches. A section’s final alignment can be used as part of its own layout, so this disadvantage seems fairly intrinsic.
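A hedged sketch of that mitigation (all names are invented, and it assumes high-alignment buffers are placed in dedicated .data.cacheline.* sections):

  SECTIONS {
    /* Unique match for high-alignment data; nothing later matches it,
       so its alignment never propagates to the spill targets. */
    .buffers : { *(.data.cacheline.*) } > m1
    /* Ordinary .data may still spill from m1 to m2; neither output
       section inherits the buffers' alignment. */
    .data.a : { *(.data) } > m1
    .data.b : { *(.data) } > m2
  }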

Section Merging

The stub model doesn’t work well with section merging, since an input section might be merged irreversibly with its neighbors, making it impossible to move later. Fixing this would require deferring merging until the final assignments are known, but the final assignments depend on size information that depends on the result of merging. Accordingly, SHF_MERGE sections should be made explicitly out of scope for this flag; they should not spill from their first matching output section, and failure to fit there should be a link error. This can be achieved by not inserting stubs for these sections.

Identical Code Folding (ICF)

It’s already the case that two sections are only foldable if their output sections are identical. With the flag, ICF would instead check for an identical sequence of matching output sections. Stubs would be treated as a unit with their corresponding input sections, similar to the existing behavior for SHF_LINK_ORDER.

SHF_LINK_ORDER

Currently, LLD resolves SHF_LINK_ORDER by sorting after the first round of address assignment. This only occurs during the first round; if a target section were spilled during later rounds of address assignment, its relative order with other sections might change and violate the ordering constraint.

There’s no good way to recast this guarantee in terms of multiple matches; known use cases (e.g. ARM exception handling) really do depend on the addresses of the final layout. However, this seems resolvable by re-running resolveShfLinkOrder() after each round of address assignment when the flag is enabled.

INSERT BEFORE/AFTER

GNU LD disallows including INSERT BEFORE or INSERT AFTER in a linker script when the flag is being used. Spilling occurs sequentially following the textual order of the linker script, so a section spilled into an INSERT may effectively move backward to the inserted location. However, layout is done in a single pass in the post-INSERT ordering, so when layout originally reaches the moved-to location, the information about whether anything should actually be emitted there is unavailable. Accordingly, it seems prudent to follow GNU LD and disallow this with an explicit error message.
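For concreteness, this is the shape of script (names invented; m1/m2 assumed defined in MEMORY) that would be rejected; .b is written later but, once the INSERT is applied, is laid out before .a:

  SECTIONS {
    .a : { *(.data.*) } > m1
  }
  SECTIONS {
    .b : { *(.data.*) } > m2
  } INSERT BEFORE .a;

A .data.* section overflowing m1 would spill from .a into .b, but .b’s location has already been laid out by the time the overflow is detected, which is exactly the situation the error avoids.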

OVERWRITE_SECTIONS

This is much simpler than INSERT, since its processing occurs before/during the initial assignment of input sections to output sections. Since the intent is to replace a section command wholesale, the matches within these sections should be ordered within the stub sequences as if they were given at the location of the replaced section, not at the location of the OVERWRITE_SECTIONS command.
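A hedged illustration of that ordering rule (names invented; m1/m2 assumed defined in MEMORY):

  SECTIONS {
    .a : { *(.data.*) } > m1
    .b : { *(.data.*) } > m2
  }
  OVERWRITE_SECTIONS {
    .b : { *(.data.*) *(.table.*) } > m2
  }

Even though the replacement for .b appears textually last (possibly in a separate script), its .data.* matches would be ordered in the stub sequences as if written at .b’s original position, so sections overflowing m1 still spill forward from .a into the rewritten .b.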

Open Question

It’s not completely clear in what circumstances it’s legal to move a section to take the place of one of its stubs. While I’ve been able to verify in a prototype that creating and deleting stubs in the existing LLD test suite doesn’t introduce new failures, implying that pointers to them aren’t stashed anywhere inappropriate, the properties of the stubs may end up being altered as the link proceeds, such that they differ from those of the original section. In some cases it may be possible to alter the section’s properties as it is moved, but this might not always be possible. It might do well to take a conservative tack here, failing the link if any properties differ that cannot be proven safe to alter. We could then relax this on a case by case basis.

Alternatives

Defer/Undo Decision Making

An outwardly simpler model of the semantics would defer all decisions about the contents of an output section until after spilling has finished. This would be after the final addresses of all sections and symbols are known, but many of the decisions in question feed into the layout and address assignment process. Alternatively, the effects of decisions made earlier could perhaps be made tentative, and then undone or altered once address assignment completes. Both alternatives would add considerable complexity to the linker, and it doesn’t seem like the semantics described would be improved enough to warrant this (especially if it required some kind of fixed point iteration over a large portion of the linker).

Early Output Section Sizing

Instead of deferring decision making until spilling is finished, spilling could be moved before the necessary decisions take place, immediately after section matches were determined. Unfortunately, accurate spilling decisions require accurate output section sizes. These depend heavily on the final layout, since linker scripts can refer to and manipulate the current position. Computing size early requires ignoring layout-dependent aspects of the output section description. This would cause wasted space if the size were overestimated and a failed link if underestimated. This would be harder to work around than the limitations of the stub approach, since the user would need to keep in mind a full model of how sizes are estimated to resolve corresponding link failures.

Future Directions

MAX_SIZE(…)

In cases where common sections such as .text and .rodata are to be split across multiple discontinuous memory regions, an implementation might first assign all of the .text that fits, then all of the .rodata, and so on. This gives absolute priority to .text, which may not be desirable. It should be possible to say that no more than a certain amount of .text should be placed before moving on to .rodata.

The user can achieve this by altering the linker script to split each memory region into dedicated regions for each type of section. These regions can then be given fixed sizes, which causes spilling to move on once a region has been filled. This is a bit clunky, since the origins of these regions have to be determined via linker script arithmetic.
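A hedged sketch of that workaround (all names, origins, and sizes are invented): each RAM slice gets a fixed size, and the origin of the next slice has to be computed by hand.

  MEMORY {
    flash      : ORIGIN = 0x08000000, LENGTH = 1M
    ram_text   : ORIGIN = 0x20000000, LENGTH = 96K
    /* Kept in sync by hand: ORIGIN(ram_text) + LENGTH(ram_text). */
    ram_rodata : ORIGIN = 0x20018000, LENGTH = 32K
  }
  SECTIONS {
    .text.ram     : { *(.text .text.*) }     > ram_text
    .rodata.ram   : { *(.rodata .rodata.*) } > ram_rodata
    /* Spill targets once the RAM slices fill up. */
    .text.flash   : { *(.text .text.*) }     > flash
    .rodata.flash : { *(.rodata .rodata.*) } > flash
  }

Once ram_text fills, further .text spills to .text.flash, so .text can no longer claim RAM beyond its 96K cap at .rodata’s expense.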

Peter Smith earlier suggested a size limit syntax for a hypothetical .ANY input section descriptor. I’d suggest a MAX_SIZE(...) output section keyword instead, since the semantics and implementation should be quite simple. It would just keep a size counter around for the output section itself, in addition to those maintained for memory regions.
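Under such a keyword (the MAX_SIZE syntax, its placement in the grammar, and all names below are speculative), the same intent might be expressed without splitting the region:

  SECTIONS {
    /* Speculative syntax: cap this output section at 96K, then spill. */
    .text.ram   : MAX_SIZE(96K) { *(.text .text.*) }     > ram
    .rodata.ram :               { *(.rodata .rodata.*) } > ram
    .text.flash   : { *(.text .text.*) }     > flash
    .rodata.flash : { *(.rodata .rodata.*) } > flash
  }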

This is something to discuss more broadly with other LD-compatible linkers to come to common terms about the syntax/semantics. This is also non-essential, since it doesn’t offer any powers that cannot be gained through manipulations of the memory regions.

Thanks for posting. There’s quite a lot to think about there. Some early thoughts based on a first read:

Although an implementation detail, how are the stubs handled in terms of contributing to the (estimated) size of the output section, and for address assignment? My guess is that they would be skipped over as they are more like “placeholders”. Just thinking about things like Thunk distance calculations, as these could get complicated if the stubs contribute to size and have an address. As an aside, the name stub may be close enough to the use of stub in GNU ld/gold (equivalent to thunk in lld) to confuse people.

Arm’s linker with its .ANY sections went down the early assignment route. We have an estimate of the amount of extra content generated by the equivalent of finalizeAddressDependentContent(). This estimate would be subtracted from the maximum size that the OutputSection equivalent could be filled to. The contingency could be increased via a command-line option if it wasn’t sufficient. In our experience this worked well, although I’m hesitant to say that the same would apply to --enable-non-contiguous-regions, as I think the .ANY sections are far more constrained:

  • All input sections/output sections take part and not just a subset.
  • The maximum sizes are per MEMORY region (multiple OutputSections) and not a single OutputSection.

One of the things we needed to protect against is sections that have had the equivalent of _start and _stop symbols referenced; these need to be assigned as a unit, in the same way as section merging.

I think it would probably be wise to have a first pass at the algorithm prior to finalizeAddressDependentContent(). Then that function only needs to deal with moving sections that are affected by insertion of Thunks etc. You may have implemented it that way already.

I’m expecting that the finalizeAddressDependentContent() code could need more passes to converge, as some section distances are going to shrink.

Will keep thinking in the background. If I can think of anything more I’ll post a follow up.

My guess is that they would be skipped over as they are more like “placeholders”.

Yep, that’s correct; they wouldn’t participate in address assignment at all. That means that their presence shouldn’t affect the addresses assigned to other sections or any symbol. There will be no symbols that reference the stubs either, so they shouldn’t affect thunk generation or relaxation based on the relative position of symbol values and sections.

As an aside, the name stub may be close enough to the use of stub in GNU ld/gold (equivalent to thunk in lld) to confuse people.

That’s a good point; maybe something like “potential spill” or “spill point”?

Arm’s linker with its .ANY sections went down the early assignment route. The contingency could be increased via a command-line option if it wasn’t sufficient.

That is interesting. I had initially tried to prototype this using the early assignment approach, since the semantics are simpler, but I ended up hitting a wall. GNU linker scripts can do arbitrary arithmetic on symbol and section addresses that appear earlier in the linker script, then advance the current position to the result. Such expressions in LLD compile down to lambdas that cannot be examined, and they can only safely be executed during address assignment. We’d either need to summarily ignore such assignments or special-case common patterns; either way limits linker script flexibility.

One of the things we needed to protect against is sections that have had the equivalent of _start and _stop symbols referenced; these need to be assigned as a unit, in the same way as section merging.

Can you provide more details about what you mean by this? Is there an issue with moving sections referenced by these symbols? I’d think if spilling changed the values of these symbols, it would guarantee another round of address assignment, as if a thunk or relaxation had changed their values.

I think it would probably be wise to have a first pass at the algorithm prior to finalizeAddressDependentContent().

My prototype handled the stubs directly inside finalizeAddressDependentContent(); that seems the most direct way to handle it, since the spilling decision depends on the state of that algorithm when the input section to be spilled is reached. This also lets the effects of a spill propagate directly to later sections, so if there are no thunks/relaxations, the whole link can complete in a single pass, as usual.

I’m expecting that the finalizeAddressDependentContent() code could need more passes to converge, as some section distances are going to shrink.

Good point; we may need to tweak up the default if the flag is set.

Can you provide more details about what you mean by this? Is there an issue with moving sections referenced by these symbols? I’d think if spilling changed the values of these symbols, it would guarantee another round of address assignment, as if a thunk or relaxation had changed their values.

I don’t have a concrete example from real code that I can reference off the top of my head. I think some of the code in the profiling runtime mentions the technique for finding the bounds of the counters sections, but it doesn’t actually use it, so as to be more portable across different linkers. I’m sure I’ve seen something similar in processing the .init_array section.

Assume we have several object files, each containing a single word of data in a .section mytable, "ar", %progbits. Either via orphan sections or a linker script:

  mytable : { *(mytable) } > a_memory_region

I’m assuming mytable also matches some other input section description like *(*).

Some other bit of code will use the linker-generated __start_ and __stop_ symbols to find the bounds of mytable. It will then iterate over the addresses and do something with each table entry.

If all the mytable input sections in the mytable output section are moved to a different memory region, all is well. If only some of them are, we’ve now got two ranges of sections (potentially in different memory regions) and can’t use a single (__start_mytable, __stop_mytable) pair to get the bounds.

We may also face complaints about linker scripts that define boundary symbols around the input section description, for example:

  mytable : { mytable_start = . ; *(mytable) ; mytable_end = .; } > memory_region

The likely intent of the user is that the mytable input sections should be kept together. However, this case is a lot harder to reliably detect.

To some degree this problem is highly unlikely to occur, as these techniques tend to be used on small sections, which are unlikely to be the ones that land on the boundary of a memory region.

One possibility, although potentially deviating from GNU linker script, is to limit participation in packing to input section descriptions with wildcards.

If all the mytable input sections in the mytable output section are moved to a different memory region, all is well. If only some of them are, we’ve now got two ranges of sections (potentially in different memory regions) and can’t use a single (__start_mytable, __stop_mytable) pair to get the bounds.

From the embedded LD linker scripts I’ve seen, this is a problem that tends to come up regardless. If you have half of your files’ .bss sections in one memory region (and thus output section) and half in another, then a BSS zeroing routine, for example, would intrinsically need to iterate over both regions explicitly; that would tend to mean there is something like a __region1_bss_start and a __region2_bss_start, and both would be directly mentioned in code.
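For concreteness, a hedged sketch of the kind of script being described (region, file-pattern, and symbol names are invented, with __region1_bss_start/__region2_bss_start following the naming above); startup code would zero each range separately:

  SECTIONS {
    .bss1 (NOLOAD) : {
      __region1_bss_start = .;
      fast*.o(.bss .bss.*)
      __region1_bss_end = .;
    } > ram1
    .bss2 (NOLOAD) : {
      __region2_bss_start = .;
      *(.bss .bss.* COMMON)
      __region2_bss_end = .;
    } > ram2
  }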

This is fairly analogous to the undesirable case you mentioned; it doesn’t seem possible for the linker to determine, when two output sections both match the same input section, whether spilling was an intended possibility. If not, then under the current semantics, the later section must have been intended to catch everything not previously matched (or perhaps the overlap was simply unnoticed). It seems plausible to provide an annotation for the later section to say “don’t spill into this; it’s a catch-all”. Alternatively, maybe “don’t spill out of this” would be more apropos. Lacking this, one would need to give the later output section a set of wildcards that doesn’t intersect with the earlier ones.

I could see there being value in instead making it syntactically explicit that a wildcard is expected to provide an alternative placement for sections matched earlier. If the earlier matches also needed to be annotated to be spillable, that would essentially amount to something similar to your ANY(...) syntax for input section descriptions, even if the semantics were as described here. Then, a section would be forced to its first match if that match wasn’t annotated by ANY, and otherwise would generate placeholders for any later ANY input section description that matches. As mentioned earlier, we’d need to get buy-in from GNU LD for this, but notably I don’t think it precludes writing a mostly compatible --enable-non-contiguous-regions as a first step; ANY could for the most part be a more explicit way to access the same semantics.

I think the difference with splitting up BSS sections in the linker script is that, while doing so will cause problems for any startup code that expects a single contiguous BSS region, it is intentional in the linker script and can always be accounted for in the startup code. If --enable-non-contiguous-regions splits it apart based on overflow of regions, then it isn’t possible to easily compensate for it beyond manually inserting/removing other sections from the linker script to try to avoid an overflow.

I think this is likely to be an edge case that rarely affects anyone in practice, but given enough projects using it, it will probably happen a few times.

Probably not required for an initial implementation, but I think we may want to add it later.

Is the only way this overflow happens with the *(*) selector? Or could you add two *(.bss.*) regions to overflow between? Then you could have two sets of bounds for blanking.

In a world without linker scripts you could output start and end variables for each bss section in each output section.

As I understand it, it can happen with any input section description; *(*) just makes it more likely.

The case I’m worried about is something like .init_array, which is going to be difficult to deal with if it accidentally gets split up across multiple memory regions.

MEMORY { m1: ..., m2: ..., m3: ... }
SECTIONS {
  .a : { *(*) }>m1
  .b : { SORT_BY_NAME(*(*)) }>m2
  .c : { *(*) }>m3
}

I know that *(*) is for exposition purposes. Actual uses would likely have a specific section name like *(foo).
Whether these sections need bin-packing or not depends on whether subsequent output sections contain *(foo).
The drastic behavior change due to a non-local pattern and a global option could be surprising.

I have read @smithp35’s reply on the previous post “LLD Linker Section Packing”:

Personally I’d prefer to not apply rules globally via a command line option and to use additional syntax in the linker script.

I do not fully grok the armlink .ANY extension, but from @smithp35’s example, a keyword used similarly to SORT_BY_NAME/REVERSE looks pretty neat and tells readers that the input section description is special. Extensions can also be added when sufficiently useful.

  .text.1 : { * (.ANY(*.text.*), 0x1000) }
  ...
  .text.2 : { * (.ANY(*.text.*), 0x2000) }

If we can focus on the semantics, we probably should not be limited by GNU ld adding --enable-non-contiguous-regions first.

As you mentioned, the semantic compatibility goal we could achieve is a bit vague anyway…

A large number of linker features interact with the flag. Exact behavioral compatibility with GNU LD on these interactions shouldn’t be a goal; the precise behavior of the feature is almost completely undocumented, and its adoption appears very low. It’s more important that the behavior is predictable and usable.

Thanks for analyzing the interaction with other features. SHF_MERGE is typically for specific sections and SHF_LINK_ORDER is for some small metadata sections.
They are likely not candidates for the bin-packing feature. Using a keyword like .ANY will probably save us some special cases.

For encapsulation symbols (__start_*, __stop_*), if input foo sections can reside in two output sections, I think the user is responsible for ensuring that all output sections are processed. The linker probably does not need to do anything special.

Yeah, from the discussion here, plus some offline conversations I’ve had over the last few weeks, it really does sound like an ANY() syntax is the thing we should try to introduce. I’ll start a conversation on the GNU LD side to gauge opinion. If that falls through, then it still seems beneficial to have a nice clean way to use this in LLD. Interoperability could be obtained by tacking on the flag too; the two should be frontends to the same code for the most part.

I’ll have to play around a bit to try to figure out the way this should interact with SORT_BY_x and friends. But otherwise, it seems like there’s a straightforward semantics for it. The first ANY would behave identically to a regular wildcard match, but it would register the input section as spillable. Non-ANY wildcards should always exclude previous matches, so spillable sections could only match later ANY sections. Those would generate potential spills as described earlier. The effects of the flag, if implemented, should roughly be equivalent to treating every wildcard as ANY.

After some offline discussion, I’m vaguely leaning towards a slightly more powerful syntax with a simpler semantics.

It’s kind of ugly for there to be two different section matching semantics: regular wildcards that capture input sections outright, and ANY wildcards that capture sections with respect to non-ANY descriptions but not with respect to other ANY descriptions.

Instead, you could say something like ANY(tag <wildcards>), where the only effect is to name the matching sections by the tag and allow them to spill. Later locations where the sections could spill would be specified by ANY(tag), without any additional wildcards.

This would simplify the matching part of the implementation, since the only effect on matching would be to create the tags and record the matching sections. ANY(tag) would then directly instantiate potential spill sections in the same order as the original match.

The increase in power is modest, and it comes from constructs like this:

{ ANY(special_text .text.special.*) }
...
{ ANY(regular_text .text.*) }

After this, one could supply a completely different set of spill regimes for special_text and regular_text. That is, it preserves the ability to use broader wildcards to catch things not covered by more specific earlier ones, and it allows various sets of matches to spill separately from one another.
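Extending the fragment above with the placement side (the syntax is still hypothetical, and the output section and region names are invented), the later spill locations would reference only the tag, giving each class its own spill regime:

  SECTIONS {
    .text.special : { ANY(special_text .text.special.*) } > m1
    .text.fast    : { ANY(regular_text .text.*) }         > m1
    /* Separate spill regimes for the two classes. */
    .text.special.slow : { ANY(special_text) } > m2
    .text.slow         : { ANY(regular_text) } > m3
  }

Here special_text sections that overflow m1 may land in m2, while regular_text sections spill to m3, independently of one another.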

I’ve started a parallel discussion on the Binutils mailing list:

https://sourceware.org/pipermail/binutils/2024-February/132343.html

A fair amount of discussion occurred on the binutils mailing list in the interim. That discussion has since petered out into (it seems to me) a rough consensus, so here’s a paraphrased summary:

Me: Proposed the ANY syntax discussed above.

Roland McGrath: Noted that this syntax serves double duty; one variant of it (with wildcards) produces a subset of sections that correspond to a tag, while another variant (without wildcards) produces a possible spill location for the tagged sections. Suggested it would be clearer to break a SECTION_CLASS syntax out as a sibling of the various output section descriptions to name sections. This parallels the function of output section descriptions by naming a set of input sections; the SECTION_CLASS references inside output sections would then only provide placement locations for those sections.

Me: Noted that having a SECTION_CLASS syntax parallel with output section descriptions does increase the expressive power of the linker script for little cost, which is nice, but it makes it harder to move pre-existing linker scripts to use spilling, since a specific subset of input sections might be difficult to name outside of a particular point in an input section description, due to the imperative nature of linker scripts. Suggested that if a parallel SECTION_CLASS syntax were supported, it should be in addition to definitions within input section descriptions, rather than replacing them.

Me: Reported that, given these semantics, I preferred SECTION_CLASS to ANY. (Aside: ANY is a much better name for the semantics used by armlink than the one we ended up with.) Given that it’s inside a SECTIONS command, it should just be CLASS: CLASS(<name> <input section descriptions>) creates the “section class” with the given name, while CLASS(<name>) provides possible placements for that class of input sections.
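A sketch of what that might look like (all names and regions are invented, the exact grammar was still under discussion, and this assumes the defining form also serves as the class’s first placement):

  SECTIONS {
    /* Defines the class "hot" and gives its members their first placement. */
    .text.sram  : { CLASS(hot *(.text.hot.*)) } > sram
    /* References the class: a possible spill placement for its members. */
    .text.flash : { *(.text .text.*) CLASS(hot) } > flash
  }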

Christophe Lyon: Provided background on the original binutils discussion about --enable-non-contiguous-regions. Suggested that a global flag produces a less intrusive implementation, and that linker scripts are already poorly understood; keywords with subtle distinctions between their uses or with contextual behavioral differences add to this.

Update

Bit of an update on my part: I ended up having an SSD crash, so I’m working on recovering the setup I had. I have most of an MVP implementation of --enable-non-contiguous-regions in LLD, and I’m working on validating it against our little embedded example use case. I’ll move on to implementing the above CLASS syntax/semantics afterwards. The definition/use semantics we ended up with are mostly context-insensitive (linked by class names), and they seem fairly straightforward to mentally model and difficult to misread or misuse, but let me know if you disagree.

I’ve gotten to the point of trying my WIP branch for --enable-non-contiguous-regions against a real embedded project. So far so good; I got a straightforward refactor of the linker script to succeed (waiting on access to the hardware to test). There were a couple of issues centered around unspillable sections.

If an unspillable input section (or other address-advancing linker script construct) appears in the link after a series of spillable input sections, the imperative nature of address assignment means that its effects would not be considered in making those spill decisions. This could cause a memory region to overflow, even though it would have been possible to make earlier spill decisions differently to avoid it.

It’s possible to work around this by moving such constructs first in the linker script; this allows later spill decisions to have a full reckoning of their effects. This leads to the second issue. SHF_MERGE sections aren’t considered spillable, and it’s rather awkward to pick them out from other .rodata sections to move them first. Worse, in our case .text was spillable, so that fragment of .rodata would have needed to be assigned before .text.

Accordingly, I implemented a simple semantics for spilling SHF_MERGE synthetic sections. The resulting merge sections take on the list of possible spill locations of their first contained input section, just as they currently take on the name of that section. This allows unrestricted merging of input sections within the first matching output section, followed by spilling of the resulting merge section. It would be possible to make these semantics either stricter or more sophisticated, but this seemed the most straightforward way to deal with the situation, short of disallowing it.

Returning to the original issue, in our case it was possible to move unspillable sections before spillable ones, but this doesn’t form a general solution. It’s possible to vaguely imagine a kind of binary search that would find an optimal “spill cutoff point” for each memory region using a logarithmic number of address assignment passes (tying into the existing fixed-point iteration). This would allow earlier spill decisions to incorporate the full effects of later (general) linker script constructs. But this probably doesn’t belong in an MVP, and movement in that direction would be backwards compatible, since it would only change the behavior of failing links.

After much ado, an initial PR is available for review. PTAL!

This only implements --enable-non-contiguous-regions, not the dedicated syntax. After spending much time under the hood, there ended up being some differences from the more abstract view I had before:

  • SHF_MERGE sections ended up being too big and too important to make unspillable. Instead, they were given the spill list of the first contained input section (the one that gives the result section its name).

  • Needed to make the finalization for .ARM.exidx runnable in the fixed point loop, since sections can now change relative order due to spilling.

  • The naive ICF semantics actually seems fine; the spill lists for merged-away sections go dormant, and the result is spilled using its own spill list.

  • Couldn’t get OVERWRITE_SECTIONS to work well, so it was made an error to mix with the flag (like INSERT BEFORE/AFTER). Probably reparable, but this PR has been baking for long enough and it doesn’t seem worth delaying for that.

And one more big one:

Unspillable sections

Unspillable sections that appear later in the assignment process than spillable sections ended up being a huge problem in practice. The original assignment algorithm would happily walk right up to the last byte, then hit an unspillable section and fail the link. Had earlier sections been spilled, there would have been more than enough room. Unspillable sections occur naturally in both linker scripts and linker-generated content (e.g. thunks), so they’re tricky to avoid.

I did find a reasonable way to handle them. Instead of spilling during assignment, the overflows can be allowed to happen. Sections can then be spilled in reverse order of assignment, decreasing region sizes (naively) by the section sizes. Once all the memory regions are plausibly under size, another round of assignment can start. Taking advantage of the fixed point iteration in this way allows the sizes of unspillable sections to be implicitly considered, without much additional overhead. It does mean that sections may occasionally be spilled when they don’t have to, but it seems like a good trade-off, and we can do something more sophisticated with a better semantics if needed.

Thanks for posting the review. I will take a look, but will likely be next week.

Thanks for posting the --enable-non-contiguous-regions patch for evaluation as one possible direction.
The other direction is the separate SECTION_CLASS syntax; I hope we don’t just shut that down.
--enable-non-contiguous-regions is possibly less intrusive for GNU ld, but avoiding a global option might be less intrusive for us and could avoid some special cases.
I do believe that, unlike many other options, this niche option has very little value for feature parity.

It’s definitely my intention to follow up immediately with a patch that provides actual syntax rather than a flag. My main goal with the first patch was to explore and iron out mechanism and policy for spilling sections (i.e., all the feature interactions). The main reason I made --enable-non-contiguous-regions the frontend for it was that it was simpler to tack on than a parser change, while still providing a way to fully exercise the semantics in tests. Past the matching phase, it looks like the actual behavior of the feature would be very similar between the flag and the syntax. Both would result in a set of potential spill sections, and they’d end up being handled largely the same way from then on.

I wouldn’t really have many qualms about dropping the flag, especially if we can get a syntax supported in at least LLD and GNU LD. In the meantime, it still seems like it might be useful for projects that want to use automated section packing but need to link with both LD and LLD. Even so, I wouldn’t expect their semantics to be identical in many cases, so the compatibility provided may be limited.

#90007 is a great example of comprehensively considering all scenarios. The proposed changes are not intrusive, and the current framework remains adaptable for future fine-grained syntax needs. I’ve added some comments.
