Embedded systems often have heterogeneous memory regions available to place sections into, and GNU LD largely determines those placements syntactically. Decisions are made using the names, types, and filenames of the input sections, but not their sizes or the amount of space available in the regions. Accordingly, if too many sections are assigned to the same region, the link will fail. This requires a degree of manual assignment in the linker script to ensure everything fits.
Manually laying out input sections can be onerous in practice: when input files change size, a linker script that previously worked can suddenly fail to link the binary due to memory region overflows. To make the process more automatic, various embedded toolchain vendors have extended, modified, or replaced the GNU LD linker script semantics. GNU LD itself has a flag to help with this as well, though it hasn’t made it into LLD.
I wanted to open discussion about whether any of these alternatives could be implemented in LLD. I’ve provided a brief survey of options implemented by major embedded vendors. It’s necessarily incomplete, but it should help the discussion.
GNU LD --enable-non-contiguous-regions
When this flag is passed to GNU LD, if the first section definition that matches an input section assigns it to a memory region where it cannot fit, the match is abandoned. The input section then remains unassigned until the next matching section definition, and the process continues from there. The companion flag --enable-non-contiguous-regions-warnings emits a diagnostic whenever the flag changes the allocation of a section; this can be noisy.
Linker-generated sections and sections where the size changed due to relaxations are not allowed to fall through in this fashion. If one of these cannot fit in the first matched section, then the link fails.
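The fall-through behavior can be illustrated with a small simulation. This is a minimal sketch, not GNU LD's implementation: the Region and Rule types and the assign function are hypothetical names, prefix matching stands in for real input-section patterns, and alignment, relaxation, and linker-generated sections are ignored.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    size: int
    used: int = 0


@dataclass
class Rule:
    prefix: str      # hypothetical stand-in for an input-section pattern
    region: Region   # memory region this section definition targets


def assign(sections, rules):
    """Return {section name: region name}; raise if a section fits nowhere."""
    placement = {}
    for name, size in sections:
        for rule in rules:
            if not name.startswith(rule.prefix):
                continue
            if rule.region.used + size <= rule.region.size:
                rule.region.used += size
                placement[name] = rule.region.name
                break
            # Does not fit: abandon this match and try the next matching rule.
        else:
            raise MemoryError(f"{name} does not fit in any matching region")
    return placement


fast = Region("FAST", 100)
slow = Region("SLOW", 1000)
# Two section definitions match the same inputs; the second is the fallback.
rules = [Rule(".text", fast), Rule(".text", slow)]
sections = [(".text.a", 60), (".text.b", 60), (".text.c", 50)]
placement = assign(sections, rules)
print(placement)
```

In this example only .text.a lands in FAST; the other two sections overflow it and fall through to SLOW.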
Discussion
This approach requires special handling to use in practice. Since there isn’t any way to specify a limit on the size of input sections that can match an earlier section definition, sections will be assigned to it until the underlying region is filled. Accordingly, there is no general way to allocate something after the automatically filled portion.
Because of this fill-until-full semantics, in order to accommodate the common pattern of filling a logical region with CODE, RODATA, DATA, and HEAP in varying amounts, the logical region would need to be subdivided into several memory regions, e.g., one for CODE, one for RODATA, etc. The end of these regions could be allocated dynamically using automatic section splitting; this would allow giving each region a manually-specified budget.
TI Linker
See Section 3, “Automatic Section Splitting”
The TI linker appears to be broadly GNU LD compatible, but it adds an extension to split input sections across multiple memory regions. The syntax is:
.text { ... } >> FIRST_REGION | SECOND_REGION | THIRD_REGION
Discussion
Although the syntax differs, the semantics of this are broadly similar in their implications to the GNU LD flag. This acts as a separable extension to the linker script syntax, leaving the behavior of existing constructs alone, while the GNU LD flag modifies the behavior of regular section definitions.
MPLAB XC16
See Section 10.5, “Linker Allocation.”
Packing is done largely through orphan sections; these are packed into existing memory regions with a best-fit allocator. Otherwise, the linker is broadly GNU LD compatible. Custom section attributes affect this packing, but the details are complex and unlikely to generalize well.
The allocator also uses gaps left over by alignment in sequential allocation; these are transparently packed too.
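To make the best-fit idea concrete, here is a rough sketch of packing sections into free gaps, including a small alignment hole. It is not XC16's algorithm: the best_fit function, the gap list, and the largest-first ordering are all assumptions made for illustration.

```python
def best_fit(sections, gaps):
    """Place each (name, size, align) into the smallest gap that can hold it.

    gaps is a list of (start, end) free ranges, alignment holes included.
    Returns ({name: address}, remaining free ranges).
    """
    free = list(gaps)
    placed = {}
    # Largest sections first: a common heuristic, assumed here for illustration.
    for name, size, align in sorted(sections, key=lambda s: -s[1]):
        best = None
        for i, (start, end) in enumerate(free):
            addr = (start + align - 1) // align * align  # align up within the gap
            if addr + size <= end and (best is None or end - start < best[1]):
                best = (i, end - start, addr)
        if best is None:
            raise MemoryError(f"no gap fits {name}")
        i, _, addr = best
        start, end = free.pop(i)
        placed[name] = addr
        # Return the unused head and tail of the gap to the free list.
        if addr > start:
            free.append((start, addr))
        if addr + size < end:
            free.append((addr + size, end))
    return placed, sorted(free)


# A 12-byte alignment hole at 0x104 and a large free range at 0x200.
gaps = [(0x104, 0x110), (0x200, 0x400)]
sections = [("big", 0x100, 4), ("tiny", 8, 4)]
placed, remaining = best_fit(sections, gaps)
print({name: hex(addr) for name, addr in placed.items()})
```

The small section ends up in the alignment hole rather than consuming space from the large range, which is the effect described above.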
Discussion
This changes the behavior of orphan sections relative to GNU LD, rather than changing the behavior of general section definitions like the GNU LD flag does. The same semantic concerns apply as with those approaches.
ARM Scatter File
ARM has specificity rules like CSS for matching input sections to output sections. Generally, the most specific rule wins, independent of order.
Sections are assigned to regions via .ANY selectors using a selectable packing algorithm, e.g., first_fit, best_fit, etc. ANY_SIZE allows limiting the maximum size of an .ANY selector, and priorities can be given to each.
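The difference between the packing algorithms can be sketched as follows. This is a simplified model, not armlink's behavior: AnySelector, place, and the region names are hypothetical, any_size stands in for an ANY_SIZE limit, and the assumption that a higher priority value is tried first is made only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class AnySelector:
    region: str
    any_size: int     # cap on how much this selector may absorb
    priority: int = 0
    used: int = 0

    def room(self) -> int:
        return self.any_size - self.used


def place(sections, selectors, algorithm="first_fit"):
    """Assign each (name, size) to a selector; return {name: region}."""
    order = sorted(selectors, key=lambda s: -s.priority)
    result = {}
    for name, size in sections:
        fits = [s for s in order if s.room() >= size]
        if not fits:
            raise MemoryError(f"{name} fits in no .ANY selector")
        if algorithm == "first_fit":
            chosen = fits[0]  # first selector, in priority order, with room
        elif algorithm == "best_fit":
            chosen = min(fits, key=lambda s: s.room() - size)  # tightest fit
        else:
            raise ValueError(algorithm)
        chosen.used += size
        result[name] = chosen.region
    return result


def make_selectors():
    # Fresh state for each run so the two algorithms can be compared.
    return [AnySelector("ER_A", 256, priority=1), AnySelector("ER_B", 128)]


sections = [("x", 100), ("y", 120)]
first = place(sections, make_selectors(), "first_fit")
best = place(sections, make_selectors(), "best_fit")
print(first, best)
```

With these inputs, first-fit puts both sections into ER_A, while best-fit sends x to the smaller ER_B because it leaves less slack there.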
ARM scatter files differ from GNU LD-style linker scripts in that the ordering of sections within memory regions is not given by the order of the section specifiers; rather, the sections are sorted by type. The FIRST and LAST specifiers can be used to place a set of sections first or last within the type.
Discussion
The approach used by ARM scatter files could not be used directly in LLD, since linker scripts carry a general expectation of sequential assignment. Sequential assignment can be used to directly control the ordering and addresses of sections, and any automatic placement mechanism must preserve this property.
Via the FIRST and LAST properties and the implicit sorting by type, this approach does allow placing contents after the variable region. Implementing this well would likely imply some kind of lazy addressing or backtracking. The addresses of the sections after a variable region cannot be known until the size of the variable region is known, but the size of the variable region cannot be determined until the amount of space available for allocation is determined, which depends on the size of everything after the variable region. Alignment would add further complexity.
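The circular dependency can be seen in a naive backtracking sketch: lay out the variable sections followed by the fixed tail, and drop the last variable section whenever the tail overflows the region. This is only an illustration of the problem under simplifying assumptions, not a proposed LLD design; the layout function and the section tuples are hypothetical.

```python
def align_up(addr, align):
    return (addr + align - 1) // align * align


def layout(origin, length, variable, tail):
    """Place as many variable sections as possible, then the fixed tail.

    variable and tail are lists of (name, size, align). Returns
    ({name: address}, names of variable sections that had to spill).
    """
    end = origin + length
    chosen = list(variable)
    while True:
        cursor = origin
        addrs = {}
        # Tail addresses depend on how much of the variable part was kept.
        for name, size, align in chosen + tail:
            cursor = align_up(cursor, align)
            addrs[name] = cursor
            cursor += size
        if cursor <= end:
            spilled = [name for name, _, _ in variable[len(chosen):]]
            return addrs, spilled
        if not chosen:
            raise MemoryError("fixed tail does not fit in the region")
        chosen.pop()  # backtrack: evict the last variable section and retry


variable = [("a", 40, 4), ("b", 30, 4), ("c", 20, 4)]
tail = [("heap", 24, 16)]
addrs, spilled = layout(0, 100, variable, tail)
print(addrs, spilled)
```

Here a, b, and heap total 94 bytes and would seem to fit in 100, but the 16-byte alignment of heap pushes its end to 104, so b must spill as well. A size-only budget computed up front would have gotten this wrong.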