My understanding from the RFC is:
- All global objects in the bitcode file will be assigned a section name.
… which is equal to the section name that they would have been emitted to if this were a regular compilation. In addition to allowing the linker to read section names from the bitcode, this also helps support mixing -ffunction-sections and -fno-function-sections and similar options (I forgot to mention that in the RFC).
- A linker will communicate the output section of all global objects.
Correct. (Global objects in the LLVM sense, so that includes objects with local linkage.)
- Certain transformations won’t be performed if the output section is different.
Correct. Plus, others can be enabled if they’re safe to apply when we know things are going to the same output section.
The common use cases that I can see that might not fit perfectly into
- Code that is in different OutputSections but it will be logically
correct and in many cases desirable to perform transformations on as
if they were in the same output section.
Right. The output section that the linker communicates for a symbol doesn’t need to correspond to a “physical” output section. So if, say, the linker knows (or the user somehow tells it) that two output sections should be considered equivalent, it can communicate the same output section identifier for symbols in either of the two physical output sections. This is perfectly safe since the output section info is only ever used to enable/inhibit optimizations, not for actual symbol emission by LTO.
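To make the "same identifier for equivalent sections" idea concrete, here is a rough sketch of how a linker might compute the IDs it hands to LTO. The function name, the section names and the shape of the inputs are all hypothetical and chosen for illustration; none of this is an existing LLVM or linker API.

```python
# Hypothetical sketch: none of these names are real LLVM or linker APIs.
def assign_domain_ids(symbol_to_section, equivalent_groups):
    """Return symbol -> ID, where sections in one equivalence group share an ID."""
    section_to_id = {}
    next_id = 0
    # Sections the linker (or user) declared equivalent get one shared ID.
    for group in equivalent_groups:
        for section in group:
            section_to_id[section] = next_id
        next_id += 1
    ids = {}
    for symbol, section in symbol_to_section.items():
        # Any section not named in a group gets its own fresh ID.
        if section not in section_to_id:
            section_to_id[section] = next_id
            next_id += 1
        ids[symbol] = section_to_id[section]
    return ids


# Illustrative input: two physical sections treated as equivalent, one kept apart.
symbols = {"foo": ".text.fast", "bar": ".text.slow", "baz": ".overlay1"}
ids = assign_domain_ids(symbols, [{".text.fast", ".text.slow"}])
```

Here `foo` and `bar` end up with the same ID even though they live in different physical output sections, while `baz` stays separate.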
- Output section placement rules that are not based on names, for
example Arm’s linker can assign sections to an output section until
the output section size limit is reached, then a different output
section is used. I admit that this may be more of a problem for
linkers that have a different linker script model.
That should actually just work in the existing model. Before LTO runs, we don’t know the size of symbols anyway, so the linker will just communicate the original output section for all of them and we apply optimizations across them as if they all fitted in the same section. After LTO, some may end up in the ‘overflow’ section but LTO doesn’t need to know about that since it wouldn’t have been correct for the user to make any assumptions about what ends up in the original section vs overflow in the first place.
I think both cases are illustrative of a use case where the precise
output section does not matter, but there is a vaguer goal of placing
a subset of the input sections in a subset of the output sections.
From what I can tell there isn’t a way for the code generator to tell
the difference between code that is placed in different output
sections and it is not correct or beneficial to optimize and code that
is placed in different output sections and it is correct and
beneficial to optimize together.
Perhaps we should rename the “output section” that is communicated to LTO to something less specific to make it clear that it can be used for exactly this purpose. Optimization domain? Partition?
I think that this kind of use case could be supported by doing something like:
- Linker informs code generator the output sections that must not use
any information from another module and may not contribute any
information to another module. For example an output section that is
representing an overlay.
It’s not so much about other modules (files) - you could have multiple files contributing input sections to the same overlay, for instance, and you would want to optimize across them. But you wouldn’t want to de-duplicate a constant from another overlay. I think the OutputSectionID-as-optimization-domain idea captures this use case, no?
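As a sketch of what that could mean for constant de-duplication: identical constants are merged only when they carry the same ID, regardless of which file they came from. The tuple layout and the function name are assumptions made up for this example, not how LLVM's constant merging is actually implemented.

```python
# Hypothetical sketch of domain-aware constant merging; not LLVM's real pass.
def merge_constants(constants):
    """constants: iterable of (name, domain_id, value).

    Returns name -> canonical name. Two constants are merged only if both
    their value and their optimization-domain ID match.
    """
    canonical = {}
    alias = {}
    for name, domain_id, value in constants:
        key = (domain_id, value)
        # The first constant seen for a (domain, value) pair becomes canonical.
        alias[name] = canonical.setdefault(key, name)
    return alias


# "a" and "b" share overlay 1 (possibly from different files); "c" is in overlay 2.
consts = [("a", 1, b"hello"), ("b", 1, b"hello"), ("c", 2, b"hello")]
merged = merge_constants(consts)
```

With this input, `b` folds into `a`, but `c` keeps its own copy because it belongs to a different overlay.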
- Linker can omit the output section information for sections that the
user doesn’t care where they go, and let the linker decide based on
some size constraint later.
Allowing a ‘don’t care’ output section ID is an interesting idea, but we would have to be pretty careful in defining what that means on a per-optimization basis. For example, am I allowed to inline a function with a defined output section into a function without one (probably not)? Vice versa (probably yes)?
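For inlining, the tentative rule above could be written down roughly as follows. This is only a sketch of the "probably" answers; `DONT_CARE` and `may_inline` are hypothetical names, and the case where neither function has an ID is my own assumption since it isn't discussed above.

```python
# Hypothetical sketch of a per-optimization rule for a 'don't care' ID.
DONT_CARE = None


def may_inline(callee_id, caller_id):
    """Decide whether the callee's body may be copied into the caller."""
    if callee_id is DONT_CARE:
        # Callee has no placement requirement, so its code may go anywhere.
        # (When the caller is also DONT_CARE this is an assumption.)
        return True
    if caller_id is DONT_CARE:
        # Callee is pinned, but the caller could end up in any section.
        return False
    # Both pinned: only inline within the same output section ID.
    return callee_id == caller_id
```

Other optimizations (constant merging, for instance) would presumably need their own rule of this kind.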
I think that these are mostly details rather than fundamental problems though.
Thank you very much for your comments!