[LLD] RFC Range Thunks Implementation review for ARM and Mips

This RFC is primarily to support the review of the range extension
thunks implementation of lld. This concerns ARM and Mips as all of the
thunk creation step is skipped over if the target doesn't need thunks.

Mips LA25 Thunks are not range extension Thunks but these are
generated using the same code, I've kept the behaviour the same as it
is now, although the implementation is obviously slightly different in
the detail.

Recap of range extension thunks
- Many architectures with fixed instruction width have a branch instruction
that has a limited range
- On ARM and AArch64 the static linker is expected to insert range extension
thunks when some classes of branch instruction are out of range

Where are we now with LLD and thunks
- We have moved the createThunks() function as late as it can be in
finalizeSections(), most importantly after all the content has been
added to the image.
- We only call createThunks() once and we don't calculate addresses
before createThunks().
- If an OutputSection is described by a LinkerScript then any thunks
created in it are put at the end of the OutputSection. This is due to
the LinkerScript not being aware of the thunks (InputSections) in the
InputSectionDescriptions.

Proposed implementation for range extension thunks
At a high-level we need to solve the following problems:
- Assign addresses more than once
- Maintain state between successive calls of createThunks()
- Synchronization of the linker script and the OutputSection after adding thunks
- Deciding when we need a range extension thunk and where to place it so that it
can be reused by multiple callers.

The first 3 problems are where range extension thunks interact most
with the rest of the linker. The last problem can be confined to the
thunk implementation.

Address Assignment
The non-linker script address assignment can already be called
multiple times. The linker script address assignment records state as
it goes, if this state is reset it can be called more than once.

Maintaining state between successive calls to createThunks()
I've chosen to introduce a class ThunkCreator that can maintain the
state between calls.

Synchronization of OutputSection and linker scripts
An OutputSection described by a linker script SECTIONS command has one
or more InputSectionDescriptions that control how InputSections are
laid out in that OutputSection. If a thunk section is added to
OutputSection.Sections it must be added to the correct
InputSectionDescription. For the implementation I have chosen to add
Thunks to OutputSection.Sections and then merge these into the
InputSectionDescriptions for the OutputSection.

Placement of range extension thunks
I have chosen a simple implementation of spacing ThunkSections a
Target dependent distance apart. For ARM (lld currently supports v7a)
this is set to the Thumb2 branch range of 16Mb. If a thunk cannot be
created in one of these spaced out ThunkSections a new ThunkSection is
created. This allows special cases such as the Thumb2 conditional
branch with a range of 1Mb to be supported without reducing the
spacing of the pool.

This is a different approach to ld.bfd and gold, in gold at least a
thunk (Stub) must be placed within one of the precreated
ThunkSections, these are called stub groups. There is a command line
option --stub-group-size that allows the user some kind of control
over the spacing. I've not implemented the command-line option in the
initial implementation.

Reviews:
I've created reviews for a series of patches that implement range
extension thunks, I expect that these will need quite a bit of work,
especially the first three. If you've got any comments on the code, or
anything local to the patch I'd be grateful if you could leave a
comment on the relevant review. If there is anything more general then
responding to this message may be better.

The majority of the actual work is writing the tests, which should be
resusable however the implementation ends up.

The following patches are the glue code needed for range extension thunks to
be implemented.
[LLD][ELF] Make createThunks a class [NFC]
https://reviews.llvm.org/D31654
[LLD][ELF] Fix Thunks with placement restrictions and linker scripts
https://reviews.llvm.org/D31656
[LLD][ELF] Make createThunks() iterate until no more thunks added
https://reviews.llvm.org/D31657

The following set of patches add range extension thunks, they are split up to
make reviewing easier, but don't add enough to add tests until late on.
[ELF] Introduce Thunk reuse compatibility
https://reviews.llvm.org/D31658
[ELF] Be more precise about Thumb state bit in ARM thunks
https://reviews.llvm.org/D31659
[ELF] Allow multiple thunks to be added for a symbol
https://reviews.llvm.org/D31660
[ELF] Pre-create ThunkSections at Target specific intervals
https://reviews.llvm.org/D31661
[ELF] Introduce range extension thunks for ARM
https://reviews.llvm.org/D31662
[ELF] Account for range thunk that has drifted out of range
https://reviews.llvm.org/D31663
[ELF] Prefer placing ThunkSections before non ThunkSections
https://reviews.llvm.org/D31664
[ELF] Add test cases for range extension thunks (no linkerscripts)
https://reviews.llvm.org/D31665
[ELF] Add test cases for range extension thunks using linkerscripts
https://reviews.llvm.org/D31666

Peter

Proposed implementation for range extension thunks
At a high-level we need to solve the following problems:
- Assign addresses more than once
- Maintain state between successive calls of createThunks()
- Synchronization of the linker script and the OutputSection after adding thunks

This last past seems to be the messier. The issue is not with the
patch, is with the existing infrastructure that uses a completely
different representation for linker scripts and non linker scripts.

What I think is needed is for the writer to create a dummy "script"
and use what is now LinkerScript::assignAddresses. That "script" would

* Contain only OutputSectionCommand.
* All string manipulations would have been moved before assignAddress.
* All the orphan handling would have been made explicit before assignAddress.
* Each OutputSectionCommand would contain just a InputSectionDescription.

With this the thunk creation should be able to add thunk to a single location.

Cheers,
Rafael

Are you suggesting other linker jobs such as creating _end symbols to the linker script?

The linker script support was implemented after we wrote the current Writer class, so it is somewhat “plugged in” to the Writer. It might not be the best design, and not many other options have been explored. So there might be room to improve code by moving work loads from the Writer to the LinkerScript. But we need to careful not to hurt performance by doing that.

My understanding is that this would be (initially) limited to
fabricating enough linker script commands such that we could replace:
fixSectionAlignments()
assignAddresses()
Script->processNonSectionCommands()

With something like:
Script->assignAddresses() // Could be done multiple times
Script->processNonSectionCommands() // This should only be done once

In theory all the other __start and __end symbols could still be kept
separate if the linker script commands were created late, and in a
compatible way. I also don't think that this means removing
OutputSections::Sections just yet either?

I don't think that we are proposing to follow the ld.bfd model of
driving the default case via a built in linker script yet? I think
that this would be considerably more work than just this limited
change.

I think the best way forward is to try and prototype something to see
if it splashes out any special cases. I can give this a go to see what
happens.

In the meantime I would be grateful if there is any opportunity to
move forward some of the range thunks changes in parallel, even if
they do not initially work with some linker scripts.

If the above change to always using Script->assignAddresses() did
happen then createThunks() would become a little bit more complicated
as it would need to step through one or more input section
descriptions per OutputSection. Any Thunks created would still need to
be added to both InputSectionDescriptions and OutputSections::Sections
but we could just use push_back().

Peter

Just FYI: A quick experiment that got as far as creating an
OutputSectionCmd for each OutputSection when doing a link without a
linker script exposed an interesting performance problem with the
many-sections.s test.

To reproduce just add a linker script to the link in the test that
will force the creation of a large number of orphan sections, for
example:
// RUN: echo "SECTIONS { \
// RUN: . = SIZEOF_HEADERS; \
// RUN: .text : { *(.text) } } \
// RUN: " > %t.script
// RUN: ld.lld %t --script %t.script -o %t2
// RUN: llvm-readobj -t %t2 | FileCheck --check-prefix=LINKED %s

This will take over 60s to run the test on my machine. I think the
culprit is Script->writeDataBytes(Name, Buf); in
OutputSection::writeTo() which searches for the OutputSection by name.
With a huge number of sections this is going to take a long time. I'm
not sure if many-sections.s with a linker script is a representative
test case for lld as it stands but if we do go down the route of
fabricating a linker script command for each output section we'll need
to make a better mapping from OutputSection to OutputSection command
than a linear search by name.

Peter

No. The artificial commands would contain just sections.

Cheers,
Rafael

My understanding is that this would be (initially) limited to
fabricating enough linker script commands such that we could replace:
fixSectionAlignments()
assignAddresses()
Script->processNonSectionCommands()

With something like:
Script->assignAddresses() // Could be done multiple times
Script->processNonSectionCommands() // This should only be done once

Correct.

In theory all the other __start and __end symbols could still be kept
separate if the linker script commands were created late, and in a
compatible way. I also don't think that this means removing
OutputSections::Sections just yet either?

Probably not, but it might got away in the future.

I don't think that we are proposing to follow the ld.bfd model of
driving the default case via a built in linker script yet? I think
that this would be considerably more work than just this limited
change.

I really *don't* want to see lld do that. Using a real linker script
is a bad idea is it forces the link to be section name based. There is
no way to combine sections based on their flags for example.

We would still have exactly the same logic as to how sections are
combined. We would then just create the same structure that the linker
script address assignment logic uses.

Before any of this, we have to move all name based logic out of assignAddresses.

I think the best way forward is to try and prototype something to see
if it splashes out any special cases. I can give this a go to see what
happens.

Cool. I am going on vacation tomorrow night, but I will try to at
least move some of the string lookups before assign address.

In the meantime I would be grateful if there is any opportunity to
move forward some of the range thunks changes in parallel, even if
they do not initially work with some linker scripts.

Could we maybe start with *no* linker script support? If the idea of
unifying the representation works out we will get that for free.

Cheers,
Rafael

Thanks for the comments. I'm happy to start with no linker script
support, it would be sufficient to get clang built which enables an
ARM clang, lld, test-suite build bot.

Peter

I've made an attempt at unifying the address assignment as suggested
at https://reviews.llvm.org/D31888

This creates equivalent linker script commands for the OutputSections.
There was one test ELF/tls-offset.s that I had to update as the way
that Writer::assignAddresses() calculates the address of a following
non .tbss section is different to the linker script, in particular the
linker script won't allow setDot() to give a lower address I've
updated the test to match the equivalent linker script.

The review only changes the existing call to assignAddresses() which
happens after createThunks(). Next experiment is to call
assignAddresses() before createThunks() and rewrite createThunks() to
use InputSectionDescriptions.

Peter

I've made a bit more progress with converting createThunks() to use
InputSectionDescriptions. I've created https://reviews.llvm.org/D32030
just to hold a first draft of the diff to show where it is going (not
a review request). The experience so far is:
- The .ARM.exidx SHF_LINK_ORDER section sorting can be broken by the
LinkerScript sorting in the same way as Thunks can be. In essence we
sort the OutputSections::Sections, which can be broken by
Script->assignAddresses() if any part of the .ARM.exidx is covered by
the linker script.
- It isn't sufficient to just fabricate InputSectionDescriptions when
there is no SECTIONS command, we must fabricate
InputSectionDescriptions to cover the orphan sections as well.
- There is a lot of scope to find a better interface to getting the
InputSectionDescriptions
- There will need to be some optimization as there is quite a lot of
looking up by name.

Peter