[lld] Linker script findings.

Hi all, I have been investigating linker scripts and the functionality
needed to support them in lld. I have attached my findings about the
usage of ldscripts. My findings have been collected from:

- Reading all the GNU ld manual sections about linker scripts.
- Looking at the GNU ld and gold source code.
- Digging through a couple of embedded programming tutorials.
- Reading through all of the linker scripts in the Linux kernel tree.
- Other random sources across the net that I'm forgetting about.

In particular, the second-to-last section (comprising about half the
document) describes all of the functionality that LLD's APIs will have
to expose to the ldscript language processor in order to link the
Linux kernel.

-- Sean Silva

ldscript-usage.txt (14.2 KB)

Hi Sean,

Thanks for providing us the information on the linker scripts.

Linker scripts also use a wide variety of keywords, such as:

1) SORT
2) ALIGN
3) OVERLAY

Overlays are most commonly used in embedded applications to overlay one section over the other using a custom overlay manager.

You might want to look at the ELFLayout changes to see what functionality is missing from that.

ELFLayoutOptions has a hook for reading the linker script, which still needs to be implemented.

Thanks

Shankar Easwaran

Sean,

Thanks for doing this research and writing up that summary!

The SECTIONS and MEMORY commands seem doable in lld as part of the ELF
Writer. The one tricky part will be if the linker script defines symbols
(e.g. __text_size), because those symbol names might be
referenced by some object file atom. Thus they need an atom
representation for lld's Resolver to see. So, the ELF Writer will need
to make a first pass at the linker script and make "proxy" atoms for
any symbols the linker script defines. These atoms won't actually
have a value assigned until the Resolver is done and the atoms are
handed to the ELF Writer to complete.
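
A minimal C++ sketch of the two-pass idea described above, under stated assumptions: `ProxyAtom`, `makeProxyAtoms`, `unresolvedRefs`, and `assignValues` are hypothetical names for illustration only and are not existing lld classes or functions. The first pass creates value-less proxies so that references can resolve; the values are filled in only once layout is known.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an atom that represents a script-defined symbol.
struct ProxyAtom {
  std::string name;
  bool hasValue;   // false until the Writer has laid out the output
  uint64_t value;  // meaningful only when hasValue is true
};

// First pass over the script: one proxy per defined symbol, no value yet.
std::vector<ProxyAtom> makeProxyAtoms(const std::vector<std::string> &names) {
  std::vector<ProxyAtom> proxies;
  for (const std::string &name : names)
    proxies.push_back(ProxyAtom{name, false, 0});
  return proxies;
}

// Toy resolver step: report references that no proxy atom satisfies.
std::vector<std::string> unresolvedRefs(const std::vector<std::string> &refs,
                                        const std::vector<ProxyAtom> &proxies) {
  std::vector<std::string> missing;
  for (const std::string &ref : refs) {
    bool found = false;
    for (const ProxyAtom &atom : proxies)
      if (atom.name == ref) found = true;
    if (!found) missing.push_back(ref);
  }
  return missing;
}

// Final pass: once layout is known, fill in each proxy's value.
void assignValues(std::vector<ProxyAtom> &proxies,
                  const std::map<std::string, uint64_t> &layout) {
  for (ProxyAtom &atom : proxies) {
    std::map<std::string, uint64_t>::const_iterator it = layout.find(atom.name);
    assert(it != layout.end() && "layout must cover every script symbol");
    atom.value = it->second;
    atom.hasValue = true;
  }
}
```

In this sketch a reference to `__text_size` resolves against the proxy before any address exists, which is the property the Resolver needs.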

-Nick

Linker scripts also use a wide variety of keywords like :-

1) SORT
2) ALIGN
3) OVERLAY

As I mentioned in the OP, I tried to focus the description on what
functionality the LLD API will need to expose in order to correctly
implement linker scripts, rather than to describe the language itself.
All relevant aspects of linker scripts should be implementable in
terms of the primitive functionality I described (although SORT may
benefit performance-wise from special handling deeper inside LLD).

You might want to look at the ELFLayout changes to see what functionality is
missing from that.

The ELFLayoutOptions has a hook into reading the Linker script which needs
to be implemented.

Thanks for the pointers, I'll take a look.

-- Sean Silva

The SECTIONS and MEMORY commands seem doable in lld as part of the ELF
Writer.

MEMORY and most aspects of SECTIONS are effectively syntax sugar and
the rest of LLD doesn't need to even be aware of it; the ldscript
language processor will desugar it. The same is true of many other
linker script constructs that I didn't mention. The goal of the
write-up was to describe the primitive functionality that will be
needed at the boundary between the language processor and the rest of
LLD (although admittedly some parts of the write-up still hail from
when I was intending to write about the entire language itself; sorry
for the confusion).

To be clear, I did learn the entire language in distilling this list
of primitive functionality (so there should be no (major) surprises in
that regard), but I decided that the existing documentation about the
language itself was good enough to obviate the need to distill a
reference for the entire ldscript language.

The one tricky part will be if the linker script defines symbols
(e.g. __text_size), because those symbol names might be
referenced by some object file atom. Thus they need an atom
representation for lld's Resolver to see. So, the ELF Writer will need
to make a first pass at the linker script and make "proxy" atoms for
any symbols the linker script defines.

Does it make sense for the ELF Writer to call into the linker script?

From what I know about linker scripts, it seems more natural to treat
them like command-line argument processing, which "calls into" the ELF
Writer rather than being "called from" the ELF Writer (indeed, many
linker script constructs are isomorphic to certain command-line
options). For example, `--defsym` can be used to create symbols, and it
is probably best to share a common code path.

-- Sean Silva

Yes, the linker scripts and the command line arguments should all be parsed
into one set of "options" to the ELF Writer.

My point was that most linker script options don't need to be handled
until the last stage of linking when the Writer is handed the atom graph
to turn into the output file. The one exception to that is any symbols
that need to be defined because of linker scripts (or --defsym).

-Nick

Ah, I see now. That's a good observation; I hadn't thought about at
what phase each part of the script would need to be handled, only the
interface between the language processor and "everything else".

-- Sean Silva

So, looking into it a bit, I think that ELFLayoutOptions is not the
right place to parse the linker script. From what I can tell, it has
to be parsed during argument processing. As Nick pointed out, most of
the linker script stuff can be handled during the final writing stage,
but since the script can define symbols, it has to be parsed earlier.
Also, Nick's perspective was that "the linker scripts and the command
line arguments should all be parsed into one set of "options" to the
ELF Writer".

-- Sean Silva

It is more than just defining symbols. There are many other directives that
have command line option equivalents that are used to set up linking. You can pull in symbols
with EXTERN, add other files to link with INPUT, add groups of archives to be searched
with GROUP, name the output file with OUTPUT, add new library path directories
with SEARCH_DIR, etc…

Also keep in mind that linker scripts are usually "inlined" with other command line
options. For example:

   ld foo.o --defsym=x=12 -Lbar -T beagle-ram.ld -lbaz bar.o

As you said, it makes sense to parse it during command line argument parsing.
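
A rough C++ sketch of what parsing everything into one set of options could look like, assuming the script named by `-T` has already been reduced to simple single-argument directives. `LinkOptions`, `Directive`, `applyDirective`, `applyDefsym`, and `parseArgs` are hypothetical names, not lld APIs, and `--defsym` values are limited to plain integers here for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical merged option set; not an existing lld type.
struct LinkOptions {
  std::vector<std::string> inputs;      // files and -l names, in order seen
  std::vector<std::string> searchDirs;  // from -L and SEARCH_DIR
  std::vector<std::string> externs;     // from EXTERN
  std::map<std::string, uint64_t> definedSymbols;  // from --defsym
  std::string output;                   // from -o and OUTPUT
};

// One already-parsed script command with a single argument (a simplification).
struct Directive {
  std::string keyword;
  std::string arg;
};

// Script directives land in the same fields as their command-line twins.
void applyDirective(LinkOptions &opts, const Directive &d) {
  if (d.keyword == "INPUT") opts.inputs.push_back(d.arg);
  else if (d.keyword == "SEARCH_DIR") opts.searchDirs.push_back(d.arg);
  else if (d.keyword == "EXTERN") opts.externs.push_back(d.arg);
  else if (d.keyword == "OUTPUT") opts.output = d.arg;
}

// Handles "SYM=VALUE"; VALUE is a plain integer (decimal, 0x hex, or octal).
void applyDefsym(LinkOptions &opts, const std::string &assignment) {
  std::string::size_type eq = assignment.find('=');
  assert(eq != std::string::npos && "expected SYM=VALUE");
  opts.definedSymbols[assignment.substr(0, eq)] =
      std::stoull(assignment.substr(eq + 1), nullptr, 0);
}

// Walks the arguments left to right, expanding each -T script in place.
LinkOptions parseArgs(
    const std::vector<std::string> &args,
    const std::map<std::string, std::vector<Directive> > &scripts) {
  LinkOptions opts;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string &a = args[i];
    if (a.compare(0, 9, "--defsym=") == 0) {
      applyDefsym(opts, a.substr(9));
    } else if (a.compare(0, 2, "-L") == 0) {
      opts.searchDirs.push_back(a.substr(2));
    } else if (a == "-T" && i + 1 < args.size()) {
      std::map<std::string, std::vector<Directive> >::const_iterator it =
          scripts.find(args[++i]);
      if (it != scripts.end())
        for (const Directive &d : it->second) applyDirective(opts, d);
    } else if (a == "-o" && i + 1 < args.size()) {
      opts.output = args[++i];
    } else {
      opts.inputs.push_back(a);  // object files and -lNAME, kept in order
    }
  }
  return opts;
}
```

Because the script is expanded at the point where `-T` appears, an `INPUT` inside it is ordered between `foo.o` and `-lbaz` in the example command line above, which is the reason for handling scripts during argument parsing in this sketch.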

Hi Sean,

The hook for adding symbols into the Reader is the Writer->addFiles function. I still think the linker script should be parsed by the Writer.

Also the linker script functionality is only needed by ELF and not anything else. It should be contained only within ELF.

Thanks

Shankar Easwaran

This is different. There are things in the command line which would

a) add symbols into the output
     This is generic functionality that all platforms need. The symbols could be parsed and handed to the ELF Writer to add, because the target-specific format should own the decision about which kinds of symbols to add.

b) specify segment address by using (-Ttext=<n>)
     This functionality is only needed by ELF and should be contained in WriterELF.

Overall, I think that the functionality should be accessed only by WriterELF and contained within.

Nick, Michael: what do you think?

Thanks

Shankar Easwaran

I agree with your point about encapsulation and separation of
concerns. However, I believe that GNU ld will accept and use linker
scripts even for the non-ELF formats it supports (not that we intend
to emulate that behavior necessarily), so they aren't ELF-specific per
se. Thus I feel that the right categorization might be "GNU ld"-only,
and hence it would naturally be just a component of the GNU ld
frontend, where it would be isolated from the rest of LLD.

-- Sean Silva

I don't think any other format has the functionality of sections and
segments with segment and section permissions etc.

Are you planning for lld to mimic ld's linker script functionality one-to-one?

Do you by any chance have a proposal for how the lld linker script would
look if there is a difference between ld and lld linker scripts?

Maybe the first version could be just ELF centric.

Thanks

Shankar Easwaran

I agree that processing of linker scripts is a "flavor" issue. We have an in-house linker that processes them and has a different command line than GNU ld, so we'll want to process them in that "flavor" as well.

I don't think any other format has the functionality of sections and
segments with segment and section permissions etc.

Are you planning to mimic 1-1 lld to ld linker script functionality ?

No, there's lots of arcane things that GNU ld has (like compatibility
with a completely different linker script language). Pragmatically it
makes sense to at least support enough to link Linux.

Do you by any chance have a proposal of how the lld linker script would
look if there is a difference b/w ld and lld linker scripts ?

I don't see any reason to support anything except a strict subset of
ld's language. What kind of difference are you thinking of?

Maybe the first version could be just ELF centric.

The consensus in this thread seems to be that it should be in the
frontend. Nick pointed out that since linker scripts can define
symbols, they need to be handled before the writer; I don't see any
way to avoid that. Could you elaborate on the concrete reasons why you
think it should be in the ELF backend?

-- Sean Silva

Is it the same language that GNU ld accepts?

-- Sean Silva

AFAIK. I haven't audited it for differences. With any luck, this means we might have some test cases. I'll look into it.

We need to be careful about what we mean by "frontend" and "backend"
of lld. The Writer (backend?) actually gets a chance to contribute
atoms along with the Reader atoms which are fed to the Resolver.

So the linker script code could live in WriterELF and contribute atoms
for symbols defined by a linker script.

That being said, some of the functionality of linker scripts is done via
command line options on other platforms, so we want to share common
functionality where possible. The linker script parsing may wind up
being a thin layer (in WriterELF) on top of some shared functionality.

-Nick

We need to be careful about what we mean by "frontend" and "backend"
of lld.

Yeah, that terminology was really vague. I identify "frontend" with
the driver (GNU ld, link.exe, ld64, etc) and "backend" with
ReaderWriter (i.e. roughly the object file formats).

The Writer (backend?) actually gets a chance to contribute
atoms along with the Reader atoms which are fed to the Resolver.

Could you elaborate on this? My understanding is that ReaderWriter is
supposed to be basically a toolkit for emitting the object files
("mechanism"), while the different drivers use those APIs (along with
the Atom core linking model "mechanism") to emulate their respective
linkers (the "policy").

So the linker script code could live in WriterELF and contribute atoms
for symbols defined by a linker script.

That being said, some of the functionality of linker scripts is done via
command line options on other platforms, so we want to share common
functionality where possible. The linker script parsing may wind up
being a thin layer (in WriterELF) on top of some shared functionality.

Ok.

-- Sean Silva

I am still getting up to speed with the LLD architecture so maybe I am
missing something, but that seems like an odd place to put it. I would
think that a Writer would only be focused on serializing atoms to their
respective format.

Since the linker script is an input it would seem more natural to me
to have some sort of linker script Reader that can contribute symbol
definitions, external symbols, new libraries, library search paths,
etc... (to process LD script directives like PROVIDE, EXTERN, INPUT,
GROUP, SEARCH_DIR, and OUTPUT).
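
As an illustration of the kind of linker script Reader suggested here, the following C++ sketch parses only the simple `KEYWORD(arg arg ...)` command form into a list that later stages could turn into symbol definitions or extra inputs. `ScriptCommand` and `parseCommands` are hypothetical names, not part of lld, and assignment forms such as `PROVIDE(sym = expr)`, comments, and quoting are deliberately outside this subset.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical parsed form of one simple script command.
struct ScriptCommand {
  std::string keyword;            // e.g. "INPUT", "SEARCH_DIR"
  std::vector<std::string> args;  // arguments between the parentheses
};

static bool isSpace(char c) {
  return std::isspace(static_cast<unsigned char>(c)) != 0;
}

static bool isWordChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Parses commands of the form KEYWORD(arg arg ...) or KEYWORD(arg, arg, ...).
// Parsing stops at the first construct outside that subset.
std::vector<ScriptCommand> parseCommands(const std::string &text) {
  std::vector<ScriptCommand> commands;
  std::size_t i = 0;
  while (i < text.size()) {
    if (isSpace(text[i]) || text[i] == ';') {  // separators between commands
      ++i;
      continue;
    }
    ScriptCommand cmd;
    while (i < text.size() && isWordChar(text[i])) cmd.keyword += text[i++];
    while (i < text.size() && isSpace(text[i])) ++i;
    if (cmd.keyword.empty() || i >= text.size() || text[i] != '(')
      break;  // not KEYWORD(...): outside this sketch's subset
    ++i;      // consume '('
    std::string current;
    while (i < text.size() && text[i] != ')') {
      char c = text[i++];
      if (isSpace(c) || c == ',') {  // whitespace or comma ends an argument
        if (!current.empty()) {
          cmd.args.push_back(current);
          current.clear();
        }
      } else {
        current += c;
      }
    }
    if (i >= text.size()) break;  // missing ')': drop the incomplete command
    ++i;                          // consume ')'
    if (!current.empty()) cmd.args.push_back(current);
    assert(!cmd.keyword.empty());
    commands.push_back(cmd);
  }
  return commands;
}
```

A driver could feed the resulting list into whatever code already handles the equivalent command-line options, so the script Reader itself stays a thin layer.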