LLD use cases and design meeting and discussion

Greetings folks,

Brief summary:

  • Most of the active and interested parties in LLD sat down to discuss where it’s going
  • Many of the design misunderstandings (on all sides) were clarified, and should make their way into documentation soon
  • The various contributors have constructive paths forward, and the patches and details about that will flow to the list as usual

If that’s all you have time for, don’t worry, you’ll see all the results of this meeting repeated on the lists and in patches and code reviews just like usual. The rest is for those curious and particularly interested, or to serve as a record for those who attended.

Earlier today, a number of the folks who are contributing, have contributed, or have expressed a particular interest in contributing to LLD got together in person to exchange ideas more rapidly than is realistic over email, and to discuss where we see LLD going, why, and how it should get there.

I’m writing this email to share the highlights of the discussion with the entire community (we’re of course aware that not everyone could feasibly join the meeting), along with the specific plans that various contributors to LLD made based on it. We want anyone to feel free to jump in, disagree, etc. on the list, and of course any actual changes will be discussed on their own as usual. Note that this summary is from my own memory and notes, but will in many cases be relaying things from the entire group that met rather than any particular view of my own, and I’ve CC-ed those attending so they can correct me anywhere I mess up.

The first subject we discussed was the use cases and features that the various parties had in mind for LLD. Essentially: what do we need, what do we want, and some background on why. I couldn’t begin to capture all of it, but below is a list of interesting and informative examples. While not all users or all targets of LLD would necessarily want all of these features, these are things that design decisions in LLD should bear in mind. For example, these should help the design of LLD not paint itself into a corner (too much). However, there is no priority or ranking here, nor any commitment that these are definite things on anyone’s roadmap.

Features / use cases of interest:

  • Optimizing the resulting binary
  • GC-ing dead code
  • Shrinking the resulting binary through packing or other tricks
  • Programming language derived or semantic optimizations of layout
  • Laying out code to more efficiently {execute,load,etc}
  • ICF
  • Optimizing calls or indirections between functions / libraries based on link structure
  • Bug finding and/or analysis features
  • ODR violation detection
  • Library layering checking
  • Semantic or programming language checking
  • Verification or security hardening
  • Compatibility
  • Drop-in command-line compatibility with linkers on all platforms
  • Link-time performance parity with existing platform linkers
  • Support for existing ABI constructs, synthesizing binary components, etc
  • Support for existing extremely wide variance of symbol resolution strategies on different platforms
  • Ability to achieve extensions like these via a plugin
  • Rigorous testing methodology of all platforms
  • Significant link-time improvements on some platforms where link-times are a major pain point
  • Support changes to standard build / development workflow
  • Linker servers or persistent linking process
  • Embedded usage of linker within other application via APIs
  • Different build system integration points than traditional steps
  • Toolkit for building linking related functionality
  • Potentially re-usable logic from JIT contexts
  • Potential to decouple input/output file formats for particular platform(s)
  • Manipulation of debug info (i.e., more than just relocations)
  • Object files controlling linker behavior via embedded flags or other mechanisms
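
To make one item from the list concrete, "GC-ing dead code" is commonly framed as a reachability problem: keep every section that can be reached from a set of roots and drop the rest. The following is only a minimal sketch of that idea, not LLD's implementation. The function name `live_sections`, the dictionary input format, and the section names are all hypothetical and chosen for illustration.

```python
from collections import deque


def live_sections(references, roots):
    """Return the set of sections reachable from the roots.

    references: dict mapping a section name to the names of the sections
                it refers to (for example through relocations).
                This input format is an assumption made for the sketch.
    roots:      section names that must always be kept, such as the one
                holding the entry point.
    """
    live = set()
    worklist = deque(roots)
    while worklist:
        section = worklist.popleft()
        if section in live:
            continue
        live.add(section)
        # Anything a live section refers to has to stay alive as well.
        worklist.extend(references.get(section, ()))
    return live


# Hypothetical reference graph; the names are illustrative only.
references = {
    ".text.main": [".text.helper", ".rodata.msg"],
    ".text.helper": [],
    ".text.unused": [".rodata.unused"],
    ".rodata.msg": [],
    ".rodata.unused": [],
}

kept = live_sections(references, roots=[".text.main"])
dropped = sorted(set(references) - kept)
print("kept:", sorted(kept))
print("dropped:", dropped)
```

A real linker would have more roots than a single entry point (exported symbols, initializers, sections explicitly marked as retained), so the choice of roots is where most of the platform-specific policy would live.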

A closely related discussion was around what the right division between LLD and LLVM was. There was broad consensus that LLD should provide the libraries that handle everything that is necessary for linking but not useful in other tools. Everything that is necessary for linking and also useful in other tools should sink into the LLVM core infrastructure. An example of this today would be a write API for libObject.

There was some further discussion about compatibility and the degree of compatibility needs of different users of LLD. Among those in the room, there were both users who would need initial compatibility with a system linker, but would not likely need long-term compatibility layers, and those for whom very long term (many years) compatibility would be a critical feature. Essentially, we need to support both scenarios.

The final topic of significant discussion centered around whether there should be a single core linking model in LLD, shared between all the platforms and/or targets. There was little disagreement that a common model would be useful and beneficial, but we currently don’t have a single model that adequately addresses the kinds of use cases outlined above (specifically, there is a tension between link-time performance and feature support on all platforms).

While not ideal, there was general willingness for LLD to provide infrastructure and platform support using multiple different linking models if that is necessary to support the features (both functional and performance) desired. However, there was also strong interest in continually looking for opportunities to share basic code and infrastructure, and even to collapse to a single model if one proves sufficient for the use cases.

With that, several participants volunteered to take on specific tasks that I just want to mention here for completeness. The results of these will of course float into the mailing lists as usual.

  1. I promised to write up this summary to the list as host. =] Done!

  2. I have asked some of the contributors to LLD to ensure that the most relevant parts of the above are actually incorporated into the documentation. That will of course have its own patch review for detailed discussion.

  3. Rui plans to continue to clean up and complete his prototype COFF targeting code, in particular to make sure that the infrastructure it uses is factored suitably to be re-used for another target (if useful) and that it is factored into a library with reasonable APIs (common APIs where we can define them, and platform specific where necessary) to form a reusable toolkit.

  4. Lang is going to look into the specific linking model in Rui’s prototype to see whether it would actually be able to support the use cases and features that Darwin needs (and gets today with the Atom model). If it does, we already have a path for converging on a single model. If it doesn’t, we will need to work to design APIs and library boundaries in a sensible way to support divergent linking models between target platforms.

  5. If #4 proves to work, then everyone seemed interested in working to systematically port. If #4 doesn’t work, everyone is interested in working to refactor the current libraries and the prototype so that the interface boundaries and such all make sense again.

I hope that I’ve captured everything, both so that particularly interested parties can chime in on any of the points, and so that those attending can refer back to this where necessary.

Thanks to everyone who took the considerable time (and in a remarkably busy week for some!) to try to work through a lot of these issues.
-Chandler

It would have been nice to know in advance about this, but alas.
FWIW, for the past week I've been working on a rewrite of the ELF
backend to be chunk based.
It's still in a preliminary stage but maybe we should coordinate to
avoid duplicating effort on this side.

> It would have been nice to know in advance about this, but alas.

Sorry, it was informally organized by a few people around specific side conversations. We knew that some folks who would be very interested weren’t present, and that’s why nothing was decided, why I committed to relay as complete a summary of the discussion as possible here, and why every actual change or subsequent step will end up on the mailing list like any other change.

Please, share any and all thoughts you have on this subject here on the list. Everyone who was present in person is directly CC-ed to make sure that those thoughts from folks who happened to not be present are not lost at all.

> FWIW, for the past week I’ve been working on a rewrite of the ELF
> backend to be chunk based.

This sounds fantastic.

> It’s still in a preliminary stage but maybe we should coordinate to
> avoid duplicating effort on this side.

Actually, I don’t think this will be duplicated at all. I don’t think anyone has started on the ELF side here, and so this will just get us yet another data point to examine to understand whether the chunk model is actually common or specific to certain platforms / feature sets.

-Chandler

David,

I started a discussion yesterday on lld / chunks for ELF, as we need relocations to be read when reading the inputs, especially for handling comdat.

The other way this can be done is to do symbol resolution while reading, which makes the linker less suitable for concurrent operations.

Let me know what you think.

Shankar Easwaran

I have a tiny bit of input, but I'm not sure this is the right discussion for it, so ignore me if it is not. Is LLD responsible for producing the final executable file, in whatever format (ELF, Mach-O, dylib, etc.)?

If so, one use I have is to produce "raw" executables. That is, I want to produce a file of bytes suitable for directly flashing onto tiny MCUs that execute code starting at the first byte in their flash.

It may be this request is for such a fundamental feature that it's pedantic to request it, and if so, please ignore me. But this seems to be a missing link in clang/LLVM that was only filled with binutils. Will I be able to get away from binutils altogether with LLD? Perhaps I already can today?

Thanks,

> I have a tiny bit of input, but I'm not sure this is the right
> discussion for it, so ignore me if it is not. Is LLD responsible
> for producing the final executable file, in whatever format (ELF,
> Mach-O, dylib, etc.)?

Yes.

> If so, one use I have is to produce "raw" executables. That is, I
> want to produce a file of bytes suitable for directly flashing onto
> tiny MCUs that execute code starting at the first byte in their flash.

With Binutils this is normally done using 'objcopy'. AFAIK, LLVM does not
have an 'objcopy' equivalent yet.
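
For readers unfamiliar with what a "raw" image involves: with an output target of `binary`, objcopy essentially writes a memory dump that starts at the lowest load address and discards symbols and relocations. The sketch below only illustrates that flattening step, under the assumption that the loadable contents have already been extracted as (address, bytes) pairs. `flatten_image`, its `fill` parameter, and the example layout are hypothetical and are not part of Binutils, LLVM, or LLD.

```python
def flatten_image(segments, fill=0x00):
    """Lay out (address, data) pairs as one contiguous byte string.

    The image starts at the lowest address among the segments, and any
    gap between two segments is padded with the `fill` byte.
    Raises ValueError if two segments overlap.
    """
    if not segments:
        return b""
    ordered = sorted(segments, key=lambda seg: seg[0])
    base = ordered[0][0]
    image = bytearray()
    for address, data in ordered:
        offset = address - base
        if offset < len(image):
            raise ValueError(f"segment at {address:#x} overlaps the previous one")
        # Pad the gap (if any) up to where this segment begins.
        image.extend(bytes([fill]) * (offset - len(image)))
        image.extend(data)
    return bytes(image)


# Hypothetical layout: four bytes at address 0x0 and two bytes at 0x10.
segments = [(0x10, b"\xde\xad"), (0x00, b"\x01\x02\x03\x04")]
image = flatten_image(segments)
print(len(image), image.hex())
```

The resulting bytes are what would be written to flash at the base address. Parsing the linked executable to obtain the segments is the part that a tool such as objcopy handles and that this sketch leaves out.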

> It may be this request is for such a fundamental feature that it's
> pedantic to request it, and if so, please ignore me. But this seems
> to be a missing link in clang/LLVM that was only filled with binutils.
> Will I be able to get away from binutils altogether with LLD?

LLD should eventually cover the GNU LD part (and more). There are
several other "utils" that may not have a replacement yet:

  * http://marshall.calepin.co/binutils-replacements-for-llvm.html

Not sure how up to date that is.

> Perhaps I already can today?

Probably not, but I think that depends on what you are using from Binutils.