RFC: Revisiting LLD-as-a-library design

Hey all,

Long ago, the LLD project contributors decided that they weren’t going to design LLD as a library, which stands in opposition to the way that the rest of LLVM strives to be a reusable library. Part of the reasoning was that, at the time, LLD wasn’t done yet, and the top priority was to finish making LLD a fast, useful, usable product. If sacrificing reusability helped LLD achieve its project goals, the contributors at the time felt that was the right tradeoff, and that carried the day.

However, it is now 2021, and I think we ought to reconsider this design decision. LLD was a great success: it works, it is fast, it is simple, many users have adopted it, it has many ports (COFF/ELF/mingw/wasm/new MachO). Today, we have actual users who want to run the linker as a library, and they aren’t satisfied with the option of launching a child process. Some users are interested in process reuse as a performance optimization, some are including the linker in the frontend. Who knows. I try not to pre-judge any of these efforts; I think we should do what we can to enable experimentation.

So, concretely, what could change? The main obstacles to reusability are:

  • Fatal errors and warnings exit the process without returning control to the caller

  • Conflicts over global variables between threads

Error recovery is the big imposition here. To avoid a giant rewrite of all error handling code in LLD, I think we should avoid returning failure via the llvm::Error class or std::error_code. We should instead use an approach more like clang, where diagnostics are delivered to a diagnostic consumer on the side. The success of the link is determined by whether any errors were reported. Functions may return a simple success boolean in cases where higher level functions need to exit early. This has worked reasonably well for clang. The main failure mode here is that we miss an error check, and crash or report useless follow-on errors after an error that would normally have been fatal.
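A minimal sketch of the side-channel diagnostic pattern described above, assuming hypothetical names (`DiagContext`, `readInput`, `linkAll`) that are not taken from LLD or clang. Diagnostics go to a caller-supplied consumer, steps return a plain boolean so callers can exit early, and the overall result is derived from whether any error was reported:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; none of this is LLD's or clang's real API.
enum class Severity { Warning, Error };

struct Diagnostic {
  Severity severity;
  std::string message;
};

// Per-link state: diagnostics are handed to a caller-supplied consumer
// instead of being printed and followed by a process exit.
class DiagContext {
public:
  using Consumer = std::function<void(const Diagnostic &)>;
  explicit DiagContext(Consumer c) : consumer(std::move(c)) {}

  void warn(std::string msg) { consumer({Severity::Warning, std::move(msg)}); }
  void error(std::string msg) {
    ++numErrors;
    consumer({Severity::Error, std::move(msg)});
  }
  bool hadError() const { return numErrors != 0; }

private:
  Consumer consumer;
  unsigned numErrors = 0;
};

// A step returns a simple success boolean so the caller can stop early.
bool readInput(DiagContext &diag, const std::string &path) {
  if (path.empty()) {
    diag.error("empty input path");
    return false;
  }
  return true;
}

// The link succeeds only if no error was reported along the way.
bool linkAll(DiagContext &diag, const std::vector<std::string> &inputs) {
  for (const std::string &path : inputs)
    if (!readInput(diag, path))
      return false;
  return !diag.hadError();
}
```

The failure mode mentioned above shows up here as a forgotten `if (!readInput(...))` check: the code would keep running on bad state after an error that used to be fatal.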

Another motivation for all of this is increasing the use of parallelism in LLD. Emitting errors in parallel from threads and then exiting the process is risky business. A new diagnostic context or consumer could make this more reliable. MLIR has this issue as well, and I believe they use this pattern. They use some kind of thread shard index to order the diagnostics, LLD could do the same.
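As a rough illustration of ordering diagnostics by a shard index, here is a sketch under the assumption that each parallel work item knows its own index. `OrderedDiagSink` is a hypothetical name; this is not MLIR's or LLD's actual implementation. Workers report under a lock, and the sink sorts by shard index once the parallel phase is over, so the output does not depend on thread scheduling:

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch; not MLIR's or LLD's actual implementation.
struct ShardedDiag {
  std::size_t shard; // index of the work item that produced the message
  std::string message;
};

class OrderedDiagSink {
public:
  // Safe to call from any worker thread.
  void report(std::size_t shard, std::string message) {
    std::lock_guard<std::mutex> lock(mu);
    pending.push_back({shard, std::move(message)});
  }

  // Call after all workers have finished. A stable sort by shard index keeps
  // the per-shard reporting order and removes scheduling nondeterminism.
  std::vector<std::string> flush() {
    std::lock_guard<std::mutex> lock(mu);
    std::stable_sort(pending.begin(), pending.end(),
                     [](const ShardedDiag &a, const ShardedDiag &b) {
                       return a.shard < b.shard;
                     });
    std::vector<std::string> out;
    for (ShardedDiag &d : pending)
      out.push_back(std::move(d.message));
    pending.clear();
    return out;
  }

private:
  std::mutex mu;
  std::vector<ShardedDiag> pending;
};
```

Nothing is printed and nothing exits while workers are still running; the caller decides what to do with the ordered list after the join.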

Finally, we’d work to eliminate globals. I think this is mainly a small matter of programming (SMOP) and doesn’t need much discussion, although the make template presents interesting challenges.
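One possible direction for the allocation side of the globals problem, sketched with hypothetical names (`LinkContext`, `Symbol`): instead of a process-wide allocation helper, each link owns a context object, and everything allocated through it is released when the context is destroyed. This is only an illustration of the idea, not a proposal for the actual LLD interface, and it ignores the performance properties of a real bump allocator:

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-link context; names are illustrative, not LLD's.
class LinkContext {
public:
  // Allocate an object that lives exactly as long as this context.
  template <typename T, typename... Args> T *make(Args &&...args) {
    auto owned = std::make_unique<Holder<T>>(std::forward<Args>(args)...);
    T *ptr = &owned->value;
    objects.push_back(std::move(owned));
    return ptr;
  }

  std::size_t numObjects() const { return objects.size(); }

private:
  // Type-erased owner so objects of different types share one list.
  struct HolderBase {
    virtual ~HolderBase() = default;
  };
  template <typename T> struct Holder : HolderBase {
    template <typename... Args>
    explicit Holder(Args &&...args) : value(std::forward<Args>(args)...) {}
    T value;
  };

  std::vector<std::unique_ptr<HolderBase>> objects;
};

struct Symbol {
  explicit Symbol(std::string n) : name(std::move(n)) {}
  std::string name;
};
```

Two links running in the same process would each hold their own `LinkContext`, so they never share allocation state.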

Thoughts? Tomatoes? Flowers? I apologize for the lack of context links to the original discussions. It takes more time than I have to dig those up.

Reid

A big +1 from our side since we have a potential use case for LLD-as-a-library (I was going to write a similar RFC but you beat me to it).

+1 generally.

Petr - if you have particular context for your use case, might be handy to have as reference/motivation.
Reid - if you have any particular use case of your own in mind, or links to other discussions/users who are having friction with the current state of affairs, would be handy to have.

(though I don’t generally want the thread to become about picking apart those use cases - library-based design and flexibility is a fairly core tenet of the LLVM project overall regardless of the validity of specific use cases)

Throwing flowers on behalf of the Zig project :sunflower::wilted_flower: :slight_smile:

Currently Zig embeds LLD with a renamed main() and then exposes `zig ld.lld`, `zig ld64.lld`, `zig lld-link`, and `zig wasm-ld` which call into this renamed main() function. Then when Zig wants to invoke LLD, it executes itself as a child process, using one of these sub-commands.

LLD-as-a-library would remove the need for this trick, improving performance especially on systems such as Windows which have a high cost of spawning child processes.

It also has the possibility to improve error reporting, avoiding the need for parsing the stderr of LLD. Errors could be communicated in a more semantic way, with less room for bugs.

With our upcoming self-hosted backend, compile times are on the order of milliseconds, and so the time spent spawning a child process to invoke LLD ends up being a substantial portion of total compilation time. That said, we are also entering the linking space to provide an alternative to LLD, but it will be some time before there is any possibility of removing the dependency on LLD, so this RFC would be greatly beneficial to the Zig project.

Cheers,
Andrew

> Hey all,
>
> Long ago, the LLD project contributors decided that they weren't going to design LLD as a library, which stands in opposition to the way that the rest of LLVM strives to be a reusable library. Part of the reasoning was that, at the time, LLD wasn't done yet, and the top priority was to finish making LLD a fast, useful, usable product. If sacrificing reusability helped LLD achieve its project goals, the contributors at the time felt that was the right tradeoff, and that carried the day.

I think it would be great to move in this direction. It's a little bit unclear
at the moment whether or not the library use case is supported, because we
do include headers and libraries in the install targets.

As a package maintainer my wish list for an LLD library is:

1. Single shared object: https://reviews.llvm.org/D85278
2. All symbols are assumed private unless explicitly given library visibility.
3. Some subset of the API that is stable across major releases.

-Tom

> > Hey all,
> >
> > Long ago, the LLD project contributors decided that they weren’t going to design LLD as a library, which stands in opposition to the way that the rest of LLVM strives to be a reusable library. Part of the reasoning was that, at the time, LLD wasn’t done yet, and the top priority was to finish making LLD a fast, useful, usable product. If sacrificing reusability helped LLD achieve its project goals, the contributors at the time felt that was the right tradeoff, and that carried the day.
>
> I think it would be great to move in this direction. It’s a little bit unclear
> at the moment whether or not the library use case is supported, because we
> do include headers and libraries in the install targets.
>
> As a package maintainer my wish list for an LLD library is:
>
> 1. Single shared object: https://reviews.llvm.org/D85278
> 2. All symbols are assumed private unless explicitly given library visibility.

Is this ^ consistent with how other parts of LLVM are handled? My understanding was generally LLVM’s API is wide/unbounded and not stable. I’d hesitate to restrict future libraries in some way due to some benefits that provides - ease of refactoring (LLVM’s ability to be changed frequently is very valuable).

> 3. Some subset of the API that is stable across major releases.

A limited stable C API seems plausible to me, if there’s need.

- Dave

     > Hey all,
     >
     > Long ago, the LLD project contributors decided that they weren't going to design LLD as a library, which stands in opposition to the way that the rest of LLVM strives to be a reusable library. Part of the reasoning was that, at the time, LLD wasn't done yet, and the top priority was to finish making LLD a fast, useful, usable product. If sacrificing reusability helped LLD achieve its project goals, the contributors at the time felt that was the right tradeoff, and that carried the day.
     >

    I think it would be great to move in this direction. It's a little bit unclear
    at the moment whether or not the library use case is supported, because we
    do include headers and libraries in the install targets.

    As a package maintainer my wish list for an LLD library is:

    1. Single shared object: https://reviews.llvm.org/D85278
    2. All symbols are assumed private unless explicitly given library visibility.

> Is this ^ consistent with how other parts of LLVM are handled? My understanding was generally LLVM's API is wide/unbounded and not stable. I'd hesitate to restrict future libraries in some way due to some benefits that provides - ease of refactoring (LLVM's ability to be changed frequently is very valuable).

The single shared object is consistent with clang and llvm, but
not the private symbols by default. We have discussed changing
this in clang and llvm, though, and for a library like lld that is
smaller than the clang and llvm libraries, it seems like it would
be an easier task and something that would be useful to do right
from the start.

-Tom

> Thoughts? Tomatoes? Flowers? I apologize for the lack of context links to the original discussions. It takes more time than I have to dig those up.

+1

> - Fatal errors and warnings exit the process without returning control to the caller

This means every single fatal() call needs scrutiny. The function is
noreturn and there are 147 references.
In many places returning from fatal() can indeed crash.

> Another motivation for all of this is increasing the use of parallelism in LLD. Emitting errors in parallel from threads and then exiting the process is risky business. A new diagnostic context or consumer could make this more reliable. MLIR has this issue as well, and I believe they use this pattern. They use some kind of thread shard index to order the diagnostics, LLD could do the same.

Yes, I remember that I refrained from warn() in some parallel*,
because warn() becomes error() in --fatal-warnings mode and error() is
similar to fatal() after --error-limit is reached.
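The escalation described above can be modelled in a few lines. This is a simplified, hypothetical model (a `Handler` struct with a `stopped` flag standing in for exiting the process), not LLD's actual error handler; in particular, the exact point at which the limit triggers and the treatment of a limit of 0 as "unlimited" are assumptions of the sketch:

```cpp
#include <string>
#include <vector>

// Simplified, hypothetical model of the warn() -> error() -> stop chain.
struct Handler {
  bool fatalWarnings = false; // models --fatal-warnings
  unsigned errorLimit = 0;    // models --error-limit; 0 means no limit here
  unsigned errorCount = 0;
  bool stopped = false;       // stands in for exiting the process
  std::vector<std::string> log;

  void warn(const std::string &msg) {
    if (fatalWarnings) {
      error(msg); // a warning is promoted to an error
      return;
    }
    log.push_back("warning: " + msg);
  }

  void error(const std::string &msg) {
    if (stopped)
      return;
    ++errorCount;
    if (errorLimit != 0 && errorCount > errorLimit) {
      // Past the limit, error() behaves like fatal(): it stops the link.
      log.push_back("error: too many errors emitted, stopping now");
      stopped = true;
      return;
    }
    log.push_back("error: " + msg);
  }
};
```

With `fatalWarnings` set and a small limit, an innocent-looking `warn()` inside a parallel loop can be the call that ends the link, which is why it was avoided there.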

>
> > Hey all,
> >
> > Long ago, the LLD project contributors decided that they weren't going to design LLD as a library, which stands in opposition to the way that the rest of LLVM strives to be a reusable library. Part of the reasoning was that, at the time, LLD wasn't done yet, and the top priority was to finish making LLD a fast, useful, usable product. If sacrificing reusability helped LLD achieve its project goals, the contributors at the time felt that was the right tradeoff, and that carried the day.
> >
>
> I think it would be great to move in this direction. It's a little bit unclear
> at the moment whether or not the library use case is supported, because we
> do include headers and libraries in the install targets.
>
> As a package maintainer my wish list for an LLD library is:
>
> 1. Single shared object: https://reviews.llvm.org/D85278
> 2. All symbols are assumed private unless explicitly given library visibility.
>
>
> Is this ^ consistent with how other parts of LLVM are handled? My understanding was generally LLVM's API is wide/unbounded and not stable. I'd hesitate to restrict future libraries in some way due to some benefits that provides - ease of refactoring (LLVM's ability to be changed frequently is very valuable).
>

The single shared object is consistent with clang and llvm, but
not the private symbols by default. We have discussed changing
this in clang and llvm, though, and for a library like lld that is
smaller than the clang and llvm libraries, it seems like it would
be an easier task and something that would be useful to do right
from the start.

I support that we compile lld source files with -fvisibility=hidden on
ELF systems.

lld::*::link may be the only API that needs LLVM_EXTERNAL_VISIBILITY
(i.e. default visibility for ELF).
(LLVM_EXTERNAL_VISIBILITY is defined in llvm/include/llvm/Support/Compiler.h;
Windows doesn't customize it currently.)
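To make the "hidden by default, one exported entry point" idea concrete, here is a small sketch. `MYLINKER_EXPORT` and the `mylinker` namespace are hypothetical stand-ins chosen for illustration; the real macro and entry points live in LLVM and LLD and may differ in detail. When such a file is compiled with -fvisibility=hidden on an ELF system, only the annotated function keeps default visibility:

```cpp
// Hypothetical macro, modelled on the idea behind LLVM_EXTERNAL_VISIBILITY.
#if defined(__GNUC__) || defined(__clang__)
#define MYLINKER_EXPORT __attribute__((visibility("default")))
#else
#define MYLINKER_EXPORT
#endif

namespace mylinker {
namespace detail {
// Not annotated: hidden from the dynamic symbol table when the library is
// built with -fvisibility=hidden.
int countInputs(int argc) { return argc > 1 ? argc - 1 : 0; }
} // namespace detail

// The one annotated entry point; it stays exported under -fvisibility=hidden.
MYLINKER_EXPORT bool link(int argc, const char *const *argv) {
  (void)argv; // a real driver would parse the arguments here
  return detail::countInputs(argc) > 0;
}
} // namespace mylinker
```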

Big +1 from me, too.

The reasons of the past were clear, and they helped make the project a success. Kudos to all involved.

The current reasons are also clear, especially error handling and reuse of infrastructure.

Cheers,
Renato

Sure, I’m thinking about porting LLVM to Fuchsia and our process creation is not (and might never be) as fast as on *NIX, it’s actually closer to Windows, so I’m interested in integrating LLD into Clang to reduce the cost (similarly to what Zig does).

More generally, I’d also like to explore alternative models like llvm-buildozer or a client-server model similar to what Daniel described in his talk, and having LLD-as-a-library makes that easier.

No objections here (although I don’t have a specific use-case currently).

Regarding the error handling, I support some sort of callback approach to report the errors (https://www.youtube.com/watch?v=YSEY4pg1YB0). This doesn’t solve the problem of what to do after a fatal error has been reported. In the debug line parsing code which inspired that talk, we had a concept of unrecoverable and recoverable errors, whereby the parser would either stop parsing if it found something it couldn’t recover from, by bailing out of the function, or it would set some assumed values and continue parsing. This may work for some cases in LLD, but the fatal cases need to stop the linking completely, so we’ll need some way to bail out of the LLD call stack in those cases still somehow - personally, I think we should use llvm::Error for that up to the point of public interface with the library, to avoid the failure being unchecked. The error callbacks could then return Error to allow a client to force LLD to stop, even if the error would otherwise be non-fatal.
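A compact sketch of the callback idea above, assuming C++17. `std::optional<std::string>` stands in for llvm::Error so that the example is self-contained, and `parseRecords` is a hypothetical parser, not the debug line code referred to. Recoverable problems are reported through the callback; if the callback returns a failure, the parser bails out and propagates it to its caller:

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Stand-in for llvm::Error: empty means success, a value means "stop".
using Failure = std::optional<std::string>;

// Called for each recoverable problem; returning a Failure aborts the parse.
using ErrorCallback = std::function<Failure(const std::string &)>;

// Hypothetical parser: a negative record is recoverable (assume 0 and keep
// going) unless the client decides otherwise through the callback.
Failure parseRecords(const std::vector<int> &records, std::vector<int> &out,
                     const ErrorCallback &onRecoverable) {
  for (int r : records) {
    if (r < 0) {
      if (Failure f = onRecoverable("negative record " + std::to_string(r)))
        return f; // the client asked to stop; propagate up the call stack
      r = 0;      // assumed value, continue parsing
    }
    out.push_back(r);
  }
  return std::nullopt;
}
```

A lenient client returns an empty `Failure` from the callback and gets best-effort output; a strict client returns the message and the parse stops at the first problem.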

James

One of the earliest discussions about the LLD-as-a-library design, at least after it had matured enough to be practical, was this rather long thread: https://lists.llvm.org/pipermail/llvm-dev/2016-December/107981.html

I don’t have any objections about making LLD more useable as a library.

What I would say is that we should come up with a good idea of what functionality is needed from the library. For example, I can see one use case with a relatively coarse interface that is similar to running LLD from the command line, with object files passed in memory buffers. I can see that working as an extension of the existing design. A more ambitious use case that permits finer-grained control of the pipeline, or constructs LLD data structures directly rather than using object files, could require quite a bit more work. I think people are thinking along the lines of the former, but it is worth making sure.

I think one of the reasons the library use case faltered was that there was no-one with a use case that was able to spend enough time to make it happen. The existing maintainers had enough work to do to catch up with Gold and BFD. Do we have someone willing to do the work?

Peter

One other point to consider for library usage is memory management.

I don’t know the state of LLD today, but I think it was purposefully leaking memory as a tradeoff for performance at some point in the past (why deallocate everything before exiting main when the system will take care of it?).

This presentation even says:

What makes LLD so fast? <https://archive.fosdem.org/2019/schedule/event/llvm_lld/attachments/slides/3423/export/events/attachments/llvm_lld/slides/3423/WhatMakesLLDSoFastPresenterNotes.>

“Large amounts of memory allocated up front and never released.”

That can be controlled, see https://github.com/llvm/llvm-project/blob/a845dc1e562c20db54018a121eb01970e76602db/lld/ELF/Driver.cpp#L127

> One of the earliest discussions about the LLD-as-a-library design, at least after it had matured enough to be practical, was this rather long thread: https://lists.llvm.org/pipermail/llvm-dev/2016-December/107981.html
>
> I don’t have any objections about making LLD more useable as a library.
>
> What I would say is that we should come up with a good idea of what functionality is needed from the library. For example, I can see one use case with a relatively coarse interface that is similar to running LLD from the command line, with object files passed in memory buffers. I can see that working as an extension of the existing design. A more ambitious use case that permits finer-grained control of the pipeline, or constructs LLD data structures directly rather than using object files, could require quite a bit more work. I think people are thinking along the lines of the former, but it is worth making sure.

Not sure it’s super important to decide much of that up-front. Once the general philosophy is agreed upon and the most fundamental issues are addressed for even the narrowest/simplest library use - other things can be done as enhancements beyond that.

A stretch/eventual goal could be to share some code between lld and LLVM’s ORC JITLink - so that JIT and AOT linking share more code/diverge less.

As one of the people working on the new Mach-O backend, my main concerns are:

  1. The Mach-O backend is still very much in flux, and will likely remain so until the end of this year. Whoever undertakes this library-fication should sync up with us to avoid e.g. awkward merge conflicts.
  2. Performance is very important to us, and this library-fication should not regress it. Right now, we don’t have a good benchmarking service set up (the LNT server only benchmarks LLD-ELF); we’ve been mostly just profiling things locally. I can send these benchmarking instructions to whomever takes on this effort.

Jez

Adding Erik (not subscribed) who has previously had issues with LLD not being a library to provide some additional use cases.

Hello all,
Do I understand correctly that lld is a binary linker? Does it take arch/OS-specific object files, LLVM-specific object files, or both as input?

I'm a very new developer as far as LLVM and Clang go, but if the library-fication of lld doesn't involve overly heavy algorithmic and architectural changes, and someone can introduce me to the git workflow (which branch to start from, which branch to work on, whether to create a separate branch and how), I could try to prepare a working prototype of this solution. Please note, though, that I'm still learning.

For the Mach-O code, we could use samples from the test suite so that I can make sure nothing breaks after my changes. We could use such samples for LTO too. In a perfect world, we would also have a test set for each arch/OS combination.

As for avoiding merge conflicts with current development, including the Mach-O code, I'd suggest finishing the work there and stabilizing lld first; then, once no new code is being heavily changed or introduced, I could safely focus on the architecture rework.

As for unification with JIT or LTO, as was mentioned, I don't know enough about this task yet.

As for avoiding or fixing memory leaks between compilation units etc., I think I could try to fix those.

Please note, though, that I still don't have a clear view of how to make the lld process more parallel, so I may not yet be useful for this project.

Best regards,
Pawel

On Sat, 12 Jun 2021 at 05:20, Michael Spencer via llvm-dev <llvm-dev@lists.llvm.org> wrote:

I use LLVM to compile WebAssembly to native code. The primary use-case for this is compiling WASM plugins for games - this is what Microsoft Flight Simulator 2020 uses it for. Using the system linker is not an option on Windows, which does not ship link.exe by default, making LLD a mandatory requirement if you are using LLVM in any kind of end-user plugin scenario, as the average user has not installed Visual Studio.

This puts users of LLVM’s library capabilities on Windows in an awkward position, because in order to use LLVM as a library when compiling a plugin, one must use LLD, which cannot be used as a library. My current solution is to use LLD as a library anyway and maintain a fork of LLVM with the various global cleanup bugs patched (most of which have now made it into stable), along with a helper function that allows me to use LLD to read out the symbols of a given shared library (which is used to perform link-time validation of WebAssembly modules, because LLD makes it difficult to access any errors that happen).

If LLD wanted to become an actual library, I think it would need a better method of reporting errors than simply stdout and stderr streams, although I don’t know what this would look like. It would also be nice for it to expose the different link stages like LLVM does so that the application has a bit more control over what’s going on. However, I don’t really have any concrete ideas about what LLD should look like as a library, only that I would like it to stop crashing when I attempt to use it as one.

To second what Jez said here: the new Mach-O backend is still pretty young (e.g. it wasn’t even the default Mach-O backend in LLVM 12). It’s being actively developed, and there’s still the possibility of fairly invasive changes being required to support new features or as we gain more implementation experience.

One of our main motivations for LLD for Mach-O was to improve incremental build speeds for our developers. We’ve found that link speed is a major component of incremental build times; LLD’s speed is a huge win there, and we care a lot about maintaining that speed. I’m CCing Nico, since he’s also been actively benchmarking LLD for Mach-O against Chromium builds (and reporting and fixing speed regressions).

Reid’s proposals (better error handling and eliminating globals) are completely reasonable. In general though, I really appreciate the LLD codebase’s simplicity and think it’s very valuable for understanding and maintaining the code. I haven’t worked too much with the rest of LLVM or Clang, so I’m not at all trying to compare LLD with them or cast aspersions on those codebases; I’m just speaking from my personal perspective. For example, having (mostly) separate codebases for each LLD port set off my “code duplication” spidey senses when I first started working with LLD, but while it does lead to some amount of duplication, there’s also a lot of subtle behavior differences between platforms, and having some amount of duplication is IMO a better tradeoff than e.g. having common functions that have a bunch of conditionals for each platform, or trying to come up with common abstract interfaces that are specialized for each platform, or so on. I really hope we can maintain that simplicity to whatever extent possible.

Thanks,

Shoaib