lld: ELF/COFF main() interface

In the process of migrating from the old lld ELF linker to the new one (previously ELF2) I noticed the interface lost several important features (ordered by importance for my use case):

  1. Detecting errors in the first place. The new linker seems to call exit(1) on any error.

  2. Reporting messages to non-stderr outputs. Previously all link functions had a raw_ostream argument so it was possible to delay the error output, aggregate it for multiple linked files, output via a different format, etc.

  3. Linking multiple outputs in parallel (useful for test drivers) in a single process. Not really an interface issue but there are at least two global pointers (Config & Driver) that refer to stack variables and are used in various places in the code.

All of this seems to indicate a departure from the linker being usable as a library. To maintain the previous behavior you’d have to use a linker binary & popen.

Is this a conscious design decision or a temporary limitation?

If it’s a limitation, how would one go about fixing this? I’m not too familiar with idiomatic error handling in LLVM. Normally in this situation I’d just throw from error(), but lld is probably compiled without exceptions. Is ErrorOr the established practice? How does LLVM generally deal with error handling for parsers (like the linker script parser), where there are a lot of mutually recursive functions, every single one of which can fail?

Arseny

In the process of migrating from old lld ELF linker to new (previously
ELF2) I noticed the interface lost several important features (ordered by
importance for my use case):

1. Detecting errors in the first place. New linker seems to call exit(1)
for any error.

2. Reporting messages to non-stderr outputs. Previously all link functions
had a raw_ostream argument so it was possible to delay the error output,
aggregate it for multiple linked files, output via a different format, etc.

3. Linking multiple outputs in parallel (useful for test drivers) in a
single process. Not really an interface issue but there are at least two
global pointers (Config & Driver) that refer to stack variables and are
used in various places in the code.

All of this seems to indicate a departure from the linker being usable as
a library. To maintain the previous behavior you'd have to use a linker
binary & popen.

Is this a conscious design decision or a temporary limitation?

That the new ELF and COFF linkers are designed as commands instead of
libraries is very much an intended design change.

If it's a limitation, how would one go about fixing this? I'm not too
familiar with idiomatic error handling in LLVM. Normally in this
situation I'd just throw from error() but lld is probably compiled without
exceptions. Is ErrorOr the established practice? How does LLVM generally
deal with error handling for parsers (like linker script parser) where it's
a lot of mutually recursive functions where every single one can fail?

Since it's as designed, please run the linker as an external command using
fork/exec (or wrappers of them) instead of trying to use that inside the
same process.

This is really unfortunate.

I’ve read the discussion threads for the atom/chunk controversy and I feel like I understand the reasons for rewriting the linker, but this does not seem to have anything to do with whether the linker is usable as a library or not.

As it stands, not only does lld have two completely different linkers (I’m treating COFF/ELF2 as one, since they are really two different implementations of the same concept, AFAIU), but one is usable as a library (and doesn’t even require round-tripping generated code through an object file! I was really happy to use that) and the other one isn’t. I’m not sure what the future plans are for the Mach-O linker (at this point it seems logical to rewrite it using the new design, but I’m not sure that will ever happen), so maybe at some point we’ll just have one linker application instead of a library and an application.

Anyway, since the linker is the only missing piece of a full compilation stack (source language to runnable executable), it’s sad to see this specific part of LLVM not working as a library when everything else does.

Are there specific implementation concerns that prevent the new lld from being a library? I understand that using global variables and the error() function is simpler, but the rest of LLVM does not do that, and its codebase is significantly larger. Am I missing any other issues, besides the ones I mentioned in my original e-mail, that will come up in a library-like usage scenario?

Arseny

Designing it as a command makes things simpler, because you can safely assume that most functions always succeed, or otherwise the entire process terminates. You can leave it to the operating system to clean up all resources used by the process, however it fails (with a few exceptions, such as temporary files). I actually like to use the linker as an external command, since the operating system provides good isolation between my process and the linker, which could fail due to an unknown bug or something. Rewriting all of these as functions that return ErrorOr is technically doable, but it needs strong justification, so I guess that’s unlikely to change.

Why do you want to use it as a library? I don’t think “because LLVM and Clang allow that” is a compelling argument, since the linker and the compiler are pretty different programs. I clearly see many reasons to use LLVM and parts of Clang as libraries, but they are not directly applicable to LLD. The new linker is more like a command built on top of the libraries that LLVM provides.

+Lang for discussions of lld-as-a-library

Lang & I have discussed some uses of the linker as a library, for example for reuse within the LLVM JIT, which is being phrased more and more as an in-process version of traditional ahead-of-time compilation (which helps in a number of ways, e.g. by having fewer oddities/quirks, since it follows a model people are more familiar with).

(as for the cleanup issue - LLVM has ways of disabling explicit cleanup when it knows it's in-process and going to go away soon, I think - similar things could be done in LLD)

There are lots of good reasons to have it as a library, ranging from embedding it in processes as an in-process linker (as Dave mentions later in the thread), to making it more easily testable, to being able to do a complete build to executable/library/etc. with a single command (this goes back to the first point, but is a more specific example). There are many good reasons - some of which are already being worked on, with a library-based lld as a prerequisite.

Rewriting every function as ErrorOr sounds terrible and we should avoid that as much as possible, but keeping the general LLVM style of “library first” seems important.

-eric

I disagree.

During the discussion, there was a specific discussion of both the new COFF port and ELF port continuing to be libraries with a common command line driver.

If you want to consider changing that, we should have a fresh (and broad) discussion, but it goes pretty firmly against the design of the entire LLVM project. I also don’t really understand why it would be beneficial.

In the process of migrating from old lld ELF linker to new (previously
ELF2) I noticed the interface lost several important features (ordered by
importance for my use case):

1. Detecting errors in the first place. New linker seems to call exit(1)
for any error.

2. Reporting messages to non-stderr outputs. Previously all link
functions had a raw_ostream argument so it was possible to delay the error
output, aggregate it for multiple linked files, output via a different
format, etc.

3. Linking multiple outputs in parallel (useful for test drivers) in a
single process. Not really an interface issue but there are at least two
global pointers (Config & Driver) that refer to stack variables and are
used in various places in the code.

All of this seems to indicate a departure from the linker being usable
as a library. To maintain the previous behavior you'd have to use a linker
binary & popen.

Is this a conscious design decision or a temporary limitation?

That the new ELF and COFF linkers are designed as commands instead of
libraries is very much an intended design change.

I disagree.

During the discussion, there was a *specific* discussion of both the new
COFF port and ELF port continuing to be libraries with a common command
line driver.

There was a discussion that we would keep the same entry point for the old
and the new, but I don't remember promising that we were going to organize
the new linker as a library. The new one has been designed as a command
from day one. (Precisely speaking, the original code propagated errors all
the way up to the entry point, so you could call it and expect it to
always return. Rafael introduced the error() function later, and we now
depend on that function not returning.)

If you want to consider changing that, we should have a fresh (and broad)
discussion, but it goes pretty firmly against the design of the entire LLVM
project. I also don't really understand why it would be beneficial.

I'm not against organizing it as a library as long as it does not make
things too complicated, and I guess reorganizing the existing code as a
library is relatively easy because it's still pretty small, but I don't
really want to focus on that until the linker becomes usable as an
alternative to GNU ld or gold. I want to focus on the linker features
themselves at the moment. Once it's complete, it will become clearer how
to organize it.

There are lots of good reasons to have it as a library ranging from
embedding it in processes as an in-process linker (as Dave mentions later
in thread) to making it more implicitly testable, to being able to do
complete build to executable/library/etc with a single command (this goes
back to one, but is a more specific example). There are many good reasons -
some of which are being worked on with a library based lld as a
prerequisite to have lld be a library based linker.

Rewriting every function as ErrorOr sounds terrible and we should avoid
that as much as possible, but keeping the general llvm style of "library
first" seems to be an important use case.

Do you have any ideas for avoiding ErrorOr? We do not have a pass to
ensure that inputs are correct (because such a pass would be slow). Most
data is read on-demand, so almost everything can fail.

Ok, myself and essentially everyone else thought this was clear. If it isn’t, let’s clarify:

I think it is absolutely critical and important that LLD’s architecture remain one where all functionality is available as a library. This is the design goal of LLVM and all of LLVM’s infrastructure. This applies just as much to LLD as it does to Clang.

You say that it isn’t compelling to match Clang’s design, but in fact it is. You would need an overwhelming argument to diverge from Clang’s design.

The fact that it makes the design more challenging is not compelling at all. Yes, building libraries that can be re-used and making the binary calling it equally efficient is more challenging, but that is the express mission of LLVM and every project within it.

I think this last change was a mistake.

The fact that the code propagates errors all the way up is fine, and even good. We don’t necessarily need to be able to recover from link errors and try some other path.

But we absolutely need the design to be a library that can be embedded into other programs and tools. I can’t even begin to count the use cases for this.

So please, let’s go back to where we do not rely on never-returning error handling. That is an absolute mistake.

I am certain that it will make things more complicated, but that is the technical challenge that we must overcome. It will be hard, but I am absolutely confident it is possible to have an elegant library design here. It may not be as simple as a pure command line tool, but it will be dramatically more powerful, general, and broadly applicable.

The design of LLVM is not the simplest way to build a compiler. But it is valuable to all of those working on it precisely because of this flexibility imparted by its library oriented design. This is absolutely not something that we should lose from the linker.

Ok, now we’re talking about something totally reasonable.

If it is easier for you all to develop this first as a command line tool, and then make it work as a library, sure, go for it. You’re doing the work, I can hardly tell you how to go about it. ;]

However, I think it is super important to be clear that getting the library architecture is a hard requirement for the LLD project. Without that, it doesn’t even make sense as part of LLVM.

And as a consequence, I think it is unacceptable to replace the old ELF port with the new one until this is true. That is removing functionality that users of LLD realistically were depending on, which you’re seeing in this thread. That’s not cool. We don’t really tolerate dramatic regressions in functionality like this, and even if we’ve already done it, we should revert back to a state where both are available until the new port is actually ready. And ready in LLVM land means, functional as a library.

-Chandler

Why do you want to use it as a library? I don’t think “because LLVM and Clang allow that” is a compelling argument, since the linker and the compiler are pretty different programs. I clearly see many reasons to use LLVM and parts of Clang as libraries, but they are not directly applicable to LLD. The new linker is more like a command built on top of the libraries that LLVM provides.

My specific use-case is that of a compiler with an in-process linker that also runs tests (like Eric mentioned) - so basically a complete ahead-of-time compilation suite in a single process. In my case I think it’s not prohibitively expensive to invoke a process - it complicates things but does not make them impossible. I started down this path because LLVM and lld allowed that. I had some plans that involved using similar infrastructure in a JIT-like manner (e.g. a REPL that would link the binary and load it), but this is probably better served using the actual JIT.

I think there are environments where invoking a process or exit(1)-ing is absolutely prohibitive. I personally can solve these issues by fork/exec-ing the driver to perform linking, but I don’t quite understand how the AMDGPU target can be useful with the new ELF linker: I had the impression that AMD basically uses ELF as the shader binary format, and that compilation has to happen inside the OpenGL driver (where you want to run shader compilation from multiple threads, and definitely don’t want a linking issue to exit the process in the case of a user-mode driver).

Arseny

Hi All,

Not sure what the future plans are for Mach-O linker (at this point it seems logical to rewrite that using the new designs but I’m not sure if it ever happens), so maybe at some point we’ll just have one linker application instead of a library and an application.

We plan to continue with the existing atom-based linker for MachO. This is an area of ongoing active development (notwithstanding me being on holidays for the last two months).

And as a consequence, I think it is unacceptable to replace the old ELF port with the new one until this is true. That is removing functionality that users of LLD realistically were depending on, which you’re seeing in this thread. That’s not cool. We don’t really tolerate dramatic regressions in functionality like this, and even if we’ve already done it, we should revert back to a state where both are available until the new port is actually ready. And ready in LLVM land means, functional as a library.

Just in case, I want to clarify that right now I can still use the original ELF linker as part of lld. I was looking at migrating both to get faster linking and because I got the impression the old ELF linker is basically frozen, but I can still use the old one in the meantime.

Thank you for the rest; I’m happy to see other people share my opinion on the library vs. application issue.

Arseny

By organizing it as a library, I’m expecting something coarse. I don’t expect to reorganize the linker itself as a collection of small libraries, but rather to make the entire linker available as a library, so that you can link stuff in-process. More specifically, I expect that the library would basically export one function, link(std::vector<StringRef>), which takes command line arguments and returns a memory buffer for a newly created executable. We may want to allow a mix of StringRef and MemoryBuffer as input, so that you can directly pass in-memory objects to the linker, but the basic idea remains the same.

Are we on the same page?

Usually you create a new process anyway when integrating within a driver, to shield the rest of the stack from a crash in the linker, the same way the clang driver (usually) spawns a new process for the compilation.
(this is not an argument against the library design, which I completely support).

By organizing it as a library, I’m expecting something coarse. I don’t expect to reorganize the linker itself as a collection of small libraries, but rather to make the entire linker available as a library, so that you can link stuff in-process. More specifically, I expect that the library would basically export one function, link(std::vector<StringRef>), which takes command line arguments and returns a memory buffer for a newly created executable. We may want to allow a mix of StringRef and MemoryBuffer as input, so that you can directly pass in-memory objects to the linker, but the basic idea remains the same.

Are we on the same page?

Let me answer this below, where I think you get to the core of the problem.

I’m very sympathetic to the problem of not wanting to design an API until the concrete use cases for it appear. That makes perfect sense.

We just need to be ready to extend the library API (and potentially find a more fine grained layering if one is actually called for) when a reasonable and real use case arises for some users of LLD. Once we have people that actually have a use case and want to introduce a certain interface to the library that supports it, we need to work with them to figure out how to effectively support their use case.

At the least, we clearly need the super simple interface[1] that the command line tool would use, but an in-process linker could also probably use.

We might need minor extensions to effectively support Arseny’s use case (I think an in-process linker is a very reasonable thing to support, I’d even like to teach the Clang driver to optionally work that way to be more efficient on platforms like Windows). But I have to imagine that the interface for an in-process static linker and the command line linker are extremely similar if not precisely the same.

At some point, it might also make sense to support more interesting linking scenarios such as linking a PIC “shared object” that can be mapped into the running process for JIT users. But I think it is reasonable to build the interface that those users need when those users are ready to leverage LLD. That way we can work with them to make sure we don’t build the wrong interface or an overly complicated one (as you say).

Make sense?
-Chandler

I don’t disagree with anything Chandler said, but it’s worth noting that we already have a specialized in-process linker used by MCJIT to resolve relocations and patch things like symbolic calls. It’d be really, really nice if the new linker library supported that use case.

By organizing it as a library, I’m expecting something coarse. I don’t expect to reorganize the linker itself as a collection of small libraries, but rather to make the entire linker available as a library, so that you can link stuff in-process. More specifically, I expect that the library would basically export one function, link(std::vector<StringRef>), which takes command line arguments and returns a memory buffer for a newly created executable. We may want to allow a mix of StringRef and MemoryBuffer as input, so that you can directly pass in-memory objects to the linker, but the basic idea remains the same.

Are we on the same page?

Let me answer this below, where I think you get to the core of the problem.

I’m very sympathetic to the problem of not wanting to design an API until the concrete use cases for it appear. That makes perfect sense.

We just need to be ready to extend the library API (and potentially find a more fine grained layering if one is actually called for) when a reasonable and real use case arises for some users of LLD. Once we have people that actually have a use case and want to introduce a certain interface to the library that supports it, we need to work with them to figure out how to effectively support their use case.

At the least, we clearly need the super simple interface[1] that the command line tool would use, but an in-process linker could also probably use.

We might need minor extensions to effectively support Arseny’s use case (I think an in-process linker is a very reasonable thing to support, I’d even like to teach the Clang driver to optionally work that way to be more efficient on platforms like Windows). But I have to imagine that the interface for an in-process static linker and the command line linker are extremely similar if not precisely the same.

At some point, it might also make sense to support more interesting linking scenarios such as linking a PIC “shared object” that can be mapped into the running process for JIT users. But I think it is reasonable to build the interface that those users need when those users are ready to leverage LLD. That way we can work with them to make sure we don’t build the wrong interface or an overly complicated one (as you say).

I don’t disagree with anything Chandler said, but it’s worth noting that we already have a specialized in-process linker used by MCJIT to resolve relocations and patch things like symbolic calls. It’d be really, really nice if the new linker library supported that use case.

This is, in fact, the goal that Dave and I mentioned :)

Lang and I have been talking about this wrt MCJIT for a long time.

-eric

Yep. But I also think it is reasonable to design the API to support that use case when we’re actually wiring it together and making it work. I think trying to build an API that we think will support that use case, without actually integrating it into LLVM’s JIT, is much more likely to end up with the wrong API, and has a higher chance of over-engineering the API into one more complex than necessary.

I think that’s the concern Rui has here, which seems reasonable.

None of this says that once someone is actually working on doing this integration we shouldn’t figure out the right API and make it happen. We should. This is another use case that makes total sense.

-Chandler

I think Chandler’s right. I do hope to eventually integrate LLD into the JIT, but I expect it to be a reasonably natural fit - I don’t think we need to make any preemptive design decisions based on that possibility.

Cheers,
Lang.