Can we establish layering for the LLD libraries? Current state is a bit of a mess...

I wanted to go through and map out the layering of LLD’s libraries today and found that it’s essentially impossible. I think some serious cleanup is needed here.

Let’s start with the purely link-level dependencies encoded in the CMake build:

Currently the Core library depends on the ReaderWriter/Native library, which links against the ReaderWriter library, which links against the Core library. This clearly cannot work. The same cycle exists with Core → YAML → ReaderWriter → Core.
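One way to confirm cycles like these is a depth-first search over the link edges. The sketch below is illustrative Python, not part of LLD's build: the edge list is transcribed from the paragraph above rather than read from the real CMakeLists.txt files, and `find_cycle` is a hypothetical helper name.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    stack = []                     # libraries on the current DFS path

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the path, so we have closed a cycle
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Link edges as described in the text (library -> libraries it links against).
link_deps = {
    "Core": ["ReaderWriter/Native", "YAML"],
    "ReaderWriter/Native": ["ReaderWriter"],
    "YAML": ["ReaderWriter"],
    "ReaderWriter": ["Core"],
}

print(" -> ".join(find_cycle(link_deps)))
```

Run on a layering that really is a DAG, the same function returns `None`.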

The situation seems a bit worse for includes. If you start from LinkingContext.h I think this becomes quite clear. This is ostensibly part of the Core library, but it has methods that produce types from the ReaderWriter library. Combined with the fact that ReaderWriter depends on Core (not the other way around) and ReaderWriter/ELF subclasses LinkingContext, I can’t make heads or tails of what was intended here.

My vague guess is that Core should actually be two libraries. One that doesn’t depend on anything (other than Config) and provides the very “core” types for LLD. And another, perhaps called “Linker” which is a much higher-level library and provides useful APIs for actually doing linking, depends on ReaderWriter and provides methods that manipulate it. I could even see needing to split the target libraries in a similar manner.

But I don’t know LLD’s design well, I’m just trying to stitch the build system back together in a reasonable way, so maybe I’ve missed things completely. So help me out. I’d like to understand a reasonable DAG in which to construct libraries for LLD. Having this should allow proper layering, layering checks, and eventually building with C++ modules. All of which seem essentially impossible today.

-Chandler

Hi Chandler,

> Let's start with the purely link-level dependencies encoded in the CMake build

I've been playing around a bit with this. Looks like we can solve
most of the issues by dissolving lldReaderWriter.
Reader.cpp/Writer.cpp go into lldCore and the rest go into lldDriver.
I don't know if that makes conceptual sense, but it's what breaks the
cyclic dependencies.

-Greg

Does that break just the *link time* dependencies?

If you form a DAG of cross-library header inclusion and a DAG of the link
time dependencies, they should be compatible, and that's the layering we
should use.

I don't know much about LLD -- maybe it would help for you to describe the
DAG you're envisioning?
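To make "compatible" concrete: every cross-library #include should point at a library that the including library already reaches through its link-time dependencies. The following is a rough Python sketch of that check. The two small graphs are simplified illustrations based on the LinkingContext.h description in this thread, not data extracted from the LLD tree, and the function names are hypothetical.

```python
def reachable(graph, start):
    """All libraries reachable from start by following dependency edges."""
    seen = set()
    todo = list(graph.get(start, ()))
    while todo:
        lib = todo.pop()
        if lib not in seen:
            seen.add(lib)
            todo.extend(graph.get(lib, ()))
    return seen


def layering_violations(link_deps, include_deps):
    """Include edges (user, used) that the link-time DAG does not permit."""
    violations = []
    for user, used_libs in include_deps.items():
        allowed = reachable(link_deps, user)
        for used in used_libs:
            if used != user and used not in allowed:
                violations.append((user, used))
    return sorted(violations)


# Illustrative data only: an acyclic link layering, plus include edges in
# which a Core header pulls in ReaderWriter types.
link_deps = {"Config": [], "Core": ["Config"], "ReaderWriter": ["Core"]}
include_deps = {"Core": ["Config", "ReaderWriter"], "ReaderWriter": ["Core"]}

print(layering_violations(link_deps, include_deps))
```

An empty result would mean the include graph fits inside the link-time layering.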

> Currently the Core library depends on the ReaderWriter/Native library, which links against the ReaderWriter library, which links against the Core library. This clearly cannot work. The same cycle exists with Core -> YAML -> ReaderWriter -> Core.

How are you determining these cycles? How is Core dependent on YAML? (cause that seems wrong).

I just build all of lld in an Xcode project (which compiles each .o file and links them all together). I never see any of the layering...

One question I have is what should the granularity of the libraries be? At one point I asked about building an lld for OSX that only had mach-o support and got push-back that lld should always be a universal linker. If that is the case, why have so many small libraries?

Besides lld the command line tool, I can see lld libraries being used to construct a JIT linker. And in that case, I can see wanting a very pared down linker - just one file format (not yaml) and no driver.

> How are you determining these cycles? How is Core dependent on YAML?
> (cause that seems wrong).

This is in the CMake build system today. You can see it in the
CMakeLists.txt files. Also you can look at Greg's patch for some of the
issues here.

> I just build all of lld in an Xcode project (which compiles each .o file
> and links them all together). I never see any of the layering...

The CMake build works on Mac OS X too. =] With Ninja, it is even quite
speedy (or so I understand).

I'm hopeful that in the not-too-distant future we'll actually start enforcing
even header file layering using Clang's Modules support (when it happens
to be the host compiler). When that happens this stuff should actually be
really explicit and enforced easily.

Until then, I also have access to some crazy (and sadly internal) tools
that check header file layering between libraries. Hopefully we just
get modules soon though.

> One question I have is what should the granularity of the libraries be?
> At one point I asked about building an lld for OSX that only had mach-o
> support and got push-back that lld should always be a universal linker. If
> that is the case, why have so many small libraries?
>
> Besides lld the command line tool, I can see lld libraries being used to
> construct a JIT linker. And in that case, I can see wanting a very pared
> down linker - just one file format (not yaml) and no driver.

So, I think there is a good rationale for supporting both fine grained
libraries and LLD always being a universal / cross linker.

The libraries make it *possible* (as you indicate with the JIT stuff) to
build a narrowly targeted linker that has no excess functionality. We
should still choose to produce a full cross-linker binary called "lld"
which uses all the libraries.

Does that make sense?

While I think that generally layering also helps abstractly organize the
code, I don't think it is worth thinking about layering that serves no
purpose. So, if it isn't *possible* to use library A without using library
B as well, it doesn't make sense to separate them. I'm happy to separate
them if it is possible even if the users for this separation haven't yet
materialized.

-Chandler

It seems the dependency on lldYAML can be removed from CMakeLists.txt now.
Core doesn't actually depend on YAML.

lldPasses depends on libNative, and that dependency is currently resolved
transitively through lldCore. We should be able to remove the dependency on
libNative from Core by fixing the CMake file for Passes.

If my understanding is correct, the dependencies between
components in LLD should be like this:

Config: (nothing)
Core: Config
Driver: Core Passes ReaderWriter
Passes: Core ReaderWriter
ReaderWriter: Core (and Driver?)

I don't want ReaderWriter to depend on Driver, but it may be unavoidable
because of the .drectve section in PE/COFF, which contains command line options.

ReaderWriter should be able to be split into smaller libraries because
there's no cross-arch dependency between sub-directories in ReaderWriter.

Cool, this makes lots of sense to me as well. We can probably fix these
minor issues quickly.

Still, it would be good to get a firm big picture that everyone agrees to
in place.

> I don't want ReaderWriter to depend on Driver, but it may be unavoidable
> because of the .drectve section in PE/COFF, which contains command line options.

Well, it *can't* depend on Driver if Driver depends on ReaderWriter.
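A topological sort makes the point mechanically: the layering listed earlier admits a build order only without the ReaderWriter -> Driver edge. This is a small Python sketch using the standard library's `graphlib`; the dictionary restates the proposed list from this thread, and `build_order` is a hypothetical helper, not anything in LLD.

```python
from graphlib import CycleError, TopologicalSorter

# Proposed layering from the thread: library -> libraries it depends on.
proposed = {
    "Config": [],
    "Core": ["Config"],
    "Driver": ["Core", "Passes", "ReaderWriter"],
    "Passes": ["Core", "ReaderWriter"],
    "ReaderWriter": ["Core"],
}


def build_order(deps):
    """Return a build order (dependencies first), or None if deps has a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


order = build_order(proposed)

# Same layering, but with the "(and Driver?)" edge added for .drectve handling.
with_drectve = dict(proposed, ReaderWriter=["Core", "Driver"])

print(order)
print(build_order(with_drectve))
```

With the extra edge, Driver and ReaderWriter each require the other to be built first, so no order exists and the helper returns `None`.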

> ReaderWriter should be able to be split into smaller libraries because
> there's no cross-arch dependency between sub-directories in ReaderWriter.

That would be nice. Can you sketch out what this would look like?

The other issue is that while the above makes sense to me, if you look at
the #include lines in these libraries, it is *very* far from the truth. How
do we fix this? In particular, the LinkingContext seems to establish a lot
of circular dependencies.

> Well, it *can't* depend on Driver if Driver depends on ReaderWriter.

Oh that's true. Then the only thing we can do is to combine Driver and
ReaderWriter to make it one library? It's far from ideal and feels wrong
though.

> That would be nice. Can you sketch out what this would look like?

Basically each sub-directory in lib/ReaderWriter should be able to become a
separate library.

> The other issue is that while the above makes sense to me, if you look at
> the #include lines in these libraries, it is *very* far from the truth. How
> do we fix this? In particular, the LinkingContext seems to establish a lot
> of circular dependencies.

Well, that's not easy to answer without taking a closer look at the code,
but I agree that the current #include dependencies are nasty.

One idea would be to move LinkingContext.h to Core while keeping all
arch-specific LinkingContexts in ReaderWriter. I believe Core and Passes
depend only on the (generic) LinkingContext, so doing that would remove the
dependency of Core/Passes on ReaderWriter.

Can we finesse this by having the WinLinkDriver register a “parse” function pointer with PECOFFLinkingContext. Then have FileCOFF just call the registered parse function in the PECOFFLinkingContext?

-Nick

We could and I thought about that. The problem is that it's quite common to
have a .drectve section in a COFF file, so the PE/COFF ReaderWriter would be
mostly unusable if you don't register the parser. If you always have to
register it, that defeats the purpose of splitting them up.
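The registration idea and its drawback can be sketched roughly as follows. This is Python standing in for the C++ classes purely for brevity: the class names WinLinkDriver, PECOFFLinkingContext and FileCOFF come from the thread, but every method name, the `winlink_parse` function and the sample .drectve text are made up for illustration and do not reflect LLD's actual API.

```python
class PECOFFLinkingContext:
    """Stand-in for the ReaderWriter-side context; it never imports the driver."""

    def __init__(self):
        self._directive_parser = None
        self.options = []

    def register_directive_parser(self, parser):
        # Hypothetical hook: the driver hands over a callable at setup time.
        self._directive_parser = parser

    def parse_directives(self, text):
        if self._directive_parser is None:
            # The drawback raised above: without registration, .drectve
            # sections cannot be handled at all.
            raise RuntimeError("no .drectve parser registered")
        self.options.extend(self._directive_parser(text))


class FileCOFF:
    """Stand-in for the COFF reader; it only calls back through the context."""

    def __init__(self, context, drectve=""):
        self.context = context
        self.drectve = drectve

    def read(self):
        if self.drectve:
            self.context.parse_directives(self.drectve)


def winlink_parse(text):
    """Stand-in for the driver's option parser (would live on the Driver side)."""
    return text.split()


ctx = PECOFFLinkingContext()
ctx.register_directive_parser(winlink_parse)   # done by the driver
FileCOFF(ctx, "/defaultlib:libcmt /export:foo").read()
print(ctx.options)
```

The dependency arrow runs only from the driver to the context, which is what breaks the cycle; the cost is that a context created without the registration step fails on any file that carries a .drectve section.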

> Does that break just the *link time* dependencies?

Yes

> maybe it would help for you to describe the DAG you're envisioning?

Something like: http://yuml.me/edit/324beed8

* lldReaderWriter is gone. Reader.cpp and Writer.cpp moved down to
lldCore and the others went up to lldDriver.
* ELF targets depend on lldELF and not the other way around.
* All the ELF targets are referenced directly by the Driver.
Alternatively, we could add an lldELFTargets library to
ReaderWriter/ELF (overkill?).

Nick wrote:

> Can we finesse this by having the WinLinkDriver register a “parse” function pointer with
> PECOFFLinkingContext. Then have FileCOFF just call the registered parse function in
> the PECOFFLinkingContext?

Sounds good to me.

-Greg

Try building LLVM + LLD with shared libraries on Linux. That will
reliably fail due to dependency cycles between the components.

Joerg

Even better, on OS X :)

ELF (without -z,defs) is less strict. Last I tried,
-DBUILD_SHARED_LIBS=ON was failing on OS X but working on Linux.

Cheers,
Rafael