[lld] [mach-o]: RFC: representing LC_REEXPORT_DYLIB

Hi all,

I've been thinking about how best to represent MachO's
LC_REEXPORT_DYLIB (used even by libSystem.dylib to provide its various
sub-components[*]).

It looks like this functionality would naturally fall into the
InputGraph, in analogy with Groups and Archives. Unfortunately, it's
rather more dynamic than the existing cases: we don't know the needed
files before parsing the top-level one, and need to open multiple
files. Essentially, we'd need to create new MachOFileNodes based on
the contents of the parent.

It seems there are two obvious ways to do this:

1. Create them while we still have the MachONormalizedFile around; I
think this would mean extending the InputGraph::parse interface to
allow new InputNodes to be passed back.
2. Add an atom type to represent the dependency and create the actual
nodes when we get back to MachOFileNode::parse.
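
To make option 1 concrete, here is a minimal sketch of a parse step that passes new nodes back to the driver. Everything in it is a simplified stand-in invented for illustration (FakeDisk, this FileNode, loadAll); it is not lld's real InputGraph or MachOFileNode interface, and it deliberately ignores cycles and duplicates.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the on-disk dylibs: path -> install names listed
// in that file's LC_REEXPORT_DYLIB commands.
using FakeDisk = std::map<std::string, std::vector<std::string>>;

// Hypothetical, much-simplified file node (not lld's real MachOFileNode).
struct FileNode {
  std::string path;

  // Option 1: parsing hands back one new node per re-exported dylib,
  // while the parsed contents of this file are still available.
  std::vector<FileNode> parse(const FakeDisk &disk) const {
    std::vector<FileNode> children;
    auto it = disk.find(path);
    if (it != disk.end())
      for (const std::string &name : it->second)
        children.push_back(FileNode{name});
    return children;
  }
};

// Driver loop: parse each node and queue whatever nodes it passes back.
// Returns paths in the order they were parsed. No cycle or duplicate
// handling here; a real version would need both.
std::vector<std::string> loadAll(const std::vector<std::string> &commandLine,
                                 const FakeDisk &disk) {
  std::deque<FileNode> pending;
  for (const std::string &p : commandLine)
    pending.push_back(FileNode{p});

  std::vector<std::string> parsed;
  while (!pending.empty()) {
    FileNode node = pending.front();
    pending.pop_front();
    assert(!node.path.empty() && "every node needs a path");
    parsed.push_back(node.path);
    for (const FileNode &child : node.parse(disk))
      pending.push_back(child);
  }
  return parsed;
}
```

The point of the sketch is only the shape of the interface: the set of nodes is not known up front and grows as files are parsed.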

I'm still very new to lld, so which of these fits in better with our
goals? Or has someone else already thought about it and come up with a
cunning plan? I'm happy to implement anyone's idea if it's the neatest
way to go.

Cheers.

Tim.

[*] It's the last barrier to "lld -flavor darwin -arch x86_64
-macosx_version_min 10.9 hello_world.o /usr/lib/libSystem.dylib
-ohello_world" working, I think! Using
/usr/lib/system/libsystem_c.dylib already does.

Hi Tim,

Are you referring to the dependencies of the dynamic library that need to be traversed to resolve shared library atoms?

You could build a dynamic library node and have it return the symbol when the shared library is asked for a symbol that needs to be resolved, using the API below.

const SharedLibraryAtom *exports(StringRef name, bool dataSymbolOnly)

Will this work?

From previous conversations, I think symbol resolution for MachO is more involved under the current lld model, in that symbols are resolved from archives and dynamic libraries after all the object files have been processed as well.

Will this simplify things?

Thanks

Shankar Easwaran

Hi Shankar,

> You could build a dynamic library node and have it return the symbol when the
> shared library is asked for a symbol that needs to be resolved, using the
> API below.
>
> const SharedLibraryAtom *exports(StringRef name, bool dataSymbolOnly)
>
> Will this work?

I did actually consider something along those lines, but it seemed
like even more of a hack so I didn't mention it in my message.

It could be made to work, but would involve reading new files in
either MachONormalizedFileToAtoms or the exports function itself. Both
of those seem like they're at the wrong level: we'd need to largely
re-implement the FileNode I/O handling and graph descent that already
exists.

I'm also not convinced the lifetime and ownership issues work out well
in that scheme. lld as a whole seems to keep the MemoryBuffers
associated with files around, which makes that location even less
pleasant from a layering point of view.

Cheers.

Tim.

> Hi all,
>
> I've been thinking about how best to represent MachO's
> LC_REEXPORT_DYLIB (used even by libSystem.dylib to provide its various
> sub-components[*]).
>
> It looks like this functionality would naturally fall into the
> InputGraph, in analogy with Groups and Archives. Unfortunately, it's
> rather more dynamic than the existing cases: we don't know the needed
> files before parsing the top-level one, and need to open multiple
> files. Essentially, we'd need to create new MachOFileNodes based on
> the contents of the parent.
>
> It seems there are two obvious ways to do this:
>
> 1. Create them while we still have the MachONormalizedFile around; I
> think this would mean extending the InputGraph::parse interface to
> allow new InputNodes to be passed back.

ld64 does it in two phases. The first phase just loads the dylibs directly
specified on the command line. The second phase loads any
“indirect” dylibs.

Perhaps at the end of DarwinLdDriver::parse() after the nodes are created
for all the command line files, the driver can iterate over the nodes and
instantiate any indirect dylibs needed? You don’t want to load the
indirect dylibs as each direct dylib is loaded because one of the indirect
ones may later turn out to be a direct one, and the order determines
the two-level-namespace ordinal used which we want to remain deterministic.
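
A minimal sketch of that two-phase idea, under stated assumptions: DepMap, LoadPlan and planLoads are hypothetical names, dylibs are identified by plain strings, and the dependency map is handed in ready-made rather than discovered by parsing. It is not how ld64 or lld actually implement this.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical input: dylib name -> dylibs it depends on or re-exports.
using DepMap = std::map<std::string, std::vector<std::string>>;

struct LoadPlan {
  std::vector<std::string> direct;   // command-line order; ordinal = index + 1
  std::vector<std::string> indirect; // loaded in phase 2, no ordinal assigned
};

LoadPlan planLoads(const std::vector<std::string> &commandLine,
                   const DepMap &deps) {
  LoadPlan plan;
  std::set<std::string> seen;

  // Phase 1: every dylib named on the command line is direct, in order.
  for (const std::string &name : commandLine)
    if (seen.insert(name).second)
      plan.direct.push_back(name);
  assert(plan.direct.size() <= commandLine.size());

  // Phase 2: walk the dependencies of everything loaded so far. Anything
  // already seen (for example, a direct dylib) is not loaded a second time.
  std::vector<std::string> work = plan.direct;
  for (std::size_t i = 0; i < work.size(); ++i) {
    auto it = deps.find(work[i]);
    if (it == deps.end())
      continue;
    for (const std::string &child : it->second)
      if (seen.insert(child).second) {
        plan.indirect.push_back(child);
        work.push_back(child);
      }
  }
  return plan;
}
```

Because phase 1 finishes before phase 2 starts, a dylib that is both named on the command line and pulled in indirectly always ends up in the direct list, whatever order the files happen to be parsed in.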

In ld64 the processing of indirect dylibs has two purposes: 1) to support
LC_REEXPORT_DYLIB, 2) to support flat_namespace linking of a
main executable wherein the linker must check all undefines in all
dylibs are resolved.

The second case is also needed for ELF linkers (--no-allow-shlib-undefined),
which means an ELF linker would need to load indirect dylibs too.

Shankar, Does lld for ELF support loading indirect DSOs?

-Nick

The GNU flavor does not try to read dependent libraries (indirect DSOs) to resolve symbols unless the dependent library is also added on the link line.

Test case:

cat > 1.c << \!
int fn(void);
int main() {
   fn();
   return 0;
}
!

cat > fn.c << \!
int fn1(void);
int fn() { return fn1(); }
!

cat > fn1.c << \!
int fn2(void);
int fn1() { return fn2(); }
!

gcc -c fn.c fn1.c -fPIC 1.c
ld -shared fn1.o -o libfn1.so
ld -shared fn.o -L. -lfn1 -o libfn.so
ld 1.o -L. -lfn -t --no-allow-shlib-undefined

=> This does not read libfn1 at all. I'm not used to this, and I don't know why GNU does it this way or the reasoning behind it. I'm not sure whether this is by design or a bug that was never fixed.

In the case of lld, for the GNU flavor, we don't need to support indirect DSOs :)

Thanks

Shankar Easwaran

It doesn't, which is a bug.

Joerg

Hi Joerg,

The GNU linker does not support it either, for some reason. If we are emulating the GNU behavior, it might be good to handle it in the same way.

No?

Thanks

Shankar Easwaran

Binutils broke the ELF behavior with some recent versions. That is not
something to follow.

Joerg

Do they agree that it's broken and intend to fix it, or is this a
matter in dispute between the various groups?

Cheers.

Tim.

I don't think they have any intention of fixing it as it was a very
deliberate change.

Joerg

> Perhaps at the end of DarwinLdDriver::parse() after the nodes are created
> for all the command line files, the driver can iterate over the nodes and
> instantiate any indirect dylibs needed?

That seems to be taking over from Driver::link, which very carefully
dispatches object parsing to a bunch of tasks to do it in parallel (we
obviously don't know what's LC_REEXPORT_DYLIBed until we have parsed
the input file). It would seem better if we could find a way that left
it doing that job.

> You don’t want to load the
> indirect dylibs as each direct dylib is loaded because one of the indirect
> ones may later turn out to be a direct one, and the order determines
> the two-level-namespace ordinal used which we want to remain deterministic.

Ah, I'd not considered anything like that. Presumably this issue goes
beyond the directly specified libraries too (i.e. it matters which
directly specified library a symbol gets associated with, even if it
doesn't come from one).

That probably rules out a simple depth-first InputGraph, but makes
constructing a correct one a bit tricky. I'll have to play around, I
think.

> Shankar, Does lld for ELF support loading indirect DSOs?

I've looked into the ELF situation a bit more (thanks Joerg, for
supplying the initial hints!). It seems to date back to the thread in
[1] (with some background at [2]), where they changed the default from
automatically copying DT_NEEDED entries to requiring a command-line
override for it (personally, I think that was probably the right
decision).

But either way it means that ELF will probably need this ability
eventually, exposed via a "--copy-dt-needed-entries" option if nothing
else.

Cheers.

Tim.

[1] https://sourceware.org/ml/binutils/2011-08/msg00129.html
[2] http://fedoraproject.org/wiki/UnderstandingDSOLinkChange

> That probably rules out a simple depth-first InputGraph, but makes
> constructing a correct one a bit tricky. I'll have to play around, I
> think.

Saleem's pointed out that COFF may actually already have this
implemented. I'll see if I can track down where and how.

Cheers.

Tim.

Hi Nick,

> You don’t want to load the
> indirect dylibs as each direct dylib is loaded because one of the indirect
> ones may later turn out to be a direct one, and the order determines
> the two-level-namespace ordinal used which we want to remain deterministic.

I've finally got back to this issue and I'm not sure what you mean
here. My tests suggest that ld64 performs a depth-first search of the
libraries and we *do* want to load them at the same time (or at least
make sure they're considered at the same time for resolution
purposes). For example (reproduced by tmp.sh attached):

$ cat foo.c
int foo() {
  return 'f';
}
$ cat main.c
extern int foo();
int main() {
  return foo();
}
$ cat wrapper.c
$ clang -shared foo.c -olibfoo.dylib
$ clang wrapper.c -shared -Wl,-reexport_library,libfoo.dylib -o libwrapper.dylib
$ clang main.c libwrapper.dylib libfoo.dylib -omain
$ nm -nm main
                 (undefined) external _foo (from libwrapper)

It looks like it would correspond reasonably well with sticking
everything within the one directly specified InputElement somehow.
Avoiding cycles seems like it might be the trickiest part.
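
A rough sketch of that depth-first lookup, with a visited set to break cycles. Dylib, DylibMap, providesSymbol and findOrdinal are made-up names for illustration (not lld or ld64 code), and the sketch assumes that the first command-line dylib reaching the symbol through its re-exports gets the credit, which is what the nm output above suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a parsed dylib.
struct Dylib {
  std::set<std::string> exports;      // symbols defined in this dylib
  std::vector<std::string> reexports; // its LC_REEXPORT_DYLIB targets
};
using DylibMap = std::map<std::string, Dylib>;

// Depth-first search through re-exports. `visited` stops the recursion from
// looping if two dylibs re-export each other.
bool providesSymbol(const DylibMap &libs, const std::string &lib,
                    const std::string &sym, std::set<std::string> &visited) {
  if (!visited.insert(lib).second)
    return false; // already searched on this walk
  auto it = libs.find(lib);
  if (it == libs.end())
    return false;
  if (it->second.exports.count(sym))
    return true;
  for (const std::string &child : it->second.reexports)
    if (providesSymbol(libs, child, sym, visited))
      return true;
  return false;
}

// Returns the 1-based position of the first command-line dylib that provides
// `sym`, directly or via re-exports, or -1 if none does.
int findOrdinal(const DylibMap &libs,
                const std::vector<std::string> &commandLine,
                const std::string &sym) {
  for (std::size_t i = 0; i < commandLine.size(); ++i) {
    std::set<std::string> visited;
    if (providesSymbol(libs, commandLine[i], sym, visited))
      return static_cast<int>(i) + 1;
  }
  return -1;
}
```

With libwrapper re-exporting libfoo and both on the command line in that order, this attributes _foo to libwrapper, matching the example.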

Cheers.

Tim.

tmp.sh (327 Bytes)

Tim, for that simple case, it does not matter if you do a depth-first or
breadth-first load of the dylibs. But things get more complicated.
I hope none of this makes your eyes bleed...

One issue is that before 10.6/iOS3.1 we did not have LC_REEXPORT_DYLIB.
Instead of the parent saying it re-exports a child, a child may have a load
command which said the name of the parent which re-exported it.
Maybe it has been long enough that we can drop support for this in lld. But
to implement support, you have to open every child dylib and look to see
if it says the parent re-exports it! To tell if a dylib uses the new or old style
of re-export, check the mach_header flags: the MH_NO_REEXPORTED_DYLIBS bit
is set on new style dylibs that have no LC_REEXPORT_DYLIB commands.
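
One way to read that rule, as a small sketch. The flag value is MH_NO_REEXPORTED_DYLIBS from <mach-o/loader.h>; the ReexportStyle enum, the classify function and the three-way split are this sketch's own interpretation of the paragraph above, not ld64's actual logic.

```cpp
#include <cassert>
#include <cstdint>

// Value of MH_NO_REEXPORTED_DYLIBS in <mach-o/loader.h>, repeated here so the
// sketch builds on any platform.
constexpr std::uint32_t kMhNoReexportedDylibs = 0x100000;

// Hypothetical classification of a parent dylib.
enum class ReexportStyle {
  NoReexports,   // flag set: nothing re-exported, children need not be probed
  NewStyle,      // parent lists its re-exports via LC_REEXPORT_DYLIB
  MaybeOldStyle  // neither: each child must be opened to see if it names us
};

ReexportStyle classify(std::uint32_t headerFlags,
                       unsigned reexportCommandCount) {
  if (reexportCommandCount > 0)
    return ReexportStyle::NewStyle;
  if (headerFlags & kMhNoReexportedDylibs)
    return ReexportStyle::NoReexports;
  return ReexportStyle::MaybeOldStyle;
}
```

Only the MaybeOldStyle case forces the expensive step of opening every child dylib, so dropping old-style support would remove that branch entirely.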

Another feature is that re-exports are convenient for build time (fewer dylibs
to specify) but slow down runtime because dyld has to search multiple dylibs
for a symbol. In your example, the two-level ordinal in main says that _foo
is in libwrapper, but dyld looks there and does not find it, but then notices
that libwrapper re-exports libfoo.dylib, so dyld then searches libfoo.dylib.
To improve performance, the linker has an optimization which can “hoist”
“public” dylibs up. An example is a Cocoa app that just links with Cocoa.framework
and calls _objc_msgSend. Well, Cocoa re-exports AppKit which re-exports
Foundation which re-exports libobjc.dylib which actually implements
_objc_msgSend. Rather than recording that _objc_msgSend is in Cocoa in
the app binary (which would cause dyld to do a lot of searching), the linker
sees that libobjc.dylib is in /usr/lib/ which means it is a public framework.
Therefore, the developer could have added -lobjc and the linker would
have recorded _objc_msgSend came from that. So the linker pretends
the user added -lobjc and then records _objc_msgSend as coming from it.
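
The hoisting decision can be sketched roughly as follows. The names isPublicPath and dylibToRecord are hypothetical, and the "public" test used here (an install name directly inside /usr/lib/, with no further subdirectory) is an assumption made for the sketch; the real rule also has to cover public frameworks and may differ in detail.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumed rule for this sketch only: a dylib is "public" if its install name
// sits directly in /usr/lib/ (not in a subdirectory of it).
bool isPublicPath(const std::string &installName) {
  const std::string prefix = "/usr/lib/";
  if (installName.compare(0, prefix.size(), prefix) != 0)
    return false;
  return installName.find('/', prefix.size()) == std::string::npos;
}

// `chain` is the re-export path for one symbol: front() is the dylib the user
// linked against, back() is the dylib that actually defines the symbol.
// Returns the dylib the symbol should be recorded against.
std::string dylibToRecord(const std::vector<std::string> &chain) {
  assert(!chain.empty() && "a symbol must come from somewhere");
  // Hoist: if the defining dylib is public, pretend it was linked directly.
  if (isPublicPath(chain.back()))
    return chain.back();
  // Otherwise keep the directly specified dylib and let dyld follow re-exports.
  return chain.front();
}
```

For the Cocoa example the chain ends at /usr/lib/libobjc.dylib, so the symbol is recorded against libobjc; for the libwrapper example earlier in the thread nothing is public and the symbol stays with libwrapper.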

And a more recent feature (tied into clang modules) is “auto-linking”. The
compiler can now emit LC_LINKER_OPTION load commands into .o files.
These tell the linker about frameworks and libraries that *might* be needed
during the link and are only really added to the link if doing so would
resolve some undefined symbol.

Lastly, two-level-namespace ordinals are based on the index of each
LC_LOAD_DYLIB (and friends) in the binary. We must have a load command
for each library on the command line (you can force a dependency on a dylib
even if nothing is used from it). There may be additional load dylib commands
based on auto-linking or hoisting. But overall we want links to be stable and
deterministic. There should not be race conditions that result in different
possible binaries from the same link.

The net of all these features (in my mind) is that the linker needs to maintain
a couple of “pools” of dylibs:
1) dylibs with an assigned ordinal (initially the dylibs directly on command line)
2) indirect dylibs
3) auto-link dylibs in waiting
Via various rules, dylibs in 2 or 3 may get moved up to 1. We need stable rules
so that the ordinals always reproduce.
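
As a sketch of how the three pools and the "move up to 1" step might be modelled: DylibPools, ordinalOf and promote are hypothetical names, dylibs are plain strings, and a promoted dylib is simply appended after the existing ordinals. That append rule is an assumption made for the sketch, not a statement of ld64's rules.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct DylibPools {
  std::vector<std::string> ordinal;  // pool 1: ordinal = index + 1
  std::vector<std::string> indirect; // pool 2: loaded only via other dylibs
  std::vector<std::string> autoLink; // pool 3: LC_LINKER_OPTION candidates

  // Returns the 1-based ordinal of `name`, or 0 if it has none yet.
  int ordinalOf(const std::string &name) const {
    auto it = std::find(ordinal.begin(), ordinal.end(), name);
    return it == ordinal.end()
               ? 0
               : static_cast<int>(it - ordinal.begin()) + 1;
  }

  // Moves `name` from pool 2 or 3 into pool 1 and returns its ordinal.
  // Returns the existing ordinal if it already has one, 0 if it is unknown.
  // `name` is taken by value so callers may pass an element of a pool.
  int promote(std::string name) {
    if (int existing = ordinalOf(name))
      return existing;
    for (std::vector<std::string> *pool : {&indirect, &autoLink}) {
      auto it = std::find(pool->begin(), pool->end(), name);
      if (it != pool->end()) {
        pool->erase(it);
        ordinal.push_back(name);
        return static_cast<int>(ordinal.size());
      }
    }
    return 0;
  }
};
```

Ordinals produced this way are only reproducible if promote is called in a deterministic order, for example while walking undefined symbols in a fixed order after all parallel parsing has finished.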

-Nick