LLD archive library design

Hi,

I have started to work on support for reading archive libraries in lld and thought of using llvm/lib/ArchiveReader for this.

The ArchiveReader does not fully support GNU archive libraries (thin archives). Do you think we should continue using llvm/lib/ArchiveReader?

I was chatting with Michael, and it looks like there have been discussions and small sketches done on reading archive libraries in lld. Can you provide pointers to the design?

I think the first set of features that lld needs to support is

1) Read the archive libraries and pull in the right object file which defines the symbol
2) Read the archive libraries and pull all the object files (if force load)

There are OS/architecture-specific portions of resolving symbols with archive libraries when it comes to common symbols. Do you have any suggestions on this?

Thanks

Shankar Easwaran
Qualcomm Innovation Center Inc.

Hi,

I have started to work on support for reading archive libraries in lld and thought of using llvm/lib/ArchiveReader for this.

The ArchiveReader does not fully support GNU archive libraries (thin archives). Do you think we should continue using llvm/lib/ArchiveReader?

I was chatting with Michael, and it looks like there have been discussions and small sketches done on reading archive libraries in lld. Can you provide pointers to the design?

The general idea is to have a new Reader class (i.e. ReaderArchive) along with a ReaderOptionsArchive class. One of the options will be whether all members are force-loaded.

The ReaderArchive::parseFile() method will check the force-load option. If it is true, it will parse all the members and return a vector<> of File* objects, one for each member. If force-load is not specified, ReaderArchive::parseFile() will just return one FileArchive object, which is a subclass of ArchiveLibraryFile. The find() method of that class will search the table of contents and return a File* object for the member defining the requested symbol, or nullptr if nothing in the archive defines that symbol.
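A minimal sketch of the shape described above, to make the two paths concrete. It is not lld code: File, ArchiveLibraryFile, FileArchive, ReaderArchive and ReaderOptionsArchive are simplified stand-ins, the ArchiveModel type (member name mapped to the symbols it defines) is a hypothetical substitute for real archive parsing, and std::unique_ptr is used instead of raw File* only to keep ownership simple in the example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for lld's File; only the shape matters here.
struct File {
  explicit File(std::string p) : path(std::move(p)) {}
  virtual ~File() = default;
  std::string path;
};

struct ArchiveLibraryFile : File {
  using File::File;
  // Returns the member defining `name`, or nullptr if none does.
  virtual const File *find(const std::string &name) const = 0;
};

// Hypothetical in-memory model of an archive: member name -> defined symbols.
using ArchiveModel = std::map<std::string, std::vector<std::string>>;

class FileArchive : public ArchiveLibraryFile {
public:
  FileArchive(std::string path, const ArchiveModel &model)
      : ArchiveLibraryFile(std::move(path)) {
    for (const auto &entry : model) {
      _members.push_back(std::make_unique<File>(entry.first));
      // emplace does not overwrite, so the first member seen for a symbol wins.
      for (const auto &sym : entry.second)
        _toc.emplace(sym, _members.back().get());
    }
  }

  const File *find(const std::string &name) const override {
    auto it = _toc.find(name);
    return it == _toc.end() ? nullptr : it->second;
  }

private:
  std::vector<std::unique_ptr<File>> _members;
  std::map<std::string, const File *> _toc; // table of contents
};

struct ReaderOptionsArchive {
  bool forceLoad = false;
};

class ReaderArchive {
public:
  explicit ReaderArchive(ReaderOptionsArchive opts) : _options(opts) {}

  // Force-load: one File per member. Otherwise: a single lazy FileArchive.
  std::vector<std::unique_ptr<File>> parseFile(const std::string &path,
                                               const ArchiveModel &model) const {
    assert(!path.empty() && "archive path must be set");
    std::vector<std::unique_ptr<File>> result;
    if (_options.forceLoad) {
      for (const auto &entry : model)
        result.push_back(std::make_unique<File>(entry.first));
    } else {
      result.push_back(std::make_unique<FileArchive>(path, model));
    }
    return result;
  }

private:
  ReaderOptionsArchive _options;
};
```

In this sketch the resolver would treat every returned element as a File and only call find() on the ones that are ArchiveLibraryFile instances, so members are pulled in on demand unless force-load was requested.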

I think the first set of features that lld needs to support is

1) Read the archive libraries and pull in the right object file which defines the symbol
2) Read the archive libraries and pull all the object files (if force load)

There are OS/architecture-specific portions of resolving symbols with archive libraries when it comes to common symbols. Do you have any suggestions on this?

There may need to be some ReaderOptionsArchive flags for how commons should be handled. There should also be some ResolverOptions for how commons are handled.

I don't know how much the archive file format varies across platforms. In particular, the table-of-contents file may be named differently and have a different format. Again, we may need ReaderOptionsArchive flags to drive this, or maybe the ReaderArchive can implicitly figure it out by looking at the content?

Another issue is how the ReaderArchive knows which Reader to call to instantiate each member. We have the same problem at the top level of the linker. You could punt this to the client for now by having the ReaderOptionsArchive contain a Reader* which ReaderArchive and FileArchive use to instantiate member files.
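As a rough illustration of punting member instantiation to the client, the sketch below stores a Reader pointer in the options and has ReaderArchive delegate each member to it. The names memberReader, parseMember and parseMembers, and the simplified File with a format field, are assumptions made for this example and do not come from lld.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a parsed input file.
struct File {
  File(std::string n, std::string f) : name(std::move(n)), format(std::move(f)) {}
  std::string name;
  std::string format;
};

// Hypothetical reader interface: turns one member's bytes into a File.
class Reader {
public:
  virtual ~Reader() = default;
  virtual std::unique_ptr<File> parseMember(const std::string &name,
                                            const std::string &bytes) const = 0;
};

class ReaderELF : public Reader {
public:
  std::unique_ptr<File> parseMember(const std::string &name,
                                    const std::string & /*bytes*/) const override {
    // A real reader would decode the bytes; this one only tags the format.
    return std::make_unique<File>(name, "elf");
  }
};

struct ReaderOptionsArchive {
  const Reader *memberReader = nullptr; // supplied by the client
  bool forceLoad = false;
};

class ReaderArchive {
public:
  explicit ReaderArchive(const ReaderOptionsArchive &opts) : _options(opts) {
    assert(_options.memberReader && "client must supply a member reader");
  }

  // Each (name, bytes) pair is handed to the client-supplied reader.
  std::vector<std::unique_ptr<File>>
  parseMembers(const std::vector<std::pair<std::string, std::string>> &members) const {
    std::vector<std::unique_ptr<File>> files;
    for (const auto &m : members)
      files.push_back(_options.memberReader->parseMember(m.first, m.second));
    return files;
  }

private:
  ReaderOptionsArchive _options;
};
```

The archive reader stays format-agnostic this way: swapping ReaderELF for another Reader changes how members are instantiated without touching ReaderArchive.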

-Nick

Hi Nick,

Here is my understanding:

1) lld-core creates a ReaderOptionsArchive class with the (Reader, CommandLine options flag)
2) lld-core creates an object of type ReaderArchive(ReaderOptions), that users would subclass (off ArchiveLibraryFile)
     a) GNUArchiveLibrary
     b) BSDArchiveLibrary
     c) MachOArchiveLibrary
     d) COFFArchiveLibrary
3) ReaderArchive has two functions
      a) parseFile, which returns a vector of file objects if the force-load option is set
      b) parseFile (overloaded), which returns the ArchiveLibraryFile
4) The ArchiveLibrary object has a function
      find(symbolName, isDataSym), which returns a file object if the file contains a definition for that symbolName.

Now, on to how the model fits into the current design.

1) Files are read one at a time, and a list of atoms is produced from each
2) If the File is an archive, the linker has to invoke the appropriate Reader that has been created
     a) Hooks are needed to determine if the input file is an archive library. Should there be a function in ReaderArchive to check if the InputFile is an archive library?
     b) InputFiles is a vector of Files, and FileArchive does not derive from File. Is a separate vector needed?
     c) lld also has to invoke a variation of the parseFile function so that it returns a FileArchive instead of a vector of files. How should lld invoke it?
     d) If it returns a vector of files as specified in the ReaderArchiveOptions, all of them need to be added to the _inputFiles object

Thanks

Shankar Easwaran
Qualcomm Innovation Center.

Michael and I have had discussions on how to drive this. In addition to archives, you might have yaml or native .o files that you want to support linking in. So I’m sure this issue will be refactored a few times until we are happy. At this point, I would just have a static method in ReaderArchive that says whether the file is an archive or not. Then have ReaderELF call that method and, if it returns true, call through to a ReaderArchive (passing the ReaderELF object as an option in the ReaderOptionsArchive). Also, there is an identify_magic() function in llvm/Support that may help.
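One way such a static check could look, as a self-contained sketch. It compares the first eight bytes of the buffer against the usual ar magic "!<arch>\n" and the GNU thin-archive magic "!<thin>\n". The ArchiveKind enum and the identify(), isArchive() and dispatchTarget() names are hypothetical; real code would likely lean on the llvm/Support helper mentioned above rather than comparing bytes by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

enum class ArchiveKind { NotArchive, Regular, Thin };

class ReaderArchive {
public:
  // Classifies a buffer by its leading magic bytes.
  static ArchiveKind identify(const std::string &buffer) {
    static const char kRegular[] = "!<arch>\n";
    static const char kThin[] = "!<thin>\n";
    const std::size_t kMagicLen = 8;
    static_assert(sizeof(kRegular) - 1 == 8, "regular magic is 8 bytes");
    static_assert(sizeof(kThin) - 1 == 8, "thin magic is 8 bytes");

    if (buffer.size() < kMagicLen)
      return ArchiveKind::NotArchive;
    if (std::memcmp(buffer.data(), kRegular, kMagicLen) == 0)
      return ArchiveKind::Regular;
    if (std::memcmp(buffer.data(), kThin, kMagicLen) == 0)
      return ArchiveKind::Thin;
    return ArchiveKind::NotArchive;
  }

  static bool isArchive(const std::string &buffer) {
    return identify(buffer) != ArchiveKind::NotArchive;
  }
};

// Usage sketch: a format-specific reader deciding whether to call through.
inline const char *dispatchTarget(const std::string &buffer) {
  return ReaderArchive::isArchive(buffer) ? "ReaderArchive" : "ReaderELF";
}
```

Distinguishing the thin variant up front also gives the reader a place to reject or special-case thin archives if the underlying archive support cannot handle them yet.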

FileArchive should derive from ArchiveLibraryFile which derives from File.

There is only one parseFile(), and it always returns a vector of File* objects. In the non-force-load case it would return a vector of one FileArchive*.

That already happens in appendFiles().

-Nick