[lld] Modeling ELF FileNodes/ControlNodes (Groups) in lld

Hi,

With the InputGraph in place, lld now models command-line options and input files as nodes in the InputGraph, called InputElements.

In the current approach, each InputElement is converted to a LinkerInput, which works as long as lld only deals with individual files.

ControlNodes (Groups) are a problem, though: it is not clear how to model one as a LinkerInput.

Joerg and I were chatting on IRC about this and came up with the following approach.

- LinkerInput will contain a single file (lld::File) if the node it points to is a FileNode.
- LinkerInput will contain a vector (lld::Group) of files (lld::File) if the node it points to is a Group.

The resolver would need to be modified to consider lld::Groups in addition to lld::File.
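
A minimal C++ sketch of the two shapes described above. `File`, `Group` and this `LinkerInput` are simplified, hypothetical stand-ins for illustration only; they are not the real lld classes, and the accessor names are assumptions.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for lld::File; only the path matters here.
struct File {
  std::string path;
};

// Hypothetical stand-in for the proposed lld::Group: the files that
// appeared between --start-group and --end-group, in order.
using Group = std::vector<File>;

// Sketch of a LinkerInput that wraps either one file or one group.
class LinkerInput {
public:
  explicit LinkerInput(File file)
      : _files(1, std::move(file)), _isGroup(false) {}
  explicit LinkerInput(Group group)
      : _files(std::move(group)), _isGroup(true) {}

  bool isGroup() const { return _isGroup; }

  // Valid only when the input was built from a FileNode.
  const File &file() const {
    assert(!_isGroup && "file() called on a group input");
    return _files.front();
  }

  // A single file is exposed as a one-element list, so a resolver can
  // treat both cases uniformly if it wants to.
  const std::vector<File> &files() const { return _files; }

private:
  std::vector<File> _files;
  bool _isGroup;
};
```

With this shape the resolver can branch on `isGroup()`, or simply iterate `files()` and repeat the iteration only for groups.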

Does this sound like the approach we want to take?

Thanks

Shankar Easwaran

First question: is Group meant to represent --start-group/--end-group?

If I understand your proposal correctly, here's the thing: if a file is not in a group, each individual file is wrapped with a LinkerInput, but if it is in a group, it is not; instead the entire group is wrapped with one LinkerInput. This asymmetry is a bit concerning. If we don't need a LinkerInput for each individual input file, we could get rid of it in the former case. Otherwise, I'd think we need a LinkerInput per file in the latter case as well.

For example, if the following command line options are given, how would they be represented with LinkerInput, Group and File?

--start-group foo.a --as-needed bar.a --no-as-needed --end-group

Yes, the Group is to represent --start-group,--end-group.

So the group here will be contained in the LinkerInput as a vector of lld::files (foo.a, bar.a).

thanks

Shankar Easwaran

I do think we have too many classes. I thought InputGraph was going to replace InputFiles. It seems like LinkerInput could be merged into FileNode.

Originally InputFiles was the abstract interface that the Resolver used to see all the inputs. If InputGraph supported the methods forEachInitialAtom() and searchLibraries() then we could get rid of InputFiles and have the Resolver use InputGraph directly.

What should the interface be that the Resolver uses for handling groups?

-Nick

> Yes, the Group is to represent --start-group,--end-group.
>
> So the group here will be contained in the LinkerInput as a vector of
> lld::files (foo.a, bar.a).

It seems you dropped the --as-needed attribute?

By lld::files, which class are you referring to, lld::File or lld::InputFiles?

thanks

The --as-needed attribute is preserved and is contained within the ELF FileNode.

By lld::files, I am referring to lld::File.

Thanks

Shankar Easwaran

> The --as-needed attribute is preserved and is contained within the ELF
> FileNode.

I don't really get your point. If each input file doesn't need a LinkerInput,
why does it need one only when the input file is not within
--start-group/--end-group? I really agree with Nick's point that we have too
many classes.

> By lld::files, I am referring to lld::File.

Hi Nick,

> I do think we have too many classes.

Agree.

> I thought InputGraph was going to replace InputFiles.

Interesting idea.

> It seems like LinkerInput could be merged into FileNode.

Agree.

> Originally InputFiles was the abstract interface that the Resolver used to see all the inputs. If InputGraph supported the methods forEachInitialAtom() and searchLibraries() then we could get rid of InputFiles and have the Resolver use InputGraph directly.

Yes, this would be nice. I will try to write a proposal in the next few days.

> What should the interface be that the Resolver uses for handling groups?

bool resolveUndefines(std::vector<File> &files) ?

This has to iterate over the files until it reaches a stable point (that is, until no more resolution is possible).

Thanks

Shankar Easwaran

I both agree and disagree. Logically we have two different views: the
command line and the resulting input tree on the one side, and the groups of
object files as seen by the resolver on the other side. The goal of
parseFile is ultimately to transform the first into the second. Agreed
so far?

What I want to do is encapsulate the "command line" side in LinkerInput,
that is, the "logical" path used for error reporting and the buffers
associated with the input. The resolver side should not be involved with
either. I see D1598 (changing parseFile to take LinkerInput) as a
necessary step toward D1587 and related changes, i.e. the ability to
properly represent positional flags and hand that information down.

This leaves open the question of whether the command line view and the
resolver view should have the same classes or not. I am not at the point
yet where I can tell what the best behavior is. I can see good reasons
for a strict separation in the class tree, but it may also be a more
artificial separation than necessary.

Joerg

Hi Joerg,

I like the approach that Nick mentioned: if there is a way to structure the resolver around InputGraph, it makes things much easier.

It would also make it easy to support different flavors of linking, which is where we ultimately want lld to be.

We could have a ResolverPolicy which the InputGraph could take in and resolve files in any style.

Thanks

Shankar Easwaran

>> I do think we have too many classes.
> Agree.
>> I thought InputGraph was going to replace InputFiles.
> Interesting idea.
>> It seems like LinkerInput could be merged into FileNode.
> Agree.
>>
>> Originally InputFiles was the abstract interface that the Resolver used to see all the inputs. If InputGraph supported the methods forEachInitialAtom() and searchLibraries() then we could get rid of InputFiles and have the Resolver use InputGraph directly.
> Yes, this would be nice. I will try to write a proposal in the next few days.
>> What should the interface be that the Resolver uses for handling groups?
> bool resolveUndefines(std::vector<File> &files) ?
>
> This has to iterate over the files until it reaches a stable point (that no more resolution is possible).

It is a little more complicated. After initial .o files are processed, there are a bunch of undefines left. The resolver needs to start loading archive members that fulfill those undefines, but loading a member can introduce even more undefines, so it has to iterate.

I think the existing searchLibraries(StringRef name, bool options..) interface almost works for InputGraph. To support groups we just need to keep track of where in the InputGraph we currently are, so that searchLibraries() can spin in a group until no more undefs are fulfilled, then move on in the graph.
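
The "spin in a group, then move on" behavior can be sketched as follows. This is a toy model under stated assumptions, not lld code: `Member`, `Archive`, `GraphNode` and this `Resolver` are hypothetical, symbols are plain strings, and a lone archive is treated as a group of one.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical archive member: the symbols it defines and the ones it needs.
struct Member {
  std::set<std::string> defines;
  std::set<std::string> undefines;
  bool loaded = false;
};

struct Archive {
  std::vector<Member> members;
};

// One position in the input graph: a lone archive (one entry) or the
// archives listed between --start-group and --end-group.
struct GraphNode {
  std::vector<Archive> archives;
};

struct Resolver {
  std::set<std::string> defined;
  std::set<std::string> undefined;

  // Load every not-yet-loaded member that defines a currently undefined
  // symbol. Returns true if at least one member was loaded.
  bool searchArchive(Archive &archive) {
    bool loadedAny = false;
    for (Member &m : archive.members) {
      if (m.loaded)
        continue;
      bool wanted = false;
      for (const std::string &sym : m.defines)
        if (undefined.count(sym)) { wanted = true; break; }
      if (!wanted)
        continue;
      m.loaded = true;
      loadedAny = true;
      for (const std::string &sym : m.defines) {
        defined.insert(sym);
        undefined.erase(sym);
      }
      // Loading a member can introduce new undefines.
      for (const std::string &sym : m.undefines)
        if (!defined.count(sym))
          undefined.insert(sym);
    }
    return loadedAny;
  }

  // Walk the graph in command-line order. Stay on the current node until
  // a full pass over it loads nothing new, then move on.
  void searchLibraries(std::vector<GraphNode> &graph) {
    for (GraphNode &node : graph) {
      bool progress = true;
      while (progress) {
        progress = false;
        for (Archive &a : node.archives)
          progress |= searchArchive(a);
      }
    }
  }
};
```

In this model, two archives with a circular dependency resolve fully when they share one `GraphNode`, but can leave symbols undefined when they are separate nodes, which is the difference --start-group/--end-group is meant to express.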

-Nick

> I do think we have too many classes. I thought InputGraph was going to
> replace InputFiles. It seems like LinkerInput could be merged into
> FileNode.
>
> Originally InputFiles was the abstract interface that the Resolver used to
> see all the inputs. If InputGraph supported the methods
> forEachInitialAtom() and searchLibraries() then we could get rid of
> InputFiles and have the Resolver use InputGraph directly.

This is how I imagined it would work.

- Michael Spencer

>> I do think we have too many classes.
> Agree.
>> I thought InputGraph was going to replace InputFiles.
> Interesting idea.
>> It seems like LinkerInput could be merged into FileNode.
> Agree.
>>
>> Originally InputFiles was the abstract interface that the Resolver used
>> to see all the inputs. If InputGraph supported the methods
>> forEachInitialAtom() and searchLibraries() then we could get rid of
>> InputFiles and have the Resolver use InputGraph directly.
> Yes, this would be nice. I will try to write a proposal in the next few
> days.
>> What should the interface be that the Resolver uses for handling groups?
> bool resolveUndefines(std::vector<File> &files) ?
>
> This has to iterate over the files until it reaches a stable point (that
> no more resolution is possible).
It is a little more complicated. After initial .o files are processed,
there are a bunch of undefines left. The resolver needs to start loading
archive members that fulfill those undefines, but loading a member can
introduce even more undefines, so it has to iterate.

I think the existing searchLibraries(StringRef name, bool options..)
interface almost works for InputGraph. To support groups we just need to
keep track of where in the InputGraph we currently are, so that
searchLibraries() can spin in a group until no more undefs are fulfilled,
then move on in the graph.

-Nick

The easiest way to handle this is to have the resolver do it directly.
Darwin ld's behavior of looping over everything can be implemented by
putting everything in an implicit group. We could make it more extensible
by factoring out the searching, but I don't really see a need for that
complexity.

- Michael Spencer

I also like the idea of making InputGraph drive how resolution works.

Thanks

Shankar Easwaran

Hi Nick,

These are the modifications needed in lld to start processing groups:

1) LinkerInput would be merged into FileNode, which would then contain the following functions:
     - getBuffer
     - takeBuffer
     - getPath

2) The driver will process the vector of InputElements and call process() on each of them.
     - process() would create an lld::File object within the InputElement if it is a FileNode.
     - process() would create a vector of lld::File objects within the InputElement if it is a ControlNode.

3) The resolver will not process each file; instead it would process InputElements as follows:
     3.1) If the InputElement corresponds to a FileNode, the resolver will call
          inputElement->processAtoms(*this) and mark the InputElement as already processed.
          This would essentially call resolver.processFile(_currentFile).

     3.2) If the InputElement corresponds to a GroupNode, the resolver will call
          inputElement->processAtoms(*this) on it.
          This would essentially call resolver.processGroup(std::vector<File>), where the vector holds the files that make up the current group.

4) InputFiles would be removed.
5) LinkerInput would be removed.
6) Functions that become unused (forEachInitialAtom etc.) would be removed.
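
Steps 2 and 3 amount to a double dispatch between the resolver and the input elements. A rough sketch, assuming simplified hypothetical classes (the `File` struct, the `log` member and the `processed()` accessor exist only for illustration and are not part of lld):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct File { std::string path; };  // hypothetical stand-in for lld::File

class Resolver;

class InputElement {
public:
  virtual ~InputElement() = default;
  virtual void process() = 0;                  // step 2: create File object(s)
  virtual void processAtoms(Resolver &r) = 0;  // step 3: hand them to the resolver
  bool processed() const { return _processed; }
protected:
  bool _processed = false;
};

class Resolver {
public:
  void processFile(const File &f) { log.push_back("file:" + f.path); }
  void processGroup(const std::vector<File> &files) {
    std::string entry = "group:";
    for (const File &f : files)
      entry += f.path + ";";
    log.push_back(entry);
  }
  // The resolver walks InputElements, not individual files.
  void resolve(std::vector<std::unique_ptr<InputElement>> &elements) {
    for (auto &e : elements)
      e->processAtoms(*this);
  }
  std::vector<std::string> log;  // records the calls, for illustration only
};

class FileNode : public InputElement {
public:
  explicit FileNode(std::string path) : _path(std::move(path)) {}
  void process() override { _file = File{_path}; }
  void processAtoms(Resolver &r) override {
    r.processFile(_file);
    _processed = true;
  }
private:
  std::string _path;
  File _file;
};

class GroupNode : public InputElement {
public:
  explicit GroupNode(std::vector<std::string> paths) : _paths(std::move(paths)) {}
  void process() override {
    _files.clear();
    for (const std::string &p : _paths)
      _files.push_back(File{p});
  }
  void processAtoms(Resolver &r) override {
    r.processGroup(_files);
    _processed = true;
  }
private:
  std::vector<std::string> _paths;
  std::vector<File> _files;
};
```

The driver would call `process()` on every element first, and the resolver would then call `resolve()` on the same vector; each node type decides whether that turns into `processFile` or `processGroup`.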

_*Issues*_

This will essentially break the Darwin model that is in lld, since the current functionality processes all the object files first and only then processes archive files.

Can we control this with a resolution policy, i.e. a set of boolean flags that controls the resolution behavior? Or do you have another approach?
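
To make the question concrete, a resolution policy could look roughly like this. It is only a sketch: `ResolutionPolicy`, its single flag, `Input` and `processingOrder` are hypothetical names, and a real policy would likely need more than one flag.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flattened input: a path plus whether it is an archive.
struct Input {
  std::string path;
  bool isArchive;
};

// Hypothetical policy: a set of boolean flags chosen by each flavor.
struct ResolutionPolicy {
  bool objectsBeforeArchives;  // Darwin-style: all objects first, then archives
};

// Returns the order in which the resolver would visit the inputs.
std::vector<Input> processingOrder(std::vector<Input> inputs,
                                   const ResolutionPolicy &policy) {
  if (policy.objectsBeforeArchives) {
    // stable_partition keeps command-line order within each class.
    std::stable_partition(inputs.begin(), inputs.end(),
                          [](const Input &in) { return !in.isArchive; });
  }
  return inputs;
}
```

With the flag off the command-line order is kept as is; with it on, object files move ahead of archives while each class keeps its relative order.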

_*Questions*_

1) There might be other types of control nodes. What would we need to do for those? Can the resolver be controlled by each flavor?

Thanks

Shankar Easwaran

The way darwin works with the current scheme is that the files are added to InputFiles in command line order, then forEachInitialAtom() walks the whole list but only operates on the non-library files (i.e. object files) and searchLibraries() only operates on library files.

If we have the Resolver walk the graph, then either:

  1. We need some option for darwin and gnu to work differently, or
  2. Have the darwin driver construct the graph with all libraries in one group at the end. As long as that is straightforward to do, it seems simpler and also captures the darwin difference that all libraries are always repeatedly searched.

-Nick

One way I can think of doing this is to collect all the libraries specified on the command line into a group node, and add that node to the InputGraph after the command line has been processed.
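
A small sketch of that transformation, assuming a hypothetical flat `InputNode` representation rather than the real InputGraph classes:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified graph node.
struct InputNode {
  enum class Kind { Object, Library, Group };
  Kind kind;
  std::string path;                  // used by Object and Library
  std::vector<std::string> members;  // library paths, used by Group
};

// Pull every library out of the graph and append one group holding all of
// them, keeping the relative order of objects and of libraries.
std::vector<InputNode> groupLibrariesAtEnd(std::vector<InputNode> graph) {
  std::vector<InputNode> result;
  InputNode group{InputNode::Kind::Group, "", {}};
  for (InputNode &node : graph) {
    if (node.kind == InputNode::Kind::Library)
      group.members.push_back(std::move(node.path));
    else
      result.push_back(std::move(node));
  }
  if (!group.members.empty())
    result.push_back(std::move(group));
  return result;
}
```

A resolver that loops inside a group until nothing new is loaded would then search all Darwin libraries repeatedly, without needing a separate code path.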

PS: Does Darwin have a command-line option to process libraries in the order specified on the command line? If so, the above would break.

I am also planning to support other operations, such as:

inputElement->setPosition(InputGraph::Top)
inputElement->setPosition(InputGraph::Last)
inputElement->setPosition(InputGraph::Position, <value>)

The user has to explicitly call a separate API so that the elements are ordered the way the user wants.

inputGraph->insertInputElementAt(InputGraph::Top, std::vector<std::unique_ptr<InputElement>>&)
inputGraph->insertInputElementAt(InputGraph::Last, std::vector<std::unique_ptr<InputElement>>&)
inputGraph->insertInputElementAt(InputGraph::Position, element, std::vector<std::unique_ptr<InputElement>>&)

The user has to explicitly call a separate API so that the ordinals of the elements are set appropriately.
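
One possible shape for such an API, as a sketch only. The `Where` enum, the trailing `index` parameter and `assignOrdinals()` are assumptions made for this example; the signatures quoted above are the actual proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element: a name plus the ordinal assigned by the graph.
struct InputElement {
  std::string name;
  std::size_t ordinal = 0;
};

class InputGraph {
public:
  enum class Where { Top, Last, Position };

  // Splice `elements` into the graph. `index` is only used for
  // Where::Position and names the slot the first new element will occupy.
  void insertInputElementAt(Where where,
                            std::vector<std::unique_ptr<InputElement>> elements,
                            std::size_t index = 0) {
    std::size_t at = 0;
    if (where == Where::Last) {
      at = _elements.size();
    } else if (where == Where::Position) {
      assert(index <= _elements.size() && "insert position out of range");
      at = index;
    }
    _elements.insert(_elements.begin() + static_cast<std::ptrdiff_t>(at),
                     std::make_move_iterator(elements.begin()),
                     std::make_move_iterator(elements.end()));
  }

  // Separate, explicit step: renumber the elements after all insertions.
  void assignOrdinals() {
    for (std::size_t i = 0; i < _elements.size(); ++i)
      _elements[i]->ordinal = i;
  }

  std::size_t size() const { return _elements.size(); }
  const InputElement &operator[](std::size_t i) const { return *_elements[i]; }

private:
  std::vector<std::unique_ptr<InputElement>> _elements;
};
```

Keeping `assignOrdinals()` separate means a driver can perform several insertions and pay for renumbering once at the end.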

Thanks

Shankar Easwaran

>> The way darwin works with the current scheme is that the files are added to InputFiles in command line order, then forEachInitialAtom() walks the whole list but only operates on the non-library files (i.e. object files) and searchLibraries() only operates on library files.
>>
>> If we have the Resolver walk the graph, then either:
>> 1) We need some option for darwin and gnu to work differently, or
>> 2) Have the darwin driver construct the graph with all libraries in one group at the end. As long as that is straightforward to do, it seems simpler and also captures the darwin difference that all libraries are always repeatedly searched.
>>
>> -Nick
>
> One way I can think of doing this is to collect all the libraries specified on the command line into a group node, and add that node to the InputGraph after the command line has been processed.

That sounds simple enough.

> PS: Does Darwin have a command-line option to process libraries in the order specified on the command line? If so, the above would break.

No.

-Nick