Advanced Command line processing with lld

Hi Nick, Michael,

I have been trying to envision an advanced command line processing framework
for lld, which would essentially do the following:

a) Create nodes from command line options
b) Validate the command line options
c) Rearrange command line options by running passes

*Creating nodes from command line options*

For the linker to support various flavors, we shouldn't treat command line
options as a flat list of options. For example, on ELF we have:

a) positional command line options
b) group command line options
c) non-positional command line options

To accommodate all of these, the command line options that the linker
gets could be treated as a graph whose nodes represent:

* input files
* grouped commands
* positional commands

All other options, which don't fall into these categories, are also
represented by nodes and stored in a vector.
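A minimal C++ sketch of such a graph follows. The class names (`InputNode`, `FileNode`, `PositionalNode`, `GroupNode`, `InputGraph`) and the `buildGraph` helper are hypothetical, not existing lld APIs, and the toy parser only recognizes a few options; it does not handle options that take a value, such as `-flavor gnu`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node kinds for the command line graph (not an lld API).
struct InputNode {
  enum class Kind { File, Group, Positional };
  explicit InputNode(Kind k) : kind(k) {}
  virtual ~InputNode() = default;
  Kind kind;
};

// An input file such as a.o or libc.a.
struct FileNode : InputNode {
  explicit FileNode(std::string p) : InputNode(Kind::File), path(std::move(p)) {}
  std::string path;
};

// A positional option such as --as-needed, which affects the inputs after it.
struct PositionalNode : InputNode {
  explicit PositionalNode(std::string o)
      : InputNode(Kind::Positional), option(std::move(o)) {}
  std::string option;
};

// A --start-group ... --end-group block that owns its member nodes.
struct GroupNode : InputNode {
  GroupNode() : InputNode(Kind::Group) {}
  std::vector<std::unique_ptr<InputNode>> members;
};

// Ordered input nodes plus all remaining (non-positional) options.
struct InputGraph {
  std::vector<std::unique_ptr<InputNode>> inputs;
  std::vector<std::string> otherOptions;
};

// Builds the initial graph from tokenized arguments. Only a few options are
// recognized; any other argument starting with '-' is stored as a
// non-positional option.
inline InputGraph buildGraph(const std::vector<std::string> &args) {
  InputGraph g;
  GroupNode *group = nullptr; // non-null between --start-group and --end-group
  auto add = [&](std::unique_ptr<InputNode> n) {
    if (group)
      group->members.push_back(std::move(n));
    else
      g.inputs.push_back(std::move(n));
  };
  for (const std::string &a : args) {
    if (a == "--start-group") {
      auto node = std::make_unique<GroupNode>();
      group = node.get();
      g.inputs.push_back(std::move(node));
    } else if (a == "--end-group") {
      assert(group && "--end-group without --start-group");
      group = nullptr;
    } else if (a == "--as-needed" || a == "--no-as-needed") {
      add(std::make_unique<PositionalNode>(a));
    } else if (!a.empty() && a[0] == '-') {
      g.otherOptions.push_back(a);
    } else {
      add(std::make_unique<FileNode>(a));
    }
  }
  return g;
}
```

Files and positional options keep their command line order, while a `--start-group ... --end-group` block becomes a single node that owns its members.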

*Advantages*

a) No special logic is needed to detect a file being loaded twice
b) Much cleaner code, because fewer singletons are used
c) Easy to debug and to represent

*Validate the command line options*
It would be nice to have Traits registered for each type of node to check
the node for valid options. The command line parsing library itself will
not be responsible for validating the options.
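One way to express per-node Traits is a traits template specialized per node type, sketched below in C++. `NodeTraits`, `validateNode`, and the simplified node structs are hypothetical, and the "empty group" rule is only an illustrative policy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal stand-ins for graph nodes (hypothetical, not lld classes).
struct FileNode { std::string path; };
struct GroupNode { std::vector<FileNode> members; };

// One traits specialization per node type describes what is valid for it.
template <typename NodeT> struct NodeTraits;

template <> struct NodeTraits<FileNode> {
  static bool validate(const FileNode &n, std::string &error) {
    if (n.path.empty()) {
      error = "empty input file name";
      return false;
    }
    return true;
  }
};

template <> struct NodeTraits<GroupNode> {
  static bool validate(const GroupNode &n, std::string &error) {
    // Illustrative policy only: treat an empty group as a mistake.
    if (n.members.empty()) {
      error = "empty group";
      return false;
    }
    for (const FileNode &f : n.members)
      if (!NodeTraits<FileNode>::validate(f, error))
        return false;
    return true;
  }
};

// Called on the graph after parsing; the option parser itself never validates.
template <typename NodeT> bool validateNode(const NodeT &n, std::string &error) {
  bool ok = NodeTraits<NodeT>::validate(n, error);
  assert(ok || !error.empty()); // a failure must come with a message
  return ok;
}
```

Adding a new node kind then means adding one specialization, without touching the parser.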

*Rearrange command line options*

Certain users keep experimenting with the linker command line options until
they find a set of options that works for them.

In particular, I have seen people experimenting with:

a) --start-group, --end-group (traverses the whole list of files in the
group repeatedly until no new symbols are added)
b) --force-load-archive, --no-force-load-archive (force-loads all the
symbols from the archive library)
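To make the cost of (a) concrete, here is a simplified C++ model of group resolution: every archive in the group is rescanned until a full pass loads no new member. The `Member`/`Archive` types and `resolveGroup` are hypothetical; real archives are looked up through their symbol index rather than by scanning members like this.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Simplified archive member: the symbols it defines and the symbols it uses.
struct Member {
  std::set<std::string> defines;
  std::set<std::string> uses;
};
using Archive = std::vector<Member>;

// Models --start-group/--end-group: rescan every archive in the group until a
// full pass loads no new member. `undefined` holds the unresolved symbols on
// entry and on exit. Returns the number of passes over the group.
inline int resolveGroup(const std::vector<Archive> &group,
                        std::set<std::string> &defined,
                        std::set<std::string> &undefined) {
  std::vector<std::vector<bool>> loaded;
  for (const Archive &a : group)
    loaded.emplace_back(a.size(), false);
  int passes = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    ++passes;
    for (std::size_t i = 0; i < group.size(); ++i) {
      for (std::size_t j = 0; j < group[i].size(); ++j) {
        if (loaded[i][j])
          continue;
        const Member &m = group[i][j];
        // Pull in a member only if it defines a currently undefined symbol.
        bool needed = false;
        for (const std::string &s : m.defines)
          if (undefined.count(s)) { needed = true; break; }
        if (!needed)
          continue;
        loaded[i][j] = true;
        changed = true;
        for (const std::string &s : m.defines) {
          defined.insert(s);
          undefined.erase(s);
        }
        for (const std::string &s : m.uses)
          if (!defined.count(s))
            undefined.insert(s);
      }
    }
  }
  assert(passes >= 1);
  return passes;
}
```

If `libpthread.a` needs a symbol from `libc.a` that appears earlier in the group, the loop needs an extra pass; each such pass is work that would be unnecessary if the inputs were already in a resolvable order.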

The disadvantage of using (a) and (b) is that they make the linker slower.

It would be useful to have a Pass Manager that changes the order of the files
seen on the command line, for the following reasons:

a) Input file positioning: position each file based on the number of calls
from one file to another.
b) Improved locality of reference, by placing related files closer to each other.
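A pass manager over input files could look roughly like the C++ sketch below. `InputPass`, `InputPassManager`, and `CallCountOrderingPass` are hypothetical names, the call counts are assumed to be supplied from elsewhere (for example, a profile), and the ordering is a simple greedy heuristic rather than a specific published algorithm.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using FileList = std::vector<std::string>;

// Hypothetical pass interface: a pass may reorder (or drop) input files.
struct InputPass {
  virtual ~InputPass() = default;
  virtual void run(FileList &files) = 0;
};

// Runs the registered passes in order; a target adds only the ones it wants.
class InputPassManager {
public:
  void add(std::unique_ptr<InputPass> p) { passes.push_back(std::move(p)); }
  void run(FileList &files) {
    for (auto &p : passes)
      p->run(files);
  }

private:
  std::vector<std::unique_ptr<InputPass>> passes;
};

// Greedy ordering by call counts: keep the first file, then repeatedly append
// the remaining file with the most calls to or from the last placed file.
class CallCountOrderingPass : public InputPass {
public:
  explicit CallCountOrderingPass(std::map<std::pair<std::string, std::string>, int> c)
      : calls(std::move(c)) {}

  void run(FileList &files) override {
    if (files.size() < 2)
      return;
    FileList ordered{files.front()};
    FileList rest(files.begin() + 1, files.end());
    while (!rest.empty()) {
      auto best = rest.begin();
      int bestCount = -1;
      for (auto it = rest.begin(); it != rest.end(); ++it) {
        int n = count(ordered.back(), *it) + count(*it, ordered.back());
        if (n > bestCount) { // ties keep command line order
          best = it;
          bestCount = n;
        }
      }
      ordered.push_back(*best);
      rest.erase(best);
    }
    assert(ordered.size() == files.size());
    files = std::move(ordered);
  }

private:
  int count(const std::string &from, const std::string &to) const {
    auto it = calls.find({from, to});
    return it == calls.end() ? 0 : it->second;
  }
  std::map<std::pair<std::string, std::string>, int> calls;
};
```

Keeping each heuristic in its own pass lets a target enable, disable, or reorder them without changing how the command line is parsed.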

*Other advantages*
Dead input file stripping: input files that are not referenced in a static
link can be removed.
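Dead input file stripping amounts to a reachability walk over the file reference graph. In this hypothetical C++ sketch, `refs` maps each file to the files it references, and the `roots` (for example, the file defining the entry point) are assumed to be known; it also assumes that a file nothing reaches has no other side effects, such as static constructors.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// refs[f] lists the input files whose symbols f references.
using RefGraph = std::map<std::string, std::vector<std::string>>;

// Keeps only the inputs reachable from `roots`, preserving command line order.
inline std::vector<std::string> stripDeadInputs(const std::vector<std::string> &inputs,
                                                const RefGraph &refs,
                                                const std::vector<std::string> &roots) {
  std::set<std::string> live;
  std::vector<std::string> worklist(roots.begin(), roots.end());
  while (!worklist.empty()) {
    std::string f = worklist.back();
    worklist.pop_back();
    if (!live.insert(f).second)
      continue; // already visited
    auto it = refs.find(f);
    if (it != refs.end())
      for (const std::string &g : it->second)
        worklist.push_back(g);
  }
  std::vector<std::string> kept;
  for (const std::string &f : inputs)
    if (live.count(f))
      kept.push_back(f);
  assert(kept.size() <= inputs.size());
  return kept;
}
```

Here `c.o` and `d.o` reference each other but nothing reachable from the roots references them, so both would be dropped.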

*Representation*

I thought of representing

a) input files
b) group commands
c) positional commands

as atoms, since LLD already has a framework for representing atoms that can
be tested and experimented with.

The whole command line could be represented as:

a) a set of command line atoms, each with an assigned ordinal
b) a *follow-on* edge from each input file to the next
c) an in-group reference for every input file that is in a group

For example, consider the following command line:

    lld -flavor gnu a.o b.o c.o --start-group libc.a libpthread.a \
        --as-needed libc.so --no-as-needed --end-group d.o mylib.a libc.a

This could be represented by the following atoms:

    a.o --fb--> b.o --fb--> c.o --fb--> (GROUP) --fb--> d.o --fb--> mylib.a --fb--> libc.a
                                           |
                                       (ingroup)
                                           |
        libc.a --fb--> libpthread.a --fb--> as-needed --fb--> libc.so

    fb = "followed by" (follow-on) edge; every atom in the lower chain is tied
    to (GROUP) by an in-group reference.
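The representation above can be sketched in C++ as follows. `CmdAtom`, `buildAtoms`, and `chain` are hypothetical and do not use LLD's actual Atom classes; the sketch assumes `-flavor gnu` has already been consumed and that groups do not nest.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical command line atom carrying only the fields described above;
// it is not LLD's Atom class.
struct CmdAtom {
  int ordinal;       // creation order, i.e. position on the command line
  std::string name;  // file name, "(GROUP)", or a positional option
  int followOn = -1; // ordinal of the next atom in the same chain, or -1
  int inGroup = -1;  // ordinal of the enclosing (GROUP) atom, or -1
};

// Builds atoms from tokenized arguments. Top-level inputs form one follow-on
// chain; a --start-group ... --end-group block becomes a single (GROUP) atom in
// that chain, and its members form a second chain with in-group references.
inline std::vector<CmdAtom> buildAtoms(const std::vector<std::string> &args) {
  std::vector<CmdAtom> atoms;
  int outerPrev = -1, innerPrev = -1, group = -1;
  for (const std::string &a : args) {
    if (a == "--end-group") {
      group = -1;
      continue;
    }
    const bool isGroup = (a == "--start-group");
    assert(!(isGroup && group >= 0) && "nested groups are not supported");
    const int id = static_cast<int>(atoms.size());
    atoms.push_back(CmdAtom{id, isGroup ? std::string("(GROUP)") : a});
    if (group >= 0) { // inside a group: chain the members
      atoms[id].inGroup = group;
      if (innerPrev >= 0)
        atoms[innerPrev].followOn = id;
      innerPrev = id;
    } else { // top level: chain into the outer list
      if (outerPrev >= 0)
        atoms[outerPrev].followOn = id;
      outerPrev = id;
    }
    if (isGroup) {
      group = id;
      innerPrev = -1;
    }
  }
  return atoms;
}

// Collects atom names by following follow-on edges from `start`.
inline std::vector<std::string> chain(const std::vector<CmdAtom> &atoms, int start) {
  std::vector<std::string> names;
  for (int i = start; i >= 0; i = atoms[i].followOn)
    names.push_back(atoms[i].name);
  return names;
}
```

For the example command line, following the edges from `a.o` yields `a.o, b.o, c.o, (GROUP), d.o, mylib.a, libc.a`, and following them from the first `libc.a` yields the group members, including a `--no-as-needed` atom that the diagram leaves out.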

*Advantages*
a) The Writer can also look at the command line options as atoms.
b) LTO could possibly use this framework to call the compiler back with the
appropriate set of options (I'm not sure yet).
c) The ReaderWriterYAML framework can be used for testing.

Thanks

Shankar Easwaran

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation

This should be sent to the llvm-dev list.

Done.

I think we will need a model like this for input files in order to
evaluate them correctly in the resolver, but I think this is the wrong way
to go about it.

We simply need a graph data structure that represents the input semantics
and provides mutable access to it throughout the link. So your graph is
correct; I just wouldn't model it with Atoms. As for the command line
parser, its only role here is generating the initial graph.

The only reason I modeled it with atoms was to rearrange them to increase locality of reference.
Different targets can optionally add command line passes to reorder the inputs and optimize the command line for their needs.

The Darwin DAG would basically be a single GROUP node with all the inputs.

Ok.

Thanks

Shankar Easwaran