RFC: auto-generating build dependency file from lld

Hi,

I’d like to propose a new feature and a flag (--write-dependencies=<path>) for lld so that the linker can generate a dependency file (.d file). This is analogous to the -MD compiler flag.

Background:
Clang and GCC have a feature (-MD flag) to create a dependency file in a format that make and other build tools can read, so that you don’t have to manually maintain dependencies between .c files and .h files. There’s no similar feature for the linker, even though it seems useful in some situations.

In particular, if a compiler driver automatically appends a static library to the final executable but you don’t know the exact path of the library, there’s currently no way to keep track of that dependency. A typical example is -fsanitize=address, which adds libasan to the linker command line. If libasan is updated, you may want to rebuild your program, but you don’t want to manually write its path into a build file because that path may change.

Proposal:
Add a new command line flag --write-dependencies=<path> to lld. If the flag is given, lld creates a file at a given path with the following contents:

<output-file>: <input-file>…

where <output-file> is the pathname of the output file and <input-file>… is a list of the pathnames of all input files. This file format is the same as the -MD compiler flag output.
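To make the rule format above concrete, here is a minimal Python sketch that builds a Make-style rule from an output path and a list of input paths. It is not taken from the patch: the function name, the escaping rules, and the one-input-per-line layout are assumptions about what a consumer such as make would accept, and the file names are hypothetical.

```python
def format_dependency_rule(output, inputs):
    """Return a Make-style rule of the form 'output: input1 input2 ...'.

    Hypothetical helper for illustration; not lld's actual implementation.
    """
    def escape(path):
        # In a makefile, '#' starts a comment, '$' starts a variable
        # reference, and a space separates targets, so escape all three.
        return path.replace("#", "\\#").replace("$", "$$").replace(" ", "\\ ")

    lines = [escape(output) + ":"]
    for path in inputs:
        lines.append("  " + escape(path))
    # Join with backslash-newline so each input sits on its own line.
    return " \\\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(format_dependency_rule("a.out", ["main.o", "my lib/libfoo.a"]), end="")
```

A build file would then include the generated file, so that a change to any listed input marks the output as out of date.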

Here is a change to implement the above feature: https://reviews.llvm.org/D65430

Any comments?

Thanks,
Rui

I love this feature. Is there a plan to support COFF as well?

Thanks

Steven

Yeah, I think there’s no reason to not add this to lld/COFF if people find it useful.

Very nice. It can directly help me find out which lib and obj files are redundant in my linker script.

BTW, besides the lib- and file-level dependencies, is it possible to also print the function- and global-variable-level dependencies, e.g. the symbols that are actually linked before any link-level optimization? Such fine-grained dependencies could help me clean up redundant code and select regression test cases in CI more accurately.

Thanks

Steven

Yes, it’s a little bit off-topic, but it is also planned. The data structure that the linker handles can be considered a large graph where vertices are file sections and edges are symbol names. You can say that file A depends on file B if and only if in the graph a section in file A has an edge to a section in file B. There is a plan to dump the graph in a machine-readable format such as JSON so that you can run arbitrary graph analysis algorithms on it.
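As an illustration of the graph described above, the following Python sketch models sections as vertices and symbol references as edges, derives file-level dependencies from them, and dumps the graph as JSON. The data layout and the JSON schema are invented for this example, since the thread does not specify a format, and the section and symbol names are hypothetical.

```python
import json


def file_dependencies(section_file, edges):
    """Derive file-level dependencies from section-level edges.

    section_file maps a section id to the file that contains it.
    edges is a list of (from_section, symbol, to_section) triples.
    File A depends on file B if some section in A has an edge to a
    section in B.
    """
    deps = {}
    for src, _symbol, dst in edges:
        a, b = section_file[src], section_file[dst]
        if a != b:  # an edge within one file is not a file-level dependency
            deps.setdefault(a, set()).add(b)
    return {f: sorted(d) for f, d in deps.items()}


def dump_graph(section_file, edges):
    """Serialize the graph as JSON (hypothetical schema)."""
    return json.dumps(
        {
            "sections": [
                {"id": s, "file": f} for s, f in sorted(section_file.items())
            ],
            "edges": [
                {"from": s, "symbol": sym, "to": d} for s, sym, d in edges
            ],
        },
        indent=2,
    )


if __name__ == "__main__":
    # Hypothetical sections and symbols, for illustration only.
    section_file = {
        "main.o:.text": "main.o",
        "foo.o:.text": "foo.o",
        "foo.o:.data": "foo.o",
    }
    edges = [
        ("main.o:.text", "foo", "foo.o:.text"),
        ("foo.o:.text", "counter", "foo.o:.data"),
    ]
    print(file_dependencies(section_file, edges))  # {'main.o': ['foo.o']}
    print(dump_graph(section_file, edges))
```

With the graph in this form, questions such as "which files are never referenced" reduce to ordinary graph queries on the JSON output.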

On the COFF side, I think /verbose already gives Reading ... logs. Users can parse the filenames.
It probably makes sense to add a fine-grained option /trace as per ELF (-t --trace).

> Yes, it's a little bit off-topic, but it is also planned. The data structure that the linker handles can be considered a large graph where vertices are file sections and edges are symbol names. You can say that file A depends on file B if and only if in the graph a section in file A has an edge to a section in file B. There is a plan to dump the graph in a machine-readable format such as JSON so that you can run arbitrary graph analysis algorithms on it.

That would be very useful.

-Hal

> On the COFF side, I think /verbose already gives Reading ... logs. Users can parse the filenames.
> It probably makes sense to add a fine-grained option /trace as per ELF (-t --trace).

That is also true for lld/ELF and perhaps even for the compilers. If you enable logging, the log must contain the input filenames, and you can create a dep file from it. So the point is convenience: you don’t have to parse linker logs to create a dep file (and I think the same is true of the compiler’s -MD option).
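For comparison, here is a rough Python sketch of the log-parsing workaround that the new flag would make unnecessary. The exact log format is an assumption: it supposes each input is reported on its own line as "Reading <path>", loosely following the /verbose description earlier in the thread, and real linker output may differ. The function names and file names are hypothetical.

```python
def inputs_from_log(log_text, prefix="Reading "):
    """Collect unique input paths from log lines that start with `prefix`.

    The "Reading <path>" line format is an assumption for illustration.
    """
    seen = []
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            path = line[len(prefix):]
            if path and path not in seen:  # keep first-seen order, drop repeats
                seen.append(path)
    return seen


def dep_rule_from_log(output, log_text):
    """Build a one-line Make-style rule from the parsed log.

    Paths containing spaces would need escaping; that is omitted here.
    """
    return output + ": " + " ".join(inputs_from_log(log_text)) + "\n"


if __name__ == "__main__":
    # Hypothetical log text, for illustration only.
    log = "Reading main.obj\nsome other message\nReading lib/foo.lib\nReading main.obj\n"
    print(dep_rule_from_log("app.exe", log), end="")
```

Even this small version depends on the wording of a log message, which is the fragility a dedicated option avoids.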