Linking tools

I’m trying to figure out exactly what the function and status of the different linking tools is. The impression I get is:

  1. For linking multiple bitcode (either binary or text format) files together, llvm-link is the current and future intended tool.

  2. For converting bitcode files into (machine code) object files, llc is the current and future intended tool.

  3. For linking multiple object files into an executable, llvm has hitherto relied on the system linker, but lld is the future intended tool.

Is any of this inaccurate or incomplete?
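
To make the three steps concrete, here is a rough sketch of the pipeline as described above, written as a small Python script that only builds and prints the command lines. The file names (x.bc, y.bc, app) and the helper name pipeline_commands are placeholders, and using clang as the driver for the final link is an assumption; an lld-based link would replace that last step.

```python
import shlex


def pipeline_commands(bitcode_files, output="app"):
    """Return one command line per stage of the bitcode-to-executable pipeline."""
    merged_bc = output + ".bc"   # hypothetical intermediate file names
    merged_obj = output + ".o"
    return [
        # 1. Merge several bitcode modules into a single module.
        ["llvm-link", *bitcode_files, "-o", merged_bc],
        # 2. Lower the merged bitcode to a native object file.
        ["llc", "-filetype=obj", merged_bc, "-o", merged_obj],
        # 3. Link the object into an executable (assumption: clang drives the linker).
        ["clang", merged_obj, "-o", output],
    ]


# Print the commands instead of running them, so nothing needs to be installed.
for cmd in pipeline_commands(["x.bc", "y.bc"]):
    print(shlex.join(cmd))
```

Passing each list to subprocess.run(cmd, check=True) would execute the pipeline, provided the LLVM tools are on PATH.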

I’m trying to figure out exactly what the function and status of the different linking tools is. The impression I get is:

  1. For linking multiple bitcode (either binary or text format) files together, llvm-link is the current and future intended tool.

  2. For converting bitcode files into (machine code) object files, llc is the current and future intended tool.

llvm-link and llc are developer tools only. clang is designed to be the interface here (or your language specific driver).

  3. For linking multiple object files into an executable, llvm has hitherto relied on the system linker, but lld is the future intended tool.

This is correct.

-eric

Okay, so for linking bitcode files together, what's the intended command?
That is, 'clang x.bc y.bc' will generate an executable, but what about
generating a single larger bitcode file? Adding -emit-llvm gives an error
unless you also add -c, but doesn't that just rewrite the original small
bitcode files?

I think your original description of the situation is accurate. llvm-link
will take multiple bitcode files and spit out a big ball o' bitcode, but
that's usually not sufficient for LTO, which is the main use case that we
want to support. From the perspective of LTO, we just want users to be able
to add -flto to their compile and link lines, and make that produce a
faster executable, without the user ever being aware of the bitcode.
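
As a sketch of what that looks like on the command line, the script below prints one compile line per source file and one link line, each carrying -flto. The source names, the -O2 level and the lto_commands helper are illustrative assumptions, not taken from a real build.

```python
import shlex
from pathlib import PurePath


def lto_commands(sources, output="app", opt_level="-O2"):
    """Return compile and link command lines that both carry -flto."""
    objects = [str(PurePath(src).with_suffix(".o")) for src in sources]
    compiles = [
        # With -flto, the ".o" clang writes typically holds LLVM bitcode
        # rather than machine code.
        ["clang", opt_level, "-flto", "-c", src, "-o", obj]
        for src, obj in zip(sources, objects)
    ]
    # The same flag on the link line asks for LTO over those files.
    link = ["clang", opt_level, "-flto", *objects, "-o", output]
    return compiles + [link]


# Print only; nothing is compiled here.
for cmd in lto_commands(["a.c", "b.c"]):
    print(shlex.join(cmd))
```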

If your use case (static analysis, maybe?) requires the intermediate
bitcode, we don't yet have a nice way to get that from clang and maybe we
should add one. Maybe -emit-llvm on a link line like you suggested, but
that discards information about non-bitcode object files.

Anyway, for now, llvm-link will do the job, but it isn't really meant to be
a user facing tool.

Hope that explains things. :)

I think your original description of the situation is accurate. llvm-link
will take multiple bitcode files and spit out a big ball o' bitcode, but
that's usually not sufficient for LTO, which is the main use case that we
want to support.

To make sure I understand you: the reason it's usually not sufficient is
that most programs use build systems that don't really provide an
opportunity for such a step; they assume the compiler only needs to be told
about one source file at a time, right up until machine code linking time?

From the perspective of LTO, we just want users to be able to add -flto to
their compile and link lines, and make that produce a faster executable,
without the user ever being aware of the bitcode.

Yes indeed. I understand there is work being done on achieving this by
following the usual build procedure, but essentially disguising bitcode
files as object files until link time?

If your use case (static analysis, maybe?) requires the intermediate
bitcode, we don't yet have a nice way to get that from clang and maybe we
should add one. Maybe -emit-llvm on a link line like you suggested, but
that discards information about non-bitcode object files.

Anyway, for now, llvm-link will do the job, but it isn't really meant to
be a user facing tool.

Right, I'm looking at both whole-program optimisation and static analysis.
But I suppose as you say, llvm-link should do the job for now.

Hope that explains things. :)

It does, thanks!

In practice, it's not sufficient because there are usually pre-compiled
objects passed into the link step, and symbols from the bitcode are
referenced from those object files. Getting the precise list of symbols
that are actually referenced externally is a big part of the value of LTO.
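
A toy model of that point, in Python: given the symbols defined in bitcode and the symbols that pre-compiled objects leave undefined, only the overlap (plus entry points) has to stay externally visible, and everything else is a candidate for internalizing. This is a deliberately simplified illustration with made-up symbol names; it is not how any real linker plugin is implemented.

```python
def partition_symbols(bitcode_defined, native_undefined, entry_points=("main",)):
    """Split bitcode-defined symbols into (must_keep, can_internalize)."""
    defined = set(bitcode_defined)
    # Symbols that something outside the bitcode still needs to resolve.
    externally_needed = set(native_undefined) | set(entry_points)
    must_keep = defined & externally_needed
    can_internalize = defined - must_keep
    return must_keep, can_internalize


# Hypothetical program: a native object calls back into on_event.
keep, internal = partition_symbols(
    bitcode_defined={"main", "helper", "on_event"},
    native_undefined={"on_event", "printf"},
)
print("must stay visible:", sorted(keep))
print("can be internalized:", sorted(internal))
```

Merging bitcode alone cannot know that on_event is referenced from a native object, which is why the list has to come from the linker.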

For targets without GNU binutils and gcc driver support, has this goal
been achieved? The few times I've tried, Clang's hard-coded
dependencies on host GNU tools have blocked LTO, and linking binaries in
general. For my target, manually running llvm-link and llc is the
only way to get LTO-like output, though that works pretty well.
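
For reference, the manual flow described here can be sketched as follows. The script only prints the commands; the target triple, the file names and the manual_lto_commands helper are hypothetical, and the final link step is left out because it depends entirely on the target's own linker. An opt run on the merged module could be added between the last two steps if whole-module optimization is wanted.

```python
import shlex
from pathlib import PurePath


def manual_lto_commands(sources, triple, merged="whole.bc", obj="whole.o"):
    """Return command lines for a clang -> llvm-link -> llc flow."""
    bitcode = [str(PurePath(s).with_suffix(".bc")) for s in sources]
    cmds = [
        # Compile each source straight to bitcode instead of a native object.
        ["clang", f"--target={triple}", "-O2", "-emit-llvm", "-c", s, "-o", bc]
        for s, bc in zip(sources, bitcode)
    ]
    # Merge all bitcode files into one module.
    cmds.append(["llvm-link", *bitcode, "-o", merged])
    # Generate a single native object from the merged module.
    cmds.append(
        ["llc", f"-mtriple={triple}", "-O2", "-filetype=obj", merged, "-o", obj]
    )
    return cmds


# The triple below is an arbitrary example, not a recommendation.
for cmd in manual_lto_commands(["a.c", "b.c"], triple="armv7-none-eabi"):
    print(shlex.join(cmd))
```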

For this same reason, I get worried when I hear maintainers state that
llvm-link, llc, llvm-mc, etc. are developer-only tools. GNU-less
targets use these tools for production code for lack of working
alternatives.

If there's been recent progress on removing GNU dependencies, I'm all ears.

Cheers,
-steve

Right, yes, that's a very good point.

Basically, LTO for projects that have pre-compiled objects requires
integration with a real static linker. Currently we use plugins to
integrate with binutils linkers, Mac ld64, and some other closed-source
linkers. To cut this dependency, we need a new linker, which is what LLD is
intended to become.
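
In current clang releases the linker used for the final step can be chosen with -fuse-ld, so an LTO link through lld can be sketched like this. The object names and the link_command helper are placeholders, and whether lld supports a given target is something to check separately.

```python
import shlex


def link_command(objects, output="app", linker=None, lto=True):
    """Return a clang link line, optionally selecting a specific linker."""
    cmd = ["clang"]
    if lto:
        cmd.append("-flto")
    if linker is not None:
        # e.g. linker="lld" asks clang to invoke lld instead of the default linker.
        cmd.append(f"-fuse-ld={linker}")
    return cmd + [*objects, "-o", output]


# Print only; nothing is linked here.
print(shlex.join(link_command(["a.o", "b.o"], linker="lld")))
```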

Thanks, and yes, I eagerly await Clang's switch to lld. Clang also
calls the gcc driver though, at least to invoke collect2. Will lld
allow clang to remove all GNU dependencies?