How and why are libraries in LLVM_LINK_COMPONENTS ordered and repeated for correct linking? How can I include a tool's dependency's dependencies in it?


I’d like to understand how CMake orders the libraries in LLVM_LINK_COMPONENTS for proper linking in the build command, and especially why it repeats some of them. This is in the context of a patch to link NVPTX libraries into bugpoint (D32003) for Polly when LLVM_POLLY_LINK_INTO_TOOLS=ON and POLLY_ENABLE_GPGPU_CODEGEN=ON.

The patch D31859 initialises the NVPTX backend within Polly when POLLY_ENABLE_GPGPU_CODEGEN=ON and requires the NVPTX libraries to be linked into any application or library that also links to Polly, e.g. opt or bugpoint. opt already links to the backend by adding ${LLVM_TARGETS_TO_BUILD} to LLVM_LINK_COMPONENTS. I attempted to link NVPTX into bugpoint in similar ways (D32003#728913) and the build kept failing, until I found out that add_llvm_tool was linking the libraries in LLVM_LINK_COMPONENTS to bugpoint. It seemed that CMake was processing LLVM_LINK_COMPONENTS for bugpoint’s files before it did so for Polly, and hence optimised out components (e.g. NVPTX) that weren’t required by bugpoint although they were required by Polly. Please correct me if I’m wrong here.

Is it possible to tell CMake to delay ordering the libraries in LLVM_LINK_COMPONENTS until Polly is linked to bugpoint?
Or to link Polly to bugpoint before the LINK_COMPONENTS?
Or to link NVPTX to bugpoint even if bugpoint itself contains no calls to the NVPTX back-end (like -Wl,--no-as-needed)?

Previous versions of the patch had target_link_libraries( bugpoint LLVMNVPTXTargetInfo …) right after target_link_libraries( bugpoint Polly), which did the job, but I was asked to find a less complicated way of accomplishing the same thing.

It was interesting to find some of the libraries repeated in the build command. Can anyone help me understand why this is the case?

Thank You,

Linker semantics vary significantly by platform. The behavior you are describing is required for correctness with standard Unix semantics.

A standard Unix linker processes command-line arguments in the order they are listed. When it processes an object file, it includes the file in the image it is generating, resolves currently unresolved symbols against it, and adds the object file’s own unresolved symbols to the list of symbols still to be resolved. When it encounters a library or archive, it uses that library to resolve only the symbols that are unresolved at that point; it does not pull in all symbols from the library. This means that when libraries depend on other libraries you often need to repeat them on the command line in order to link correctly. To illustrate, here is a more concrete example.

Let’s say we have a software project with libraries FooBar.a (comprised of Foo.c and Bar.c) and Baz.a (comprised of Baz.c), and we are linking those libraries into a tool named FooBarBaz. Suppose your sources look something like:


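/* Foo.c */
int Baz();

int Foo() {
    return Baz();
}

/* Bar.c */
int Bar() {
    return 1;
}

/* Baz.c */
int Bar();

int Baz() {
    return Bar();
}

/* FooBarBaz.c */
#include <stdio.h>

int Foo();

int main(…) {
    printf("%i", Foo());
    return 0;
}
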
Your linker command will need to look like:

ld FooBarBaz.o -lFooBar -lBaz -lFooBar

The reason is that when the linker processes FooBarBaz.o it sees the unresolved symbol Foo. It then searches FooBar, which provides an implementation of Foo (which gets pulled in) but leaves an unresolved reference to Baz. Bar is not pulled in at this point because nothing has referenced it yet. When Baz is pulled in you end up with an unresolved reference to Bar, so you need to revisit FooBar.
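To make the single-pass behaviour concrete, here is a minimal sketch in C of the resolution rule described above. It is a toy model, not how any real linker is implemented: the types and names (Object, Archive, link_unresolved, LINE_REPEATED, LINE_SINGLE) are hypothetical, libc symbols such as printf are ignored, and it assumes that an archive is rescanned internally until none of its members resolves anything new.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SYMS 16

/* An object file: the symbols it defines and the symbols it references. */
typedef struct {
    const char *defs[4]; /* NULL-terminated */
    const char *refs[4]; /* NULL-terminated */
} Object;

/* A static archive: a NULL-terminated list of member objects. */
typedef struct {
    const Object *members[4];
} Archive;

/* Linker state: symbols defined so far and symbols still unresolved. */
typedef struct {
    const char *defined[MAX_SYMS];
    int n_defined;
    const char *undefined[MAX_SYMS];
    int n_undefined;
} LinkState;

static int find(const char *const *set, int n, const char *sym) {
    for (int i = 0; i < n; i++)
        if (strcmp(set[i], sym) == 0)
            return i;
    return -1;
}

/* Add an object to the image: record its definitions, queue its new references. */
static void add_object(LinkState *st, const Object *o) {
    for (int i = 0; o->defs[i]; i++) {
        int u = find(st->undefined, st->n_undefined, o->defs[i]);
        if (u >= 0) /* resolved: drop it from the unresolved list */
            st->undefined[u] = st->undefined[--st->n_undefined];
        assert(st->n_defined < MAX_SYMS);
        st->defined[st->n_defined++] = o->defs[i];
    }
    for (int i = 0; o->refs[i]; i++) {
        if (find(st->defined, st->n_defined, o->refs[i]) >= 0 ||
            find(st->undefined, st->n_undefined, o->refs[i]) >= 0)
            continue; /* already defined or already queued */
        assert(st->n_undefined < MAX_SYMS);
        st->undefined[st->n_undefined++] = o->refs[i];
    }
}

/* Visit one archive: pull in only members that define a currently
 * unresolved symbol, repeating within this archive until nothing changes. */
static void scan_archive(LinkState *st, const Archive *a) {
    bool progress = true;
    while (progress) {
        progress = false;
        for (int m = 0; a->members[m]; m++) {
            const Object *o = a->members[m];
            for (int i = 0; o->defs[i]; i++) {
                if (find(st->undefined, st->n_undefined, o->defs[i]) >= 0) {
                    add_object(st, o);
                    progress = true;
                    break;
                }
            }
        }
    }
}

/* Link one object followed by archives in command-line order.
 * Returns the number of symbols left unresolved (0 means success). */
int link_unresolved(const Object *obj, const Archive *const *line, int n) {
    LinkState st = {0};
    add_object(&st, obj);
    for (int i = 0; i < n; i++)
        scan_archive(&st, line[i]);
    return st.n_undefined;
}

/* The FooBar/Baz example from this thread. */
static const Object MAIN_O = { {"main"}, {"Foo"} };
static const Object FOO_O  = { {"Foo"},  {"Baz"} };
static const Object BAR_O  = { {"Bar"},  {NULL} };
static const Object BAZ_O  = { {"Baz"},  {"Bar"} };
static const Archive FOOBAR = { {&FOO_O, &BAR_O} };
static const Archive BAZ    = { {&BAZ_O} };
static const Archive *const LINE_REPEATED[] = { &FOOBAR, &BAZ, &FOOBAR };
static const Archive *const LINE_SINGLE[]   = { &FOOBAR, &BAZ };
```

In this model, link_unresolved(&MAIN_O, LINE_REPEATED, 3) returns 0, while link_unresolved(&MAIN_O, LINE_SINGLE, 2) returns 1 because Bar is still unresolved when the command line ends.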

So, that’s the basics of why CMake is doing what it is doing. The how is a little bit more complicated.

At a very basic level CMake has a concept called interface dependencies. The interface dependencies of a target are the libraries that you need to link whenever you link that target. In the example above FooBar would have an interface dependency on Baz, and Baz would have an interface dependency on FooBar.

When CMake constructs the linker command line for FooBarBaz, it adds the interface dependencies of each library after the library is visited, and this is applied transitively. So, FooBarBaz links FooBar, which depends on Baz, which depends on FooBar, which depends on Baz… When CMake detects a cycle like that it stops after one repetition (that is a configurable value though). This results in a CMake-generated linker line that looks something like:

ld FooBarBaz.o -lFooBar -lBaz -lFooBar -lBaz

The second listing of Baz is not strictly required, but it is very hard to detect that in advance so CMake takes a conservative approach.
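The expansion described above can be mimicked with a small C sketch. This is only a toy model of the description in this thread, not CMake's actual implementation: the names (Lib, LIBS, expand_link_line, print_link_line) and the max_visits parameter are hypothetical, and "stops after one repetition" is modelled as letting each library appear at most twice.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LIBS 8   /* capacity of the library table */
#define MAX_LINE 32  /* capacity of the generated link line */

/* A library and its interface dependencies. */
typedef struct {
    const char *name;
    int deps[4]; /* indices into the library table, terminated by -1 */
} Lib;

/* The cyclic example: FooBar needs Baz, and Baz needs FooBar. */
static const Lib LIBS[] = {
    { "FooBar", { 1, -1 } },
    { "Baz",    { 0, -1 } },
};

/* Emit a library, then its interface dependencies, transitively.
 * A library that has already been emitted max_visits times is skipped,
 * which is what breaks the cycle. */
static void visit(const Lib *libs, int lib, int max_visits,
                  int *visits, int *line, int *n) {
    assert(lib >= 0 && lib < MAX_LIBS);
    if (visits[lib] >= max_visits)
        return;
    visits[lib]++;
    assert(*n < MAX_LINE);
    line[(*n)++] = lib;
    for (int i = 0; libs[lib].deps[i] >= 0; i++)
        visit(libs, libs[lib].deps[i], max_visits, visits, line, n);
}

/* Fill line (capacity MAX_LINE) with library indices in link order,
 * starting from the library the tool links directly.
 * Returns the number of entries written. */
int expand_link_line(const Lib *libs, int root, int max_visits, int *line) {
    int visits[MAX_LIBS] = {0};
    int n = 0;
    visit(libs, root, max_visits, visits, line, &n);
    return n;
}

/* Print the result in the style of the linker commands in this thread. */
void print_link_line(const Lib *libs, const int *line, int n) {
    printf("ld FooBarBaz.o");
    for (int i = 0; i < n; i++)
        printf(" -l%s", libs[line[i]].name);
    printf("\n");
}
```

Calling expand_link_line(LIBS, 0, 2, line) in this model yields four entries (FooBar, Baz, FooBar, Baz), which matches the command line shown above, including the trailing Baz that the linker does not strictly need.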

As for the question about delaying the ordering: not really. CMake doesn’t expose direct control over how interface library dependencies are ordered, largely as a conservative way of ensuring deterministic and correct builds. Because multiple libraries could provide different implementations of the same symbols (think link-once ODRs), the ordering of libraries and their relation to each other affects the correctness of the output. CMake’s approach is probably overly conservative for most situations, but without changing the behavior of CMake (the tool, not our config scripts), I don’t believe you can change this.

One other thing to point out is that we actually have two mechanisms for feeding library dependencies in LLVM. In CMake we specify the library dependencies, but we also specify the dependencies in the LLVMBuild.txt files which feed llvm-config. We also have a script that runs during CMake configuration that brings in LLVMBuild’s dependency graph and we also incorporate that into our linker dependencies.

Hope this answers your questions.