Why doesn't my compilation cache when changing linker?

Hi all!

This is probably going to be a very noobish question, so I apologize in advance.

I want to work on LLVM for my thesis, so I forked the repo and cloned it, and now I want to compile it. I ran:

$ mkdir build
$ cd build
$ cmake -DLLVM_ENABLE_PROJECTS=clang -DCMAKE_BUILD_TYPE=Debug -G "Ninja" ../llvm
$ cd ..
$ cmake --build ./build

and it started working. The build reached [3097/4621] Linking CXX shared library lib/libLTO.so.16git and then, as often happens, failed because the machine ran out of memory.

With a bit of Google-fu, I switched linker to gold with

$ cmake -DLLVM_ENABLE_PROJECTS=clang -DCMAKE_BUILD_TYPE=Debug -DLLVM_USE_LINKER=gold -G "Ninja" ../llvm

and relaunched the compilation

$ cmake --build ./build

However, the new build started from [1/3483] Linking CXX executable bin/llvm-PerfectShuffle, i.e. with more work left than if it had resumed where it stopped. In other words, the first build aborted with 4621-3097=1524 jobs left, but the second one restarted with 3483 jobs to do: why? Is this normal when changing linkers? Did I misunderstand something about the build process (very likely)? Or did I issue the commands wrong, and am I wasting resources?
Depending on the answers to these questions, is a tool like ccache needed?

Thanks!

llvm/cmake/modules/HandleLLVMOptions.cmake handles the option you found, and you’ll see that it adds -fuse-ld, which is a compiler option, not a linker option. That makes me think that CMAKE_EXE_LINKER is actually the compiler (clang or gcc), not the linker directly.

I think that’s a pretty common setup, for reasons I can’t recall right now, but they do exist. One could be that the compiler knows all the target-specific directories that might be needed, so you don’t have to pass a bunch of paths to the linker manually.

Point being, because the compiler options have changed, I’d expect the objects to be rebuilt, unless cmake is smart enough to know that this is only a linker option, even though we’re using the compiler as the linker (really, as the driver for the linker).

Could be wrong, maybe cmake (/the generated ninja or make) is clever enough to know that.
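Rather than guessing, Ninja can report why it considers each target out of date. A minimal sketch, assuming the build directory is `build` and `ninja` is on PATH; `explain_rebuild` is a hypothetical helper name, not something LLVM provides:

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask Ninja why it would rebuild targets,
# without actually building anything.
#   $1 = build directory (default: build)
#   $2 = number of lines to show (default: 20)
explain_rebuild() {
  local dir=${1:-build} lines=${2:-20}
  # -n: dry run; -d explain: print why each edge is considered dirty.
  # The explanations go to stderr, so merge them into stdout for head.
  ninja -C "$dir" -n -d explain 2>&1 | head -n "$lines"
}
```

For example, run `explain_rebuild build 40` right after reconfiguring. If the flag change is the cause, you would expect lines along the lines of `ninja explain: command line changed for ...` (the exact wording depends on the Ninja version).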

As for ccache I don’t think it would help this specific issue. It is generally a good thing to look at though. Especially if you rebuild a lot of different versions of llvm. (pro tip: the default cache size might be too small, especially for flang)
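Checking and raising the ccache size limit takes two commands. A sketch, assuming ccache is installed; the 50G default is an arbitrary illustration rather than a figure from this thread, and `ccache_prepare` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: raise ccache's maximum cache size, then print
# its statistics (hits, misses, current size and limit).
#   $1 = new limit (default: 50G, an arbitrary example value)
ccache_prepare() {
  ccache --max-size="${1:-50G}" && ccache --show-stats
}
```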

If you look in your generated make/ninja for the -fuse-ld flag (literally grep the build dir for the flag name) you might be able to see how it’s used and work it out from that also.
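A sketch of that search, assuming the Ninja generator (so the file to inspect is `build.ninja` at the top of the build directory); `find_fuse_ld` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the first few places where -fuse-ld
# appears in Ninja's generated build file, with line numbers.
#   $1 = build directory (default: build)
find_fuse_ld() {
  # "--" ends option parsing, so the pattern may start with "-".
  grep -n -- '-fuse-ld' "${1:-build}/build.ninja" | head -n 5
}
```

Running `find_fuse_ld build` should print entries like the `LINK_FLAGS = -fuse-ld=gold [...]` lines reported later in this thread.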

Thanks for your help!

So, IIUC, using -DLLVM_USE_LINKER=gold puts -fuse-ld=gold into the build scripts, which in turn tells Ninja to use the gold linker, right? But in this file (HandleLLVMOptions.cmake) I can’t see any use of CMAKE_EXE_LINKER, only of CMAKE_EXE_LINKER_FLAGS. Does that matter at all?

Yes, I’ve seen this as the de facto standard, though I couldn’t say why it should be better either.

Well, given the output I’m getting, it seems it isn’t, right?

Ah, I see your point. That’s because, when building multiple versions, Make/Ninja alone can’t use the .o files as a cache, since they differ between versions, right? Anyway, in my use case I only want to build (and modify!) one version (the one for my native system): in this case, would you say ccache is useless, since Make/Ninja can figure it out? Or does ccache do smarter caching, for instance during linking?

It’s not a problem as such. Just means we’re going to use whatever cmake defaults to which appears to be using the compiler to drive the linker. When cmake generates those command lines, it’ll use those LINKER_FLAGS.

Yes. Or it is being pessimistic because there is some niche circumstance where there is an issue, and people don’t change linkers often enough to warrant the risk.

Exactly. So if you were switching between, say, 15 and 14, ccache could hold objects from both versions. If you’re just checking out the next few commits, the local objects should do the job.

I think. There are some cases where cmake doesn’t describe a dependency correctly and ccache saves the day, but I wouldn’t worry about that.

In this case with one version and small incremental changes, let make/ninja figure it out.

And by the way, switching to gold is one option but you can also use llvm’s linker lld.

Building LLVM with CMake — LLVM 16.0.0git documentation shows it in its example. You may have to apt install lld in addition to clang if you want to do that.

The benefit there is that lld is actively maintained, and I personally have had gold crash on me building llvm on some Ubuntu versions. Anyway, if you have problems with gold that’s your next stop.
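Switching the existing build directory to lld might look like the sketch below. It assumes a Debian/Ubuntu host where lld was installed from the distribution (e.g. `sudo apt install lld`), so that the system `ld.lld` is the one on PATH, and it reuses the configure options from the original post; `configure_with_lld` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-run the configure step with lld as the linker.
# Run it from inside the build directory, like the original cmake call.
#   $1 = path to the llvm/ source directory (default: ../llvm)
configure_with_lld() {
  # Fail early if no system ld.lld is on PATH.
  command -v ld.lld >/dev/null || { echo "ld.lld not found" >&2; return 1; }
  cmake -G Ninja \
    -DLLVM_ENABLE_PROJECTS=clang \
    -DCMAKE_BUILD_TYPE=Debug \
    -DLLVM_USE_LINKER=lld \
    "${1:-../llvm}"
}
```

Per the HandleLLVMOptions.cmake excerpt quoted in this thread, don't also pass -DLLVM_ENABLE_LLD=ON: setting both is a fatal configure error. Expect another large rebuild after switching, for the same reason as with gold.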

Also, do you need a debug build? A debug build is going to be much larger than release.

For development of llvm you don’t always need debug builds. Release with asserts enabled is a nice compromise that I generally use.

https://llvm.org/docs/CMake.html#llvm-enable-assertions
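A Release-with-assertions configuration could be sketched as follows. It assumes you run it from the repository root and want a separate build directory (the name `build-release` is an arbitrary choice) so the existing Debug tree is left alone; `configure_release_asserts` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: configure a Release build with assertions enabled
# in its own build directory, leaving the Debug build untouched.
#   $1 = path to the llvm/ source directory (default: llvm)
#   $2 = build directory (default: build-release)
configure_release_asserts() {
  cmake -S "${1:-llvm}" -B "${2:-build-release}" -G Ninja \
    -DLLVM_ENABLE_PROJECTS=clang \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_ASSERTIONS=ON
}
```

After configuring, build it with `cmake --build build-release`.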

Moreover, in case it is useful, grepping the build dir for -fuse-ld (while the compilation was still running) I found:

  • In build.ninja, it occurs always in lines of the form LINK_FLAGS = -fuse-ld=gold [...]
  • In AddLLVM.cmake, it occurs as
  if(LLVM_USE_LINKER)
    set(command ${CMAKE_C_COMPILER} -fuse-ld=${LLVM_USE_LINKER} ${version_flag} -o ${DEVNULL})

so it is indeed passed to the compiler, as you were saying.

  • In HandleLLVMOptions.cmake, I see:

if( LLVM_ENABLE_LLD )
  if ( LLVM_USE_LINKER )
    message(FATAL_ERROR "LLVM_ENABLE_LLD and LLVM_USE_LINKER can't be set at the same time")
  endif()
  # In case of MSVC cmake always invokes the linker directly, so the linker
  # should be specified by CMAKE_LINKER cmake variable instead of by -fuse-ld
  # compiler option.
  if ( NOT MSVC )
    set(LLVM_USE_LINKER "lld")
  endif()
endif()

if( LLVM_USE_LINKER )
  append("-fuse-ld=${LLVM_USE_LINKER}"
    CMAKE_EXE_LINKER_FLAGS CMAKE_MODULE_LINKER_FLAGS CMAKE_SHARED_LINKER_FLAGS)
  check_cxx_source_compiles("int main() { return 0; }" CXX_SUPPORTS_CUSTOM_LINKER)
  if ( NOT CXX_SUPPORTS_CUSTOM_LINKER )
    message(FATAL_ERROR "Host compiler does not support '-fuse-ld=${LLVM_USE_LINKER}'")
  endif()
endif()
  • In DiagnosticDriverKinds.inc:
[...]
DIAG(warn_drv_fuse_ld_path, CLASS_WARNING, (unsigned)diag::Severity::Ignored, "'-fuse-ld=' taking a path is deprecated; use '--ld-path=' instead", 330, SFINAE_Suppress, false, false, true, false, 0)
[...]
DIAG(err_drv_lto_without_lld, CLASS_ERROR, (unsigned)diag::Severity::Error, "LTO requires -fuse-ld=lld", 0, SFINAE_SubstitutionFailure, false, true, true, false, 0)
[...]
  • Finally, in Options.inc:
OPTION(prefix_1, &"-fuse-ld="[1], fuse_ld_EQ, Joined, f_Group, INVALID, nullptr, CoreOption | LinkOption, 0, nullptr, nullptr, nullptr)

This is what I was referring to. Now that you’ve seen it, you can be reasonably confident that it’s passed as a compiler flag, but only when the compiler is being used as the linker executable.
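The CXX_SUPPORTS_CUSTOM_LINKER check in the HandleLLVMOptions.cmake excerpt can be reproduced by hand, which makes it easy to see that -fuse-ld goes to the compiler driver. A sketch, assuming a GCC- or Clang-compatible compiler named by `$CXX` (defaulting to `c++`); `supports_linker` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the CXX_SUPPORTS_CUSTOM_LINKER check:
# compile and link a trivial program from stdin with -fuse-ld=<linker>.
# Returns 0 if the driver accepted the flag and the link succeeded.
#   $1 = linker name, e.g. gold or lld
supports_linker() {
  "${CXX:-c++}" -x c++ -fuse-ld="$1" - -o /dev/null 2>/dev/null <<'EOF'
int main() { return 0; }
EOF
}
```

For example, `supports_linker gold && echo "gold OK"`.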

You’d be amazed at what you can figure out just searching files.


Thanks for your reply!

Great, I was thinking the same too. Thanks!

Well, I hadn’t considered that. I suppose I’ll need to make sure I use the “system” lld, and not the one I build with LLVM+Clang, right? Anyway, I’ll probably give it a try, thanks!

Well, I’m still figuring out how LLVM works, and IIUC a Debug build is needed for code navigation features like “jump to definition”, right? Or is that completely independent of debug info (relying on something like the symbol table, if I’m right)? Also, if I need to debug an optimization pass, I’ll need debug info, right? Or is it madness to use something like gdb on LLVM? In other words, is the idiomatic way to inspect what it does to use a debugger, or something else (like heavy use of assertions)?

Great… partially haha. Anyway, I trust the dev team, if this is how they want it to be compiled, there must be a reason. :+1:

I don’t think this was mentioned, but because almost everything depends on headers generated by TableGen, I suspect changing the linker forces a relink of the TableGen binary, which in turn regenerates all TableGen-generated files and rebuilds “everything”.
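To test that hypothesis, Ninja can show which build edges consume a given file. A sketch, assuming the TableGen binary is built as `bin/llvm-tblgen` inside the build directory (the exact target path may vary by LLVM version and configuration); `tblgen_consumers` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print what Ninja records about a target; the
# "outputs:" section lists files whose build steps depend on it.
#   $1 = build directory (default: build)
#   $2 = target path relative to the build dir (default: bin/llvm-tblgen)
tblgen_consumers() {
  ninja -C "${1:-build}" -t query "${2:-bin/llvm-tblgen}"
}
```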

Also: you may want to add -DLLVM_CCACHE_BUILD=ON to your CMake invocation (and install ccache if you don’t have it), it helps greatly with rebuilds.
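Turning it on for the existing build directory could look like this sketch, assuming ccache is installed and on PATH; `enable_ccache` is a hypothetical name. The first build after this only fills the cache; the speed-up shows on later rebuilds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable ccache for an existing LLVM build directory.
#   $1 = build directory (default: build)
enable_ccache() {
  command -v ccache >/dev/null || { echo "ccache not found" >&2; return 1; }
  # Re-running CMake on an existing build tree keeps the options already in
  # its CMakeCache.txt and only adds or overrides the one given here.
  cmake -DLLVM_CCACHE_BUILD=ON "${1:-build}"
}
```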

That’s surely possible, but I’m really a beginner, I only Googled what TableGen was a few hours ago :sweat_smile:

@DavidSpickett and I discussed this before: since I need only to modify and compile a single version, I think it’s not useful, right?

It still does. The simplest example: try changing a comment in a widely used header (like SmallVector.h) and see how long it takes to rebuild with and without ccache.
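That experiment can be scripted roughly as below. It assumes you run it from the repository root of a git checkout, with `build` as the build directory; `time_header_rebuild` is a hypothetical name, and the appended comment text is arbitrary. Run it once with ccache enabled and once without to compare.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make a comment-only change to a widely included
# header, time the rebuild, then restore the header from git.
#   $1 = header to touch (default: llvm/include/llvm/ADT/SmallVector.h)
#   $2 = build directory (default: build)
time_header_rebuild() {
  local header=${1:-llvm/include/llvm/ADT/SmallVector.h} dir=${2:-build}
  printf '// comment-only change for a rebuild timing test\n' >> "$header"
  time cmake --build "$dir"
  git checkout -- "$header"   # undo the edit
}
```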


Yeah, I shouldn’t have given the impression there’s zero benefit to ccache. It’s very easy to setup so why not give it a go.

Just don’t get too into the build system when your goal is to spend time learning llvm, unless the build times are holding you back.


Ah, I see now, thanks! I’ll investigate it then :+1: