[llvm-toolchain v3.8.1] LTO: Linking clang hangs with ld.gold and LLVMgold.so plugin

Hi,

unfortunately, my build somehow hangs when linking the clang binary,
leaving my system in an unusable state.

My toolchain is clang-3.8, gold-1.11, and LLVMgold.so from binutils
v2.26.1 (both self-built), with the LTO flag enabled.
My build system uses cmake-3.6.0 and ninja-1.7.1 (both prebuilt).
The hang occurs with 52 steps left in my 3rd build.

My Linux kernel is v3.13.0-92 from the official Ubuntu repositories.

On my Ubuntu/precise AMD64 machine I have ***4GiB RAM and 256MiB SWAP***.
Is this not enough RAM/swap for building/linking?

Do you need any additional information?

Any help appreciated.

Regards,
- Sedat -

[1] http://llvm.org/docs/GoldPlugin.html

P.S.: More information...

[ First toolchain build ]

I built a first llvm-toolchain v3.8.1 with GCC v4.9.2 and binutils v2.22.
No speedup settings, no extra patches applied, etc.

[ Selfmade binutils ]

As binutils v2.22 was somehow unable to generate an LLVMgold.so on
Ubuntu/precise AMD64, I built binutils v2.26.1 manually and use
all of its binaries for building/linking (gold for linking).

[ 2nd llvm-toolchain LLVMgold.so ]

In a 2nd llvm-toolchain v3.8.1 build an LLVMgold.so was generated.
I placed LLVMgold.so into my self-built
/opt/binutils-2.26.1/lib/bfd-plugins/ directory.
( I had to create the bfd-plugins subdirectory manually. )

[ 3rd llvm-toolchain LTO ]

Now, with the LTO flag backported from upstream, I am able to build with '-flto'.

- EOT -

That's often not enough for a non-shared-library, non-LTO debug build.
I don't expect a non-shared-library LTO build to fare better in that regard...

Joerg

Hi,

in the meantime I have tried with Linux v4.4.y LTS and 2GiB of swap space.
So I have 4GiB RAM and 2GiB swap, 6GiB in total.
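For readers in a similar situation, extra swap can be added temporarily with standard Linux tools; this is only a sketch, and the 8GiB size and /swapfile path are my own assumptions, not from the thread:

```shell
# Create and enable a temporary 8GiB swap file (run as root).
# fallocate is fast on ext4; use dd if your filesystem lacks it.
fallocate -l 8G /swapfile
chmod 600 /swapfile      # swap files must not be world-readable
mkswap /swapfile         # write the swap signature
swapon /swapfile         # enable it immediately
free -m                  # verify the new swap shows up
```

The swap disappears on reboot unless it is also added to /etc/fstab.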

Paul Rouschal recommended reducing the parallel compile jobs from 2 to 1...

LLVM_PARALLEL_COMPILE_JOBS=1
LLVM_PARALLEL_LINK_JOBS=1

...but that did not help.
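For reference, the two options above are real LLVM cmake cache variables; a configure sketch that caps both job counts might look like this (generator and source path assumed from the thread's cmake/ninja setup):

```shell
# Sketch: limit compile and link parallelism so at most one
# memory-hungry LTO link runs at a time.
cmake -G Ninja ../llvm \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_PARALLEL_COMPILE_JOBS=1 \
  -DLLVM_PARALLEL_LINK_JOBS=1
```

Note that LLVM_PARALLEL_LINK_JOBS only takes effect with the Ninja generator.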

My Ubuntu/precise hangs, and top shows memory and swap being eaten up.

Does anyone have experience with how much RAM or swap is needed when
building an LTO-optimized llvm-toolchain with Clang, GNU gold and the
LLVMgold plugin?

My build-script is attached.

Thanks.

Regards,
- Sedat -

build_llvm-toolchain.sh (6.04 KB)

How big is your project?
LTO eats RAM even faster than Chrome. For example, linking clang with LTO can take 16GB of RAM.

Have you tried using LTO on your project on that machine, or is it your first time?

Piotr

How big is your project?
LTO eats RAM even faster than Chrome. For example, linking clang with LTO
can take 16GB of RAM.

Have you tried using LTO on your project on that machine, or is it your
first time?

Wow hu hu - 16GiB of RAM :-).

This is my first time dealing with LTO in general.

I am not sure whether it would matter if I built with GCC, its LTO
plugin and ld.bfd as the linker.
Here I tried Clang with the LLVMgold plugin and ld.gold (v2.26.1).

These are the last lines I see...

[2114/2157] Linking CXX executable bin/diagtool
[2115/2157] Linking CXX executable bin/clang-format
[2116/2157] Linking CXX executable bin/clang-3.8
FAILED: bin/clang-3.8
: && /opt/llvm/bin/clang++-3.8 -fPIC -fvisibility-inlines-hidden
-Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual
-Wmissing-field-initializers -pedantic -Wno-long-long
-Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor
-std=c++11 -fcolor-diagnostics -ffunction-sections -fdata-sections
-flto -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3
-flto -Wl,-allow-shlib-undefined -Wl,--export-dynamic -Wl,-O3
-Wl,--gc-sections
tools/clang/tools/driver/CMakeFiles/clang.dir/driver.cpp.o
tools/clang/tools/driver/CMakeFiles/clang.dir/cc1_main.cpp.o
tools/clang/tools/driver/CMakeFiles/clang.dir/cc1as_main.cpp.o -o
bin/clang-3.8 lib/libLLVMX86CodeGen.a lib/libLLVMX86AsmPrinter.a
lib/libLLVMX86AsmParser.a lib/libLLVMX86Desc.a lib/libLLVMX86Info.a
lib/libLLVMX86Disassembler.a lib/libLLVMAnalysis.a
lib/libLLVMCodeGen.a lib/libLLVMCore.a lib/libLLVMipo.a
lib/libLLVMInstCombine.a lib/libLLVMInstrumentation.a lib/libLLVMMC.a
lib/libLLVMMCParser.a lib/libLLVMObjCARCOpts.a lib/libLLVMOption.a
lib/libLLVMScalarOpts.a lib/libLLVMSupport.a
lib/libLLVMTransformUtils.a lib/libLLVMVectorize.a lib/libclangBasic.a
lib/libclangCodeGen.a lib/libclangDriver.a lib/libclangFrontend.a
lib/libclangFrontendTool.a lib/libLLVMAsmPrinter.a
lib/libLLVMSelectionDAG.a lib/libLLVMCodeGen.a
lib/libLLVMX86AsmPrinter.a lib/libLLVMX86Utils.a lib/libLLVMX86Info.a
lib/libLLVMMCDisassembler.a lib/libclangCodeGen.a lib/libLLVMipo.a
lib/libLLVMVectorize.a lib/libLLVMInstrumentation.a
lib/libLLVMObjCARCOpts.a lib/libLLVMScalarOpts.a
lib/libLLVMInstCombine.a lib/libLLVMTarget.a lib/libLLVMBitWriter.a
lib/libLLVMIRReader.a lib/libLLVMAsmParser.a lib/libLLVMLinker.a
lib/libLLVMTransformUtils.a lib/libLLVMAnalysis.a
lib/libLLVMProfileData.a lib/libLLVMObject.a
lib/libclangRewriteFrontend.a lib/libclangARCMigrate.a
lib/libclangStaticAnalyzerFrontend.a lib/libclangFrontend.a
lib/libclangDriver.a lib/libLLVMOption.a lib/libclangParse.a
lib/libLLVMMCParser.a lib/libclangSerialization.a
lib/libLLVMBitReader.a lib/libclangSema.a lib/libclangEdit.a
lib/libclangStaticAnalyzerCheckers.a lib/libclangStaticAnalyzerCore.a
lib/libclangAnalysis.a lib/libclangAST.a lib/libclangRewrite.a
lib/libclangLex.a lib/libclangBasic.a lib/libLLVMCore.a
lib/libLLVMMC.a lib/libLLVMSupport.a -lrt -ldl -ltinfo -lpthread -lz
-lm -Wl,-rpath,"\$ORIGIN/../lib" && :
clang-3.8: error: unable to execute command: Killed
clang-3.8: error: linker command failed due to signal (use -v to see invocation)
ninja: build stopped: subcommand failed.

This time I just built and installed ld.gold from binutils v2.26.1 and
symlinked it system-wide with the ld of my /opt installation.
I copied the generated LLVMgold.so plugin into the /usr/lib/bfd-plugins/
directory (the bfd-plugins subdirectory needed to be created manually).
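The plugin installation described above can be sketched as follows (the paths are the ones mentioned in the thread):

```shell
# Make the LLVMgold plugin visible to the binutils tools via the
# bfd-plugins directory; gold, ar, nm and ranlib all look there,
# so no per-invocation --plugin= flag is needed.
mkdir -p /usr/lib/bfd-plugins
ln -sf /opt/llvm/lib/LLVMgold.so /usr/lib/bfd-plugins/LLVMgold.so
```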

- Sedat -

P.S.: Output of free and df commands

$ free -m
             total       used       free     shared    buffers     cached
Mem:          3846       2211       1635          0        729        818
-/+ buffers/cache:        662       3183
Swap:          255        143        112

$ df -T | grep ext4
/dev/loop0 ext4 17753424 15997332 831216 96% /

- EOT -

build_llvm-toolchain.sh (6.24 KB)

It is a bit less now: it should be closer to 11GB with full debug info.
You may want to try without debug info (or with -gline-tables-only); it should then stay under 4GB.

How big is your project?
LTO eats RAM even faster than chrome. For example linking clang with LTO
could take 16GB of ram.

Have you tried using LTO on your project on that machine, or is it your
first time?

Wow hu hu - 16GiB of RAM :-).

It is a bit less now: should be closer to 11GB with Full Debug Info.
You may want to try without debug info (or with -gline-tables-only) and it
should get under 4GB.

Hmm, I have a small built-in SanDisk SSD with 16GiB here and used that as swap.

I was able to build, but when starting the install procedure I had 1916
steps again for building/linking.

When I looked in between watching TV, the maximum swap usage I saw in
top was 3.5GiB (linking clang-3.8).

So, build time increases rapidly here.

What is the benefit of "-gline-tables-only"?
Does it interact badly with any of the cmake options below?

In a next run I want to enable profile-guided optimization (PGO),
optimized TableGen and split DWARF.

# CMake LTO (link-time optimizations) settings (WIP)
# NOTE: Not available for LLVM v3.8 (requires a backport of SVN r259766)
LTO_CMAKE_OPTS="-DLLVM_ENABLE_LTO=ON"

# CMake PGO (profile-guided optimizations) settings (WIP)
PGO_CMAKE_OPTS="-DLLVM_USE_OPROFILE=ON"

# CMake TABLEGEN settings (WIP)
TABLEGEN_CMAKE_OPTS="-DLLVM_OPTIMIZED_TABLEGEN=ON"

# CMake DWARF settings (WIP)
# NOTE: Move parts of the debug info into separate files: split DWARF
DWARF_CMAKE_OPTS="-DLLVM_USE_SPLIT_DWARF=ON"

SPEEDUP_CMAKE_OPTS="$LTO_CMAKE_OPTS $PGO_CMAKE_OPTS $TABLEGEN_CMAKE_OPTS $DWARF_CMAKE_OPTS"

Thanks for the feedback.

- sed@ -

BTW, I use here...

[ build_llvm-toolchain.sh ]
...
# CMake binutils binaries and linker options (here: GNU ld or
# alternatively GNU gold from self-built binutils v2.26.1)
# NOTE: Build the LLVMgold plugin where $BINUTILS_INC_DIR contains
# the plugin-api.h file
BINUTILS_BIN_DIR="/opt/binutils/bin"
BINUTILS_INC_DIR="/opt/binutils/include"
...
# Use selfmade GNU/gold as linker
LINKER_GOLD="$BINUTILS_BIN_DIR/ld.gold"
LINKER="$LINKER_GOLD"
LINKER_CMAKE_OPTS="-DCMAKE_LINKER=$LINKER -DGOLD_EXECUTABLE=$LINKER_GOLD"
LINKER_CMAKE_OPTS="$LINKER_CMAKE_OPTS -DLLVM_BINUTILS_INCDIR=$BINUTILS_INC_DIR"
...
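As a side note, CMAKE_LINKER is largely ignored whenever the compiler driver (clang++ here) performs the final link, which may explain a system-wide ld being picked up. A hedged workaround, assuming a clang host compiler that understands -fuse-ld, is to route the choice through the driver's link flags instead:

```shell
# Sketch: ask the clang driver itself to invoke gold, rather than
# relying on CMAKE_LINKER (which the driver never sees).
cmake -G Ninja ../llvm \
  -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=gold" \
  -DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=gold" \
  -DCMAKE_MODULE_LINKER_FLAGS="-fuse-ld=gold"
```

With -fuse-ld=gold the driver looks up ld.gold on PATH, so the /opt binutils bin directory must come first in PATH for the self-built gold to win.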

Unfortunately, I see that the system-wide linker is used even though I
have explicitly set "-DCMAKE_LINKER=$LINKER".
The settings themselves look good:

$ cd llvm-build/
$ cmake ../llvm -LA | egrep -i 'binutils|gold'
CMAKE_LINKER:FILEPATH=/opt/binutils/bin/ld.gold
GOLD_EXECUTABLE:FILEPATH=/opt/binutils/bin/ld.gold
LLVM_BINUTILS_INCDIR:PATH=/opt/binutils/include
LLVM_TOOL_GOLD_BUILD:BOOL=ON

This is no problem here as I have...

$ LC_ALL=C ls -l /usr/bin/ld /opt/binutils/bin/ld
lrwxrwxrwx 1 root root 7 Jul 23 13:35 /opt/binutils/bin/ld -> ld.gold*
lrwxrwxrwx 1 root root 20 Jul 23 13:42 /usr/bin/ld -> /opt/binutils/bin/ld*

I will place my self-built ld.gold into /usr/bin/ and rename the original one.

# cd /usr/bin/
# mv ld.gold ld.gold-2.22
# mv /opt/binutils/bin/ld.gold /usr/bin/ld.gold-2.26.1
# rm -v -f ld
# ln -sf ld.gold-2.26.1 ld

My LLVMgold plugin is hooked in like this...

$ LC_ALL=C ll /usr/lib/bfd-plugins/LLVMgold.so
lrwxrwxrwx 1 root root 25 Jul 23 16:13
/usr/lib/bfd-plugins/LLVMgold.so -> /opt/llvm/lib/LLVMgold.so

Can anyone tell me why explicitly setting $LINKER does not work as expected?

- sed@ -

WTF-ld.txt (2.15 KB)

WTF-ld-2.txt (1.73 KB)

WTF-ld-3.txt (1.32 KB)

What are you recommending?

I am normally doing RELEASE builds.

"Using Sampling Profilers" [2] is a subsection of "Profile Guided
Optimization", which says...

#1: Build the code with source line table information. You can use all
the usual build flags that you always build your application with. The
only requirement is that you add -gline-tables-only or -g to the
command line. This is important for the profiler to be able to map
instructions back to source line locations.

$ clang++ -O2 -gline-tables-only code.cc -o code

...currently, I do not have PGO enabled via cmake-options.
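For context, the full sampling workflow from [2] looks roughly like this end to end; note that create_llvm_prof comes from the separate AutoFDO project, not from LLVM itself, and perf requires suitable hardware counters:

```shell
# Hedged sketch of sample-based PGO, following the docs quoted above.
clang++ -O2 -gline-tables-only code.cc -o code    # 1: build with line tables
perf record -b ./code                             # 2: sample a training run
create_llvm_prof --binary=./code --out=code.prof  # 3: convert samples
clang++ -O2 -gline-tables-only \
        -fprofile-sample-use=code.prof \
        code.cc -o code                           # 4: rebuild using profile
```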

How do I pass "-gline-tables-only" via the *_FLAGS_* cmake-options?

$ cmake ../llvm -LA | egrep 'FLAGS_DEBUG:' | sort
CMAKE_ASM_FLAGS_DEBUG:STRING=-g
CMAKE_C_FLAGS_DEBUG:STRING=-g
CMAKE_CXX_FLAGS_DEBUG:STRING=-g
CMAKE_EXE_LINKER_FLAGS_DEBUG:STRING=
CMAKE_MODULE_LINKER_FLAGS_DEBUG:STRING=
CMAKE_SHARED_LINKER_FLAGS_DEBUG:STRING=
CMAKE_STATIC_LINKER_FLAGS_DEBUG:STRING=

$ cmake ../llvm -LA | egrep 'FLAGS_RELEASE:' | sort
CMAKE_ASM_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
CMAKE_C_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
CMAKE_CXX_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
CMAKE_EXE_LINKER_FLAGS_RELEASE:STRING=
CMAKE_MODULE_LINKER_FLAGS_RELEASE:STRING=
CMAKE_SHARED_LINKER_FLAGS_RELEASE:STRING=
CMAKE_STATIC_LINKER_FLAGS_RELEASE:STRING=

I am not sure whether there is a cmake-option available to set "-gline-tables-only":

$ cmake ../llvm -LA | egrep -i 'table|line|pgo|prof' | sort
CLANG_EXECUTABLE_VERSION:STRING=3.8
CLANG_TABLEGEN:STRING=clang-tblgen
GO_EXECUTABLE:FILEPATH=GO_EXECUTABLE-NOTFOUND
GOLD_EXECUTABLE:FILEPATH=/opt/binutils/bin/ld.gold
LIBXML2_XMLLINT_EXECUTABLE:FILEPATH=/usr/bin/xmllint
LLVM_OPTIMIZED_TABLEGEN:BOOL=OFF
LLVM_PROFDATA_FILE:FILEPATH=
LLVM_TABLEGEN:STRING=llvm-tblgen
LLVM_TOOL_LLVM_PROFDATA_BUILD:BOOL=ON
LLVM_USE_OPROFILE:BOOL=OFF
PKG_CONFIG_EXECUTABLE:FILEPATH=/usr/bin/pkg-config
PYTHON_EXECUTABLE:FILEPATH=/usr/bin/python2.7
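There is no dedicated cmake-option for it as far as I know; one plausible approach, given the Release flags shown above, is simply to append it to the per-configuration C/C++ flags, which CMake then passes on every compile line:

```shell
# Sketch: add -gline-tables-only on top of the default Release flags.
cmake -G Ninja ../llvm \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_FLAGS_RELEASE="-O3 -DNDEBUG -gline-tables-only" \
  -DCMAKE_CXX_FLAGS_RELEASE="-O3 -DNDEBUG -gline-tables-only"
```

Appending to CMAKE_C_FLAGS/CMAKE_CXX_FLAGS (without the _RELEASE suffix) should work for any build type as well.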

Many questions, I know...

Thanks.

- sed@ -

[1] http://clang.llvm.org/docs/UsersManual.html
[2] http://clang.llvm.org/docs/UsersManual.html#using-sampling-profilers