Building with LLVM_PARALLEL_XXX_JOBS

Hi,

I switched from the "configure and make" build system to "cmake" and
wanted to speed up my build.

In my build-script I use...

CMAKE_JOBS="1"                                    # serial by default
##CMAKE_JOBS=$(($(getconf _NPROCESSORS_ONLN)+1))  # alternative: online CPUs + 1
JOBS_CMAKE_OPTS="-DLLVM_PARALLEL_COMPILE_JOBS=$CMAKE_JOBS
-DLLVM_PARALLEL_LINK_JOBS=$CMAKE_JOBS"

[1] says in its "LLVM-specific variables" section...

*** LLVM_PARALLEL_COMPILE_JOBS:STRING

Define the maximum number of concurrent compilation jobs.

*** LLVM_PARALLEL_LINK_JOBS:STRING

Define the maximum number of concurrent link jobs.

...whereas my configure-log says...

$ grep -i job -A1 logs/configure-log_llvm-toolchain-3.8.0rc3.txt
  Job pooling is only available with Ninja generators and CMake 3.0 and
  later.

configure-log_llvm-toolchain-3.8.0rc3.txt (14 KB)

build_llvm-toolchain.sh (3.7 KB)

My cmake-version is...

$ cmake --version
cmake version 2.8.12.2

So, I need Ninja *and* CMake >= v3.0 (or is the right CMake version
sufficient) to use LLVM_PARALLEL_XXX_JOBS?

Yes. When you run ninja without any arguments you get parallelism by default; these options control the maximum number of jobs ninja will run.
Also, especially when doing LTO, you may want to limit the number of link jobs independently from the number of compile jobs, which is again a facility that ninja has.
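
For example, a minimal Ninja-based configuration could look like this (a sketch; the job counts are illustrative, tune them to your machine):

# Configure with the Ninja generator (needs CMake >= 3.0 and ninja installed).
cmake -G Ninja ../llvm \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_PARALLEL_COMPILE_JOBS=4 \
  -DLLVM_PARALLEL_LINK_JOBS=1

# ninja is parallel by default; the pools above cap compile and link jobs.
ninja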

If this is a fact, can you please adjust the information in [1]?

Do I have other options to speedup my build?

Yes: start to use ninja :wink:
If for any reason (?) you are really stuck with "make", then run "make -j ~ncpus" (with ~ncpus being approximately the number of cores in your machine). It will build in parallel.
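
For example (using the same getconf trick your script already has):

# One make job per online CPU.
make -j"$(getconf _NPROCESSORS_ONLN)"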

So, I like ninjas, but I have not used Ninja (the software) yet.

From my build-script...

[ -d ${BUILD_DIR} ] || mkdir -p ${BUILD_DIR}
cd $BUILD_DIR

# BUILD-VARIANT #1: CONFIGURE AND MAKE (will be DEPRECATED with LLVM v3.9)
##../llvm/configure $CONFIGURE_OPTS 2>&1 | tee $LOGS_DIR/$CONFIGURE_LOG_FILE
##$MAKE $MAKE_OPTS 2>&1 | tee $LOGS_DIR/$BUILD_LOG_FILE
##sudo $MAKE install 2>&1 | tee $LOGS_DIR/$INSTALL_LOG_FILE

# BUILD-VARIANT #2: CMAKE
$CMAKE ../llvm $CMAKE_OPTS 2>&1 | tee $LOGS_DIR/$CONFIGURE_LOG_FILE
$CMAKE --build . 2>&1 | tee $LOGS_DIR/$BUILD_LOG_FILE
##sudo $CMAKE --build . --target install 2>&1 | tee $LOGS_DIR/$INSTALL_LOG_FILE

You mean configuring with cmake and building (and installing) with make?

$CMAKE ../llvm $CMAKE_OPTS 2>&1 | tee $LOGS_DIR/$CONFIGURE_LOG_FILE

$MAKE $MAKE_OPTS 2>&1 | tee $LOGS_DIR/$BUILD_LOG_FILE

sudo $MAKE install 2>&1 | tee $LOGS_DIR/$INSTALL_LOG_FILE
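
Or do you mean the pure Ninja variant, something like this (untested sketch reusing my script's variables)?

# BUILD-VARIANT #3: CMAKE + NINJA (untested)
$CMAKE -G Ninja ../llvm $CMAKE_OPTS 2>&1 | tee $LOGS_DIR/$CONFIGURE_LOG_FILE
ninja 2>&1 | tee $LOGS_DIR/$BUILD_LOG_FILE
##sudo ninja install 2>&1 | tee $LOGS_DIR/$INSTALL_LOG_FILE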

Which combination of cmake/ninja versions are you using (latest are
v3.4.3 and v1.6.0)?

- Sedat -

On Linux (not Windows) I doubt using Ninja vs. make will make a drastic
difference. (Others with actual numbers, please chime in to correct me.)

/*
I think the difference could be more beneficial if you're doing
incremental builds, but I don't think that is what you're doing.
*/

What do you mean by "incremental builds"?

/me wanted to test RCs of upcoming LLVM v3.8.0 - not doing daily builds.

I am building a "modern" Linux graphics driver stack (libdrm | mesa |
intel-ddx) and a Linux v4.4.y-LTS.

Things will be more modern when I switch from Ubuntu/precise
(12.04-LTS) to Ubuntu/xenial (upcoming 16.04-LTS; beta1 should be
available today according to the release schedule [1]).
( I am a bit sick of backporting software. )

- Sedat -

[1] XenialXerus/ReleaseSchedule - Ubuntu Wiki

By incremental I mean:
build the compiler
change some source file
rebuild
...
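
With Ninja that loop would look roughly like this (a sketch, run from the directory holding llvm and llvm-build; the touched file is just an example):

ninja -C llvm-build                  # first, a full build
touch llvm/lib/Support/APInt.cpp     # change some source file
ninja -C llvm-build                  # rebuilds only what depends on it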

Which combination of cmake/ninja versions are you using (latest are
v3.4.3 and v1.6.0)?

With this combination I could reduce the build time from approx. 3h
down to 1h20m.

$ egrep -i 'jobs|ninja' llvm-build/CMakeCache.txt
//Program used to build from build.ninja files.
CMAKE_MAKE_PROGRAM:FILEPATH=/opt/cmake/bin/ninja
//Define the maximum number of concurrent compilation jobs.
LLVM_PARALLEL_COMPILE_JOBS:STRING=3
//Define the maximum number of concurrent link jobs.
LLVM_PARALLEL_LINK_JOBS:STRING=1
CMAKE_GENERATOR:INTERNAL=Ninja

$ LC_ALL=C ls -alt logs/3.8.0rc3_clang-3-8-0-rc3_cmake-3-4-3_ninja-1-6-0/
total 360
drwxr-xr-x 2 wearefam wearefam   4096 Feb 25 19:58 .
drwxr-xr-x 6 wearefam wearefam   4096 Feb 25 19:58 ..
-rw-r--r-- 1 wearefam wearefam 130196 Feb 25 19:54 install-log_llvm-toolchain-3.8.0rc3.txt
-rw-r--r-- 1 wearefam wearefam 205762 Feb 25 19:51 build-log_llvm-toolchain-3.8.0rc3.txt
-rw-r--r-- 1 wearefam wearefam  14331 Feb 25 18:30 configure-log_llvm-toolchain-3.8.0rc3.txt

$ LC_ALL=C du -s -m llvm* /opt/llvm-toolchain-3.8.0rc3
315 llvm
941 llvm-build
609 /opt/llvm-toolchain-3.8.0rc3

- Sedat -

[1] https://cmake.org/files/v3.5/cmake-3.5.0-rc3-Linux-x86_64.tar.gz

There are a few notes I'd like to add to this thread.

(1) We have a number of places throughout our CMake build where we use features from newer CMakes gated by version checks. Most of these features are performance or usability related; none of them affects correctness. Using the latest CMake release will often result in faster builds, so I encourage it.

(2) CMake's "install" target will pretty much always be slower from clean than the old autoconf/make "install" target. This is because in CMake "install" depends on "all", and our CMake builds more stuff in "all" than autoconf did. To help with this, our CMake system has lots of convenient "install-${name}" targets that support component-based installation. Not every component has one of these rules, but if one you need is missing let me know. I also recently (r261681) added a new option (LLVM_DISTRIBUTION_COMPONENTS) that allows you to specify a list of components that have custom install targets. It then creates a new "install-distribution" target that wraps just the components you want. For Apple this is almost a 40% speedup over "ninja install".
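
As a rough illustration (the component list here is made up; check which "install-${name}" targets exist in your tree):

cmake -G Ninja ../llvm \
  -DLLVM_DISTRIBUTION_COMPONENTS="clang;llvm-ar;llvm-profdata"

# Builds and installs only the listed components.
ninja install-distribution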

-Chris

That sounds great, I want to use it!
It would be even more awesome with a description/example in docs/CMake.rst :slight_smile:

Once I get the last of the kinks worked out for our internal adoption I'm going to open source our config files that use it.

I've also made a note to remind myself to document it in docs/CMake.rst. I need to do a pass updating that with a bunch of the cool new things we're doing with CMake. Thanks for the reminder.

-Chris

For faster builds and rebuilds you should definitely read:
https://blogs.s-osg.org/an-introduction-to-accelerating-your-build-with-clang/
https://blogs.s-osg.org/a-conclusion-to-accelerating-your-build-with-clang/

Hope this helps!

Fabio, the work I was mentioning here is an extension beyond those blog posts.

Some details:

  • The “almost 40%” number I referred to is a multi-stage clang build. That means we build a host-capable compiler, then build the actual compiler we want to ship.
  • I’m at Apple, so points 1 and 2 are already covered (we only use clang, and ld64 is a fast linker).
  • Our system compiler is PGO+LTO'd, but our stage 1 isn't, because the performance improvement from PGO+LTO is less than the time it takes to build, and stage 1 is basically a throwaway.
  • We are using Ninja and CMake, but this configuration isn’t really significantly faster than autoconf/make, and actually “ninja install” is slower in my tests than the old autoconf “make install”. The slowdown is almost entirely due to Ninja’s “all” target being a lot bigger.
  • This performance is for clean builds, not incremental, so ccache or shared libraries would not be valid optimizations
  • We do use optimized tablegen
  • “Build Less” is exactly what the LLVM_DISTRIBUTION_COMPONENTS enables, just in a friendly wrapper target.

-Chris

Hey Chris,

Sedat was asking for a way to "speedup my build", and those blog posts were really helpful to me.
Anyway LLVM_DISTRIBUTION_COMPONENTS sounds very cool, hope you will push your code soon!

I got some more inspiration on how to speed up my build and integrated
the URLs into my scripts (attached).

For example, using gold as the linker, or using the '-O3' optimization
level, maybe in combination with LTO and PGO (using '-O3 -flto
-fprofile-use').
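
Concretely I am thinking of something like this (untested sketch; note that -fprofile-use additionally needs profile data from an earlier -fprofile-generate training run):

cmake -G Ninja ../llvm \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=gold" \
  -DCMAKE_C_FLAGS="-O3 -flto -fprofile-use" \
  -DCMAKE_CXX_FLAGS="-O3 -flto -fprofile-use"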

Let's see when the v3.8.0 FINAL is released.

- Sedat -

build_llvm-toolchain_clang-cmake-ninja.sh (3.64 KB)

install_llvm-toolchain_clang-cmake-ninja.sh (3.15 KB)

LTO *will* dramatically slow down the build.

Building a binary with LTO, -O3, etc. will slow down the build, but the built binary could run much faster.
I am not sure what the intention is here.

- Fariborz

I have only had a quick look at the blog posts.

It might be that a Clang built with LTO/PGO speeds up the build.
Can you confirm this?

Can you confirm that binutils-gold speeds up the build?

Does LLVM have its own linker?
Can it be used? Does it speed up the build?

Last night I looked through the available CMake/LLVM variables...

### GOLD
# CMAKE_LINKER:FILEPATH=/usr/bin/ld
# GOLD_EXECUTABLE:FILEPATH=/usr/bin/ld.gold
# LLVM_TOOL_GOLD_BUILD:BOOL=ON
### OPTLEVEL
# CMAKE_ASM_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
# CMAKE_CXX_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
# CMAKE_C_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
### LTO
# LLVM_TOOL_LLVM_LTO_BUILD:BOOL=ON
# LLVM_TOOL_LTO_BUILD:BOOL=ON
### PGO
# LLVM_USE_OPROFILE:BOOL=OFF
### TABLEGEN
# LLVM_OPTIMIZED_TABLEGEN:BOOL=OFF

So '-O3' is the default for a Release build.

Not sure which of the LTO variables are suitable, maybe both.

PGO? Is that the correct variable?

The blog posts mentioned using optimized tablegen.
Good? Bad? Ugly?

Thanks in advance for answering my questions.

Best regards,
- Sedat -

Hi Sedat,

It might be that a Clang built with LTO/PGO speeds up the build.
Can you confirm this?

Yes, a Clang host compiler built with LTO or PGO is generally faster than an -O3 build.

Some things to keep in mind when building the Clang host compiler:

GCC:
   - GCC 4.9 gives good results with PGO enabled (1.16x speedup over the -O3 build), not so much with LTO (actually regresses performance over the -O3 build, same for PGO vs PGO+LTO)
   - GCC 5.1/5.2/5.3 can't build Clang with LTO enabled (GCC bug 66027: lto1: internal compiler error: in odr_types_equivalent_p); that's supposed to be fixed in GCC 5.4

Clang:
   - PGO works and gives a good 1.12x speedup over the -O3 build (produced about 270GB of profiling data when I tried this in December last year; this should be addressed soon once the in-process profiling data merging lands); see the sketch after this list
   - LTO provides a 1.03x speedup over the -O3 build
   - I have not tried LTO+PGO with full Clang bootstrap profiling data but I would expect that it helps to increase the performance even further
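
Roughly, the PGO bootstrap flow looks like this (a sketch, not my exact setup; the training workload and paths are placeholders):

# Stage 1: build an instrumented clang with the host clang.
cmake -G Ninja ../llvm -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
  -DCMAKE_C_FLAGS="-fprofile-instr-generate" \
  -DCMAKE_CXX_FLAGS="-fprofile-instr-generate"
ninja clang

# Train: compile something representative, then merge the raw profiles.
./bin/clang++ -O2 -c some-workload.cpp
llvm-profdata merge -output=clang.profdata *.profraw

# Stage 2: rebuild clang (in a fresh build dir) using the merged profile.
cmake -G Ninja ../llvm -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_FLAGS="-fprofile-instr-use=clang.profdata" \
  -DCMAKE_CXX_FLAGS="-fprofile-instr-use=clang.profdata"
ninja clang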

Can you confirm that binutils-gold speeds up the build?

Yes, gold is definitely faster than ld when building Clang/LLVM.

Does LLVM have its own linker?
Can it be used? Does it speed up the build?

I haven't tried it, but lld can definitely link Clang/LLVM on x86-64 Linux.
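
If you want to try it, the switch is at the driver level (untested sketch; assumes a host compiler that accepts -fuse-ld=lld, as clang does):

cmake -G Ninja ../llvm \
  -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=lld" \
  -DCMAKE_SHARED_LINKER_FLAGS="-fuse-ld=lld"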

The blog posts mentioned using optimized tablegen.
Good? Bad? Ugly?

Good, it helps to speed up debug builds.
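
Enabling it is a single cache variable (sketch; the effect is that tablegen itself is built optimized even in a Debug build):

cmake -G Ninja ../llvm \
  -DCMAKE_BUILD_TYPE=Debug \
  -DLLVM_OPTIMIZED_TABLEGEN=ON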

Regards,

Tilmann

[ CCed all folks who answered me ]

Hi,

In a first run, I have built my llvm-toolchain v3.8.0 (FINAL) with
binutils-gold v1.11.

When building with cmake/ninja there are 150 "Linking" lines...

$ grep Linking logs/3.8.0_clang-3-8-0_cmake-3-4-3_ninja-1-6-0_gold-1-11_compile-jobs-2_link-jobs-1/build-log_llvm-toolchain-3.8.0.txt | wc -l
150

These are the cmake options I have in place...

*** SNIP ***

### CMAKE OPTIONS
# NOTE #1: CMake version 2.8.8 is the minimum required (Ubuntu/precise ships v2.8.7 officially)
# NOTE #2: For fast builds use the recommended CMake >= v3.2 (used: v3.4.3) and Ninja (used: v1.6.0)
# NOTE #3: How to check available cmake options?
# EXAMPLE #3: cd $BUILD_DIR ; cmake ../llvm -LA | egrep $CMAKE_OPTS

build_llvm-toolchain_clang-cmake-ninja.sh (4.53 KB)

Attached is v2 of my build script with the "cmake options" prettified.

- Sedat -

build_llvm-toolchain_clang-cmake-ninja_v2.sh (4.6 KB)