interesting note on build times: configure/make vs. cmake/ninja

Hi all,

Just thought I’d pass along that I just did a build time comparison between our standard lldb build setup, configure + (g)make (i.e. configure && make -j32), and cmake + ninja (i.e. cmake -GNinja && ninja). On an HP z620 with 32 virtual procs and an SSD, it takes me just under 10 minutes to build lldb with configure/make. The same machine with cmake + ninja takes 4.25 minutes. Huge speedup.
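
For reference, here's a rough sketch of the two setups I'm comparing, assuming a separate build directory next to the llvm checkout (the paths and -j count are just what I happen to use):

# configure + (g)make
../llvm/configure && make -j32

# cmake + ninja
cmake -GNinja ../llvm && ninja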

We’ll be moving in that direction on our dev setups based on those results. If we hit any hiccups along the way, I’ll be sure to report them here.

I think the configure-based build was (temporarily) broken when I
first started working on LLDB and I started using cmake/ninja for that
reason; I've never looked back. The speedup is very nice, and I'm
quite fond of ninja's output format for files that are built
successfully / have warnings / have errors.

There's been some discussion in the past on deprecating the
configure+make build, but without any further action. For developers
still using the configure+make build, is it just because you're
unfamiliar with cmake/ninja, or is there some attribute of the make
build that you rely on?

A few more notes for Ubuntu 12.04 users:

  • I did need to build a newer cmake from source; specifically, I built cmake-2.8.12.2.
  • Do not make the mistake I apparently did where I ran ‘sudo apt-get install ninja’ and got something else entirely installed on my system. I ended up building ninja from the git repository: specifically git://github.com/martine/ninja.git, at commit 84986af6fdeae3f649f2bf884b20f644bc370e48 from Thu Jan 23 08:29:38 2014 -0800 (a rough sketch of that build follows after this list). If you hit some silliness about cmake not being able to run your C compiler because it is trying to run ninja from /usr/sbin/ninja, start with a fresh cmake build directory, since cmake has cached the path to the wrong ninja.
    The cmake command I ran is this:

/usr/local/cmake/cmake-current/bin/cmake -GNinja -DCMAKE_CXX_COMPILER=g++ -DCMAKE_C_COMPILER=gcc -DLLVM_ENABLE_CXX11=ON -DCMAKE_CXX_FLAGS=-I$HOME/lldb/tools/libedit/include -DCMAKE_EXE_LINKER_FLAGS=-L$HOME/lldb/tools/libedit/linux_x86-64/lib ../llvm

  • I have gcc 4.8.2 in my path, and I need to tell cmake to use the g++/gcc from my path rather than going straight for /usr/bin/gcc and /usr/bin/g++.

  • I need to tell the compiler where to find our newer libedit. You may recall that the stock Ubuntu 12.04 libedit is not new enough for the current code we use, so we have a separate libedit that we built ourselves (with a configure-based build).

  • Note that Ubuntu 13.10 does not need a newer gcc/g++ (4.8.1 is fine), and the libedit-dev package it includes is also fine. If you’re building there (or on something newer), you can drop the -DCMAKE_*_COMPILER flags and the libedit tweaks.
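
As mentioned above, here's a rough sketch of building ninja from that git repository (the bootstrap step is from ninja's own README of that era; double-check against the repo you actually clone):

git clone git://github.com/martine/ninja.git
cd ninja
./bootstrap.py                 # produces a ./ninja binary in the source tree
sudo cp ninja /usr/local/bin/  # or anywhere that precedes /usr/sbin in your PATH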

Running ninja looks like this:

ninja

ninja automatically uses as many processors as you have, so there’s no need to guess a good -j number as with (g)make.
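
If you do want to cap the parallelism (say, on a shared machine), ninja still accepts an explicit job count:

ninja -j 8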

Running the tests looks like this:

ninja check-lldb

I am now looking at a few failures that I got when I ran ninja check-lldb: they might be related to the way I changed the build, so I’m first going to verify whether they also show up with configure/make.

OK (skipped=1, expected failures=1)
Ran 276 tests.
Failing Tests (5)
FAIL: LLDB (suite) :: TestSTTYBeforeAndAfter.py (Linux tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT 2013 x86_64 x86_64)
FAIL: LLDB (suite) :: TestSingleQuoteInFilename.py (Linux tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT 2013 x86_64 x86_64)
FAIL: LLDB (suite) :: TestCompletion.py (Linux tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT 2013 x86_64 x86_64)
FAIL: LLDB (suite) :: TestCommandRegex.py (Linux tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT 2013 x86_64 x86_64)
FAIL: LLDB (suite) :: TestConvenienceVariables.py (Linux tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT 2013 x86_64 x86_64)
ninja: build stopped: subcommand failed.

That last one has been failing intermittently for me (it’s failed about 3 times this week, which tells me the python ref count issue I fixed last week maybe didn’t totally clear that up). The rest look new to me.

Hmm.

Okay, so we have some work to do on getting ninja to run tests faster.

Running tests with configure/(g)make takes 9.5 minutes on my system. That’s via ‘make -C tools/lldb/test’.

Running tests via ‘ninja check-lldb’ takes a whopping 25.5 minutes. Youch! That totally erases the build time speedup many times over. I’m pretty sure we can make that better. The ninja test run starts off by mentioning that it is running each test in a separate process; I’m not sure if that’s related. Anybody have any thoughts on what is making this run so much slower? Both are running 276 tests, so AFAICT neither one is doing more overall work than the other.
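
For anyone who wants to reproduce the comparison, I’m just timing the two commands from the respective build trees (a rough sketch; your build directory layout may differ):

time make -C tools/lldb/test    # configure/(g)make build tree
time ninja check-lldb           # cmake/ninja build tree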

Also, I’m getting a smaller set of tests failing in the configure/(g)make test run:

OK (skipped=1, expected failures=1)
Ran 276 tests.
Failing Tests (2)
FAIL: LLDB (suite) :: TestConvenienceVariables.py (Linux tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT 2013 x86_64 x86_64)
FAIL: LLDB (suite) :: TestAbbreviations.py (Linux tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT 2013 x86_64 x86_64)
make: *** [check-local] Error 1
make: Leaving directory `/mnt/ssd/work/svn/lgs/build/tools/lldb/test’

The first one is that intermittent failure I mentioned. I need to check on the second one.

Here are the ones that are unique to the ninja test run:

FAIL: LLDB (suite) :: TestSTTYBeforeAndAfter.py (Linux tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT 2013 x86_64 x86_64)
FAIL: LLDB (suite) :: TestSingleQuoteInFilename.py (Linux tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT 2013 x86_64 x86_64)
FAIL: LLDB (suite) :: TestCompletion.py (Linux tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT 2013 x86_64 x86_64)
FAIL: LLDB (suite) :: TestCommandRegex.py (Linux tfiala2.mtv.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT 2013 x86_64 x86_64)

That’s all I got right now.

Just a thought, have you checked the memory consumption?
If ninja starts many test processes, you may reach the memory limit and start to swap. That would explain the time increase.
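
One generic way to check that while the suite runs (not specific to lldb’s test harness, just what I’d reach for) is to watch memory and swap activity in another terminal:

vmstat 5            # watch the si/so (swap in/out) columns
# or
watch -n 5 free -m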

Just a side note about my experience with the test suite.
Actually I’m running it by invoking the dotest script directly. I found that from time to time, some tests fail due to a race condition in the process life-cycle management:
the test suite tries to kill the process while it is resuming. Resuming occurs on the process state listener thread, and the kill occurs on the driver thread.
What happens is that the Process::Destroy method clears all thread plans while some other code is trying to configure each thread for resuming, which produces a null dereference in the ThreadList class.

If you have some intermittent failures, they may be caused by such an issue.

Ah, okay, thanks Jean-Daniel.

Just a thought, have you checked the memory consumption?

Nope - I hadn’t checked that. Good idea, I’ll have a look. I’ve got 64 GB RAM so I’m not expecting that to be the issue, but it’s definitely worth checking. It would be entertaining if I saw 500 tests attempting to run at the same time :)

I found that from time to time, some tests fail due to a race condition in the process life-cycle management:

Thanks! Typically I run each failed test directly after the full suite to see what happens. It seems to give me more info when I run a test directly, too (or maybe just the presentation changes); in any event I find it easy to check the output when running it individually.
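
For reference, running a single test directly looks roughly like this, assuming you’re in the lldb test directory and that dotest.py’s -v and -p options behave the way I remember (check ./dotest.py -h):

cd lldb/test
python dotest.py -v -p TestConvenienceVariables.py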

I’m running the tests (via ninja) on FreeBSD to check whether it fails any of the same tests that I’m seeing fail on Ubuntu 12.04 x86_64, just as a cross-check on the failures on my main dev system.

I found that when building llvm/lldb, switching your default linker to gold instead of ld helps too.

Thanks, Richard.

I spent some time trying that out this morning. For Ubuntu, I was able to do this with ‘sudo apt-get install binutils-gold’. This improved my lldb build time with cmake/ninja by between 6 and 7%. Thanks for the suggestion!
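
For the record, on Ubuntu installing binutils-gold switched the default ld to gold for me; to confirm which linker a build is actually getting, or to opt in explicitly (I believe gcc 4.8 understands -fuse-ld=gold), something along these lines should work:

ld --version | head -1    # should mention "GNU gold" once binutils-gold is installed
# or request it explicitly in the cmake invocation:
cmake -GNinja -DCMAKE_EXE_LINKER_FLAGS=-fuse-ld=gold ../llvm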

-Todd

Update: fwiw - I am getting totally inaccurate results when using the gold linker + cmake/ninja. I’ll have more to say later, but for sure something is not building correctly. I’ll have to isolate what is causing it.

lldb-gdbserver built via cmake/ninja/gold linker was seg faulting, and neither lldb nor gdb could tell me anything useful about the top few frames of the core. Building the same code with configure/(g)make gave me different results (I got the assert failure printed), and the backtrace looked quite a bit different.

So - I don’t think I’d recommend the ninja/cmake/gold-linker combination on Ubuntu until I figure out which piece of the puzzle (if it’s only one) is acting funny.

Hi Todd,

Do you see problems w/ terminating LLDB while the debuggee is stopped
(at a breakpoint, say)? I observed problems on FreeBSD with both
"detach" and "quit", described in
http://llvm.org/bugs/show_bug.cgi?id=18894. In both cases the debuggee
ended up aborting due to a SIGTRAP, and in the case of quitting, LLDB
took 3-4 seconds extra to exit.

I fixed the detach case in r201724 -- prior to that change breakpoints
were being left in the debuggee after detach.

The quit case seems to be related, and it seems Process*::DoDestroy is
somewhat incomplete (and has been since its introduction). I still
need to figure out exactly what's needed, but for now I just added a
call in ProcessPOSIX::DoDetach() to detach the ptrace monitor:

+ error = m_monitor->Detach(GetID());

and LLDB exits immediately after a quit, I have no more debuggees
aborting, and the ninja check-lldb runs in about 3 or 4 minutes
instead of the ~half hour it was before.

-Ed

Ah thanks for the details, Ed!

I’m going to clear up some other bits I guess I left dangling in this thread before:

  1. I did fix the bug that looked like it was a gold linker issue. It was really a general ELF core file memory region handling bug that the gold linker just happened to expose. That fix has been in top of tree for maybe a month now.

  2. Something changed in running ‘ninja check-lldb’ sometime after I wrote the above; its run time is now back on par with the configure/make test times. I think I forgot to mention that. (Side note: Steve Pucci’s LLDB_TEST_THREADS change works with ‘ninja check-lldb’; see the sketch after this list. The only side effect I see is that you can no longer count on the ordering of output between lines, as they are interleaved, although whole lines are fine.)

  3. I’m back to using ninja + cmake + the gold linker as my preferred local dev build, due to the faster turn-around on build times.
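
For reference, I’m just setting LLDB_TEST_THREADS in the environment when invoking the check target (assuming it is picked up as an environment variable, which is how I’ve been using it; the thread count here is arbitrary):

LLDB_TEST_THREADS=8 ninja check-lldb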

Do you see problems w/ terminating LLDB while the debuggee is stopped
(at a breakpoint, say)?

Checking now on Ubuntu 12.04 x86_64:

  1. start lldb, have lldb start the process, set a breakpoint, run to the breakpoint, quit lldb and have the process quit: exits fine.

  2. start lldb, have lldb start the process, set a breakpoint, run to the breakpoint, detach from the process, quit lldb: exits fine, but I didn’t see the output on the (shared) stdout channel between lldb and the process I just detached from. That might be fine; it depends on the semantics of whether a detached process can write to the lldb terminal that it previously shared. (I don’t know the answer to that, although intuitively I expected to see the detached process write to the lldb terminal as it would have had it still been attached.)

I didn’t see any slow-down in quitting in #1 above. #2 was also fast (though nothing was attached at that point). Both looked instantaneous.
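
For concreteness, scenario #2 was essentially this kind of session (the binary name is just a placeholder):

$ lldb ./a.out
(lldb) breakpoint set --name main
(lldb) run
(lldb) detach
(lldb) quit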

Sounds very likely that your ProcessPOSIX change addressed the issues. Nice job, Ed! ProcessLinux is nearly entirely ProcessPOSIX, the big diff being a different ProcessMonitor and a few minor bits of functionality that differ from ProcessFreeBSD.

-Todd

Sounds very likely that your ProcessPOSIX change addressed the issues. Nice job, Ed! ProcessLinux is nearly entirely ProcessPOSIX, the big diff being a different ProcessMonitor and a few minor bits of functionality that differ from ProcessFreeBSD.

Hi Todd - I'm now using this change on FreeBSD
(https://github.com/emaste/lldb/commit/e64649d8cd1a171086ace48e2beb587acd0c82e0)

     if (!HasExited())
     {
-        // Drive the exit event to completion (do not keep the inferior in
-        // limbo).
+        assert (m_monitor);
         m_exit_now = true;

That looks fine. I think if we find cases where m_monitor is already killed when we hit that (or was never established), we can track it down.

I think the !HasExited() guard will likely already cover everything we’d care about.

Thanks for checking!