Avoidable overhead from threading by default

mjguzik · March 30, 2023, 7:05pm

I find that on Linux glibc x86-64, --threads=8 often gives the peak performance.

Then do min(8, taskset count); see below.

With more threads, ld.lld may become slower, especially with the glibc memory allocator; the slowdown is less severe with mimalloc/jemalloc/tcmalloc/snmalloc/etc.

In my test above the primary bottleneck was the kernel (on both systems). In particular, Linux carries massive technical debt in that there is no dedicated process abstraction: everything is a task_struct with linkage protected by a global lock (tasklist_lock), and most of the time is spent contending on it (and other locks).

Your own response though strengthens my position: if the program can’t make any sensible use of these threads, why even spawn them? They are actively detrimental to the stated goal of improving performance.

ld.lld respects sched_getaffinity (Linux) / cpuset_getaffinity (FreeBSD) / std::thread::hardware_concurrency (others), so if you confine the ld.lld process to a few cores, ld.lld will respect it.
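As a rough illustration of the lookup order described in that quote, here is a minimal Python sketch. It is an analogy only, not lld's code: the helper name `available_cpus` is made up for this example, and the fallback branch uses the total logical CPU count, which may ignore affinity restrictions on platforms without `os.sched_getaffinity`.

```python
import os


def available_cpus() -> int:
    """Return how many CPUs the current process may run on (hypothetical helper)."""
    if hasattr(os, "sched_getaffinity"):
        # Where available (e.g. Linux), this honors taskset/cpuset restrictions.
        return len(os.sched_getaffinity(0))
    # Fallback: total logical CPUs; cpu_count() may return None.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(available_cpus())
```

On Linux, running this under `taskset -c 0-3` should report at most 4 regardless of how many cores the box has.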

I’m aware, I added the FreeBSD support.

When I was adding it I noted this only damage-controls some of the state and most definitely does not solve the problem.

Consider a sample workload: building FreeBSD from scratch.

There are tons of libraries and binaries to link. No matter what taskset you set over the make invocation, nothing is solved here: excess threads keep getting spawned up to said limit by each lld, and you can have quite a few running at the same time. This is particularly nasty if you have a high-core box like I do (see samples above). One has to damage-control by manually passing --threads=1, which should not be necessary.

You may also notice that lld’s willingness to thread is completely detached from the -j argument to make (or an equivalent).

I think it is better for a build system to do the scheduling work than to let lld be too smart. Spawned as a job action, lld processes don’t know the global information needed to make the best scheduling decisions.

But the current behavior is literally getting in the way of sensible scheduling.

Normally whatever build system you have assumes that the stuff it spawns is either single-threaded or participating in a “job server” to regulate the workload.

lld spawning numerous threads comes out of left field, and its lack of job server awareness does not help here, not that I would recommend adding it though.

If people run, say, make -j 20, they expect about 20 workers churning CPU at the same time, tops. By not explicitly using tasksets et al. they let the kernel distribute the work as it sees fit. If the box has more than 20 hw threads, lld is really doing a number on that expectation.
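To put rough numbers on that, here is a back-of-the-envelope Python sketch. It assumes the worst case where every make job is a link step at the same moment and each linker spawns one thread per CPU it is allowed to use. The function name and the 96-thread machine are hypothetical, chosen only for illustration.

```python
from typing import Optional


def worst_case_threads(jobs: int, hw_threads: int,
                       per_link_cap: Optional[int] = None) -> int:
    """Upper bound on linker threads if all `jobs` are linking at once."""
    # Without a cap, each linker is assumed to use every available hw thread.
    per_link = hw_threads if per_link_cap is None else min(per_link_cap, hw_threads)
    return jobs * per_link


if __name__ == "__main__":
    # Hypothetical box with 96 hw threads, built with make -j 20.
    print(worst_case_threads(20, 96))                  # 1920 threads, no cap
    print(worst_case_threads(20, 96, per_link_cap=8))  # 160 threads, cap of 8
    print(worst_case_threads(20, 96, per_link_cap=1))  # 20 threads, --threads=1
```

Only the last case matches the "about 20 workers" expectation; a cap of 8 shrinks the overshoot but does not remove it.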

The very fact that even in your own tests there is a rather low value at which more threads stop providing any benefit is an argument for limiting them more aggressively than just by the taskset.

All in all the current behavior is not sane by any means.

If you think gauging how many threads to spawn is too “magic”, at least put in a limit which damage-controls the current state.

As suggested previously: min(8, count from taskset), or whatever is passed in by --threads, if used. If someone feels like spawning more than 8, it’s their explicit choice.
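The proposed policy is small enough to write down. This is a minimal Python sketch of the rule as stated in this post, not lld's actual implementation; `pick_thread_count` and its parameter names are hypothetical.

```python
from typing import Optional

DEFAULT_CAP = 8  # the cap suggested in this post


def pick_thread_count(available: int, requested: Optional[int] = None,
                      cap: int = DEFAULT_CAP) -> int:
    """Choose a thread count: an explicit request wins, otherwise min(cap, available)."""
    if requested is not None:
        if requested < 1:
            raise ValueError("requested thread count must be >= 1")
        # An explicit --threads=N style request is honored as-is, even above the cap.
        return requested
    # Default: never exceed the cap or the CPUs the process may use.
    return max(1, min(cap, available))


if __name__ == "__main__":
    print(pick_thread_count(96))                # 8
    print(pick_thread_count(4))                 # 4
    print(pick_thread_count(96, requested=32))  # 32
```

The cap only applies to the default, so anyone who wants more than 8 threads can still get them by asking.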
