Avoidable overhead from threading by default

I find that on Linux x86-64 with glibc, --threads=8 often gives peak performance. With more threads ld.lld may become slower, especially with the glibc memory allocator; the slowdown is less pronounced with mimalloc/jemalloc/tcmalloc/snmalloc/etc (lld with different malloc implementations · GitHub).

ld.lld respects sched_getaffinity (Linux) / cpuset_getaffinity (FreeBSD) / std::thread::hardware_concurrency (others), so if you confine the ld.lld process to a few cores, it will by default use only that many threads.
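To make the affinity point concrete, here is a minimal Python sketch of the same idea: count the CPUs the current process is allowed to run on, and fall back to the machine-wide count where no affinity API is available. This is only an illustration, not lld's code; the function name default_thread_count is made up for this example.

```python
import os


def default_thread_count() -> int:
    """Approximate how many CPUs the current process may run on."""
    if hasattr(os, "sched_getaffinity"):
        # Where available (e.g. Linux): the size of the affinity mask,
        # which shrinks when the process is confined with e.g. taskset.
        return len(os.sched_getaffinity(0))
    # Elsewhere: fall back to the machine-wide CPU count.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(default_thread_count())
```

Running this under something like taskset -c 0-3 on Linux should report 4 rather than the total core count, which is the behavior described above for the default thread count.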

Capping config->threadCount at a hard limit makes things too magical, so I am not comfortable doing it. Guessing a good config->threadCount based on the input is even more magical, and we definitely don’t want to do that.

I think it is better for a build system to do the scheduling work than to let lld be too smart. Spawned as a job action, an lld process doesn’t know the global information, so it cannot make the best scheduling decision.

For normal uses within or outside a build system, in the absence of more information, the current strategy is not bad: let compile jobs use --threads=1 (embarrassingly parallel) and let link jobs use all available cores by default (there is typically just one, or very few, link jobs running concurrently). If some users don’t like the default, they can create an ld.lld wrapper script.
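A wrapper along those lines could look roughly like the sketch below, written in Python rather than shell. The path /usr/bin/ld.lld, the default of 8 threads, and the names REAL_LLD, DEFAULT_THREADS, build_argv and main are all assumptions for illustration; adjust them for your installation. The wrapper adds --threads=N only when the caller has not passed --threads explicitly.

```python
import os
import sys

REAL_LLD = "/usr/bin/ld.lld"  # hypothetical location of the real linker
DEFAULT_THREADS = 8           # assumed default; tune for your machine


def build_argv(args, real_lld=REAL_LLD, threads=DEFAULT_THREADS):
    """Return the command line for the real ld.lld.

    A default --threads=N is added only if the caller gave no
    --threads option, so explicit user choices are left untouched.
    """
    explicit = any(a == "--threads" or a.startswith("--threads=") for a in args)
    extra = [] if explicit else [f"--threads={threads}"]
    return [real_lld, *extra, *args]


def main():
    argv = build_argv(sys.argv[1:])
    # Replace the wrapper process with the real linker.
    os.execv(argv[0], argv)
```

To use it, append a call to main() at the bottom, save the file as an executable named ld.lld, and place it earlier in PATH than the real linker.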