LLD performance w.r.t. local symbols (and --build-id)

Hi,

Rafael took some measurements to try to investigate the effect of the local symbols changes.
I’ve been taking a look at the measurements he got and there were some interesting things we noticed.

For starters, in the range of revisions tested (r263214 through r263471), we found that the commit for --build-id was the most noticeable, with slowdowns from 7% to 23% (note: these were non-debug builds).

This is somewhat concerning because it appears that this option is passed to the linker by default on Linux. Any performance comparisons with other linkers before LLD started respecting --build-id must be taken with a grain of salt unless they were controlled for --build-id.

Returning to the original motivation for the investigation (local symbols), we see something interesting. Zooming in on r263214 through r263237, we find that the performance characteristics for linking ScyllaDB are substantially different from the others. The reason for this is still unknown.

The following commits showed significant performance changes for ScyllaDB.

r263222 ~4% speedup for ScyllaDB
commit 1ffd121a07a3d67bf52d849c0cdef0f2fad889ba
Author: Rafael Espindola <rafael.espindola@gmail.com>

Some slowdown from --build-id was expected; after all, it scans most of the output to compute a hash value. It can easily be parallelized, but it would still have a non-negligible performance impact. I don't think it is good practice for the compiler to pass the --build-id option unconditionally. It seems to me that the option was added too casually, without fully understanding that it would slow down linking significantly.
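
To make the parallelization point concrete, here is a minimal C++ sketch of one way it could work: split the output buffer into fixed-size chunks, hash the chunks on several threads, then hash the list of per-chunk hashes. This is not LLD's implementation. The function names (fnv1a, chunkedHash), the chunk size, and the use of FNV-1a as the hash are all assumptions made for illustration; a real build ID would use a stronger hash.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// FNV-1a, used only as a simple stand-in hash for this sketch.
static uint64_t fnv1a(const uint8_t *data, size_t size) {
  uint64_t h = 14695981039346656037ULL;
  for (size_t i = 0; i < size; ++i) {
    h ^= data[i];
    h *= 1099511628211ULL;
  }
  return h;
}

// Hypothetical helper: hash fixed-size chunks of `out` on `numThreads`
// threads, then hash the array of per-chunk hashes. The result depends on
// chunkSize but not on numThreads, so the output stays deterministic.
static uint64_t chunkedHash(const std::vector<uint8_t> &out, size_t chunkSize,
                            unsigned numThreads) {
  assert(chunkSize > 0 && numThreads > 0);
  size_t numChunks = (out.size() + chunkSize - 1) / chunkSize;
  std::vector<uint64_t> hashes(numChunks);

  // Thread t handles chunks t, t + numThreads, t + 2 * numThreads, ...
  // Each chunk hash is written to its own slot, so no locking is needed.
  auto worker = [&](size_t first) {
    for (size_t i = first; i < numChunks; i += numThreads) {
      size_t begin = i * chunkSize;
      size_t len = std::min(chunkSize, out.size() - begin);
      hashes[i] = fnv1a(out.data() + begin, len);
    }
  };

  std::vector<std::thread> threads;
  for (unsigned t = 0; t < numThreads; ++t)
    threads.emplace_back(worker, t);
  for (std::thread &th : threads)
    th.join();

  // Combine: hash the per-chunk hashes in chunk order.
  return fnv1a(reinterpret_cast<const uint8_t *>(hashes.data()),
               hashes.size() * sizeof(uint64_t));
}
```

Even with this scheme every byte of the output is still read once, which is why some cost remains no matter how many threads are used.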

The slowdown from “[ELF] - Early continue in InputSectionBase::relocate(). NFC.” looks weird to me. I do not see any reason for this change to affect performance.

The good news is that since it was NFC, it can easily be reverted. But I think the slowdown in the results is unrelated to that change, and reverting will not give us the 2-3% boost back.

> The slowdown from "[ELF] - Early continue in InputSectionBase<ELFT>::relocate().
> NFC." looks weird to me. I do not see any reason for this change to
> affect performance.

I think it is just because the continue is unlikely and now there is
an early check of offset.

> The good news is that since it was NFC, it can easily be reverted. But I
> think the slowdown in the results is unrelated to that change, and
> reverting will not give us the 2-3% boost back.

I don't think we should revert it right now. There are a few big
changes I would like to try to the relocation processing code.

Cheers,
Rafael

Agreed. Unless your change alters the algorithm, you don't need to worry too
much about performance fluctuations caused by it. These may vary with the
compiler, compiler version, the code around your change, and the test cases.

> Agreed. Unless your change alters the algorithm, you don't need to worry too much about performance fluctuations caused by it. These may vary with the compiler, compiler version, the code around your change, and the test cases.

3% is quite a visible difference. I guess the testing was performed with a single compiler, on the same hardware, and so on. Why is that change observed? Was it just a 3% fluctuation?

I think it is just because the continue is really uncommon, but I
haven't actually profiled it. That is, I ran "perf stat", not "perf
record".

Cheers,
Rafael