New LLD performance builder

Hello everyone,

I have added a new public LLD performance builder at
http://lab.llvm.org:8011/builders/lld-perf-testsuite.
It builds LLVM and LLD with the latest released Clang and runs a set of
performance tests.

The builder is reliable. Please pay attention to the failures.

The performance statistics are here:
http://lnt.llvm.org/db_default/v4/link/recent_activity

Thanks

Galina

Great news, thanks !

Looking at the results, I am not sure how to explain them, though.

For example, r325313 fixes a use-after-free; it should not cause any performance
slowdowns or speedups. Yet, if I read the results correctly, they show a 23.65% slowdown
in the time to link the Linux kernel (http://lnt.llvm.org/db_default/v4/link/104).

I guess such variation could happen if, for example, the bot does only a single link iteration per test,
so the final time is most probably just noise.

task-clock results are available for "linux-kernel" and "llvm-as-fsds" only; all other
tests have a blank field. Does that mean there was no noticeable difference in results?

Also, the "Graph" and "Matrix" buttons, whatever they are supposed to do, show errors at the moment
("Nothing to graph." and "Not Found: Request requires some data arguments.").

Best regards,
George | Developer | Access Softek, Inc

Hello George,

The bot does 10 runs for each of the benchmarks (those dots in the logs are meaningful). The statistics seem quite stable if you look over a number of revisions.

For example, if you take a look at the linux-kernel branches - http://lnt.llvm.org/db_default/v4/link/graph?plot.0=1.12.2&highlight_run=104 - it becomes obvious that the number of branches increased significantly as a result of r325313. The metric is very stable around the impacted commit.

As the number of branches has increased, the related metrics, like branch-misses, regress as well.

I'm sure you have checked it already, but, just in case, here is the link to the LNT docs.

Besides reporting to lnt.llvm.org, each build includes all the reported data in its log, so you can process it however you find helpful.

Thanks

Galina

Hello George,

Sorry, I somehow hit the send button too soon. Please ignore the previous e-mail.

The bot does 10 runs for each of the benchmarks (those dots in the logs are meaningful). We can increase the number of runs if that proves to significantly increase the accuracy. While staging the bot, I didn't see an increase in accuracy that would justify the extra time and the larger gaps between tested revisions. 10 runs seems to give a good balance, but I'm open to suggestions.
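For illustration, the repeat-and-summarize approach can be sketched in a few lines of Python. This is not the bot's actual harness (which is a modified lld/utils/benchmark.py); the command being timed here is a trivial stand-in:

```python
import statistics
import subprocess
import sys
import time

def time_command(cmd, runs=10):
    """Run `cmd` `runs` times and return per-run wall-clock seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        times.append(time.perf_counter() - start)
    return times

def summarize(times):
    """Median and sample standard deviation of the measurements."""
    return statistics.median(times), statistics.stdev(times)

if __name__ == "__main__":
    # Stand-in command; the bot would invoke the actual lld link here.
    med, dev = summarize(time_command([sys.executable, "-c", "pass"]))
    print(f"median {med:.4f}s, stddev {dev:.4f}s")
```

The standard deviation across the 10 runs is what tells you whether more runs would actually buy accuracy.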

The statistics seem quite stable if you look over a number of revisions.

And in this particular case the picture seems quite clear.

At http://lnt.llvm.org/db_default/v4/link/104, the list of Performance Regressions suggests that linux-kernel was hit the most. The regressed metrics are branches, branch-misses, instructions, cycles, seconds-elapsed, and task-clock. Some other benchmarks show regressions in branches and branch-misses; some show improvements.

The metrics are consistent before and after the commit, so I do not think this one is an outlier.
For example, if you take a look at the linux-kernel branches - http://lnt.llvm.org/db_default/v4/link/graph?plot.0=1.12.2&highlight_run=104 - it becomes obvious that the number of branches increased significantly as a result of r325313. The metric is very stable around the impacted commit and does not go down afterwards. branch-misses is more volatile, but still consistently shows the regression as a result of this commit.

Now someone should investigate why this particular commit resulted in such a significant increase in branching with the Linux kernel.

As for how to use the LNT web UI, I'm sure you have checked it already, but, just in case, here is the link to the LNT docs - http://llvm.org/docs/lnt/contents.html.

task-clock results are available for "linux-kernel" and "llvm-as-fsds" only; all other
tests have a blank field. Does that mean there was no noticeable difference in results?

If you go to http://lnt.llvm.org/db_default/v4/link/104#task-clock (or go to http://lnt.llvm.org/db_default/v4/link/104 and select task-clock on the left, which is the same thing), you will see the list of actual values in the "Current" column. All of them are populated; none is blank. The "%" column contains the difference from the previous run as a percentage, or a dash for no measured difference.
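My reading of the "%" column is that it is a plain relative difference from the previous run; a one-line sketch (an assumption on my side, check the LNT docs for the exact definition):

```python
def percent_delta(previous, current):
    """Relative difference, as (I believe) LNT's "%" column reports it."""
    if previous == 0:
        return None  # no baseline to compare against
    return (current - previous) / previous * 100.0

# e.g. the 23.65% slowdown mentioned earlier corresponds to
# percent_delta(t_before, t_before * 1.2365)
```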

Also, the "Graph" and "Matrix" buttons, whatever they are supposed to do, show errors at the moment.

I guess you didn’t select what to graph or what to show as a matrix, did you?

Besides reporting to lnt.llvm.org, each build includes all the reported data in its log, so you can process it however you find helpful.
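As a sketch of what such post-processing could look like - the log format assumed here is hypothetical, so the regex would need adjusting to whatever the bot actually emits:

```python
import re

# Hypothetical perf-stat-like log lines: "<value> <metric-name>".
LINE = re.compile(r"^\s*([\d,\.]+)\s+([\w-]+)")

def parse_metrics(log_text):
    """Collect metric name -> numeric value pairs from a build log."""
    metrics = {}
    for line in log_text.splitlines():
        m = LINE.match(line)
        if m:
            metrics[m.group(2)] = float(m.group(1).replace(",", ""))
    return metrics
```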

Hope this helps.

Thanks

Galina

Thanks for the information, Galina!

It was really helpful for me.

task-clock results are available for "linux-kernel" and "llvm-as-fsds" only; all other
tests have a blank field. Does that mean there was no noticeable difference in results?

If you go to http://lnt.llvm.org/db_default/v4/link/104#task-clock (or go to http://lnt.llvm.org/db_default/v4/link/104 and select
task-clock on the left, which is the same thing), you will see the list of actual values in the "Current" column. All of them are populated; none is blank. The "%" column contains
the difference from the previous run as a percentage, or a dash for no measured difference.

Yes, I meant exactly that. I see dashes in the "%" columns for most of the tests.
Sorry for my inaccurate wording that caused this confusion :(

Also, the "Graph" and "Matrix" buttons, whatever they are supposed to do, show errors at the moment.

I guess you didn’t select what to graph or what to show as a matrix, did you?

Right, I didn't know I should. Now I see how it works.

So, it is great to see that such a new tool is already able to reveal interesting results. Thanks!

George.

Thanks a lot for setting this up!

By using the "mean as aggregation" option, one can see the noise in the
results better:

http://lnt.llvm.org/db_default/v4/link/graph?switch_min_mean=yes&moving_window_size=10&plot.9=1.9.7&submit=Update
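If I understand the option correctly, it smooths each series with a trailing mean over the window, which damps run-to-run noise. This is an assumption about LNT's aggregation, sketched only to show the effect:

```python
def moving_mean(values, window=10):
    """Trailing moving average: each point becomes the mean of itself
    and up to `window - 1` preceding points."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        chunk = values[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```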

There are a few benchmarking tips in https://www.llvm.org/docs/Benchmarking.html.

For example, from looking at

http://lab.llvm.org:8011/builders/lld-perf-testsuite/builds/285/steps/cmake-configure/logs/stdio

it seems the produced lld binary is not being statically linked.

A tip to make the bot a bit faster: it could run "ninja bin/lld"
instead of just "ninja":

http://lab.llvm.org:8011/builders/lld-perf-testsuite/builds/285/steps/build-unified-tree/logs/stdio

Is lld-speed-test in a tmpfs?

Is lld-benchmark.py a copy of lld/utils/benchmark.py?

Thanks,
Rafael

Hello Rafael,

It seems the produced lld binary is not being statically linked.

Hm. It should be. But it seems a couple of config params were missing. Fixed. Thanks for catching this!

Is lld-speed-test in a tmpfs?

Correct.
All the benchmarking tips from https://www.llvm.org/docs/Benchmarking.html have been applied to that bot.

Is lld-benchmark.py a copy of lld/utils/benchmark.py?

Correct, modulo a few local changes for more verbose, "bot"-friendly printing. I haven't decided yet if this is something we want in lld/utils/benchmark.py.

Thanks

Galina

Interesting. It looks like the runs got faster, but they are still
clustered in three or four different groups. Looking at the instructions
metric makes it even more visible:

http://lnt.llvm.org/db_default/v4/link/graph?highlight_run=426&plot.9=1.9.6
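One quick way to quantify that clustering from the raw numbers in the logs is to group sorted measurements that sit within a relative tolerance of each other. A rough sketch, not anything the bot currently does:

```python
def cluster(values, tol=0.01):
    """Group sorted values into clusters; a new cluster starts whenever
    the gap to the previous value exceeds `tol` (relative)."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol * groups[-1][-1]:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups
```

Run times falling into three or four such groups, rather than one, usually points at some external state differing between runs.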

Is there anything else running on the machine while the tests are run?

Cheers,
Rafael

Yep. They are still clustered.

Is there anything else running on the machine while the tests are run?

Not much. The usual buildslave stuff - buildbot, an ssh server, some light network services, an snmp client - but that's pretty much it. 20 hardware threads are designated for this.

The tests run on 10 shielded CPUs designated for tests only.
Only perf and lld run there. All the object files for the tests and the linker itself are on tmpfs, so no disk I/O is involved. The swap file is (almost) empty - only a few MBs are in use.

It might be that different CPUs get used for different test runs, as the script starts each run from scratch. Just a guess. I will look into this more closely.

I'll take the bot offline to see if having perf run the tests multiple times gives a better result. I would also try reducing the number of designated CPUs to see how that affects the numbers.

Thanks

Galina

The HT siblings are disabled, right?

It is probably a good idea to experiment with disabling swap and having
a single CPU in the shield group.
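On Linux, a lighter-weight complement to cset from inside the harness is pinning the benchmark process itself to one CPU. Unlike a shield, this does not keep other processes off that CPU, so it is not a replacement for cset; just a sketch:

```python
import os

def pin_to_cpu(cpu):
    """Restrict the current process (pid 0 = self) to a single CPU.
    Linux-only. Note this does not stop the scheduler from putting
    *other* work on that CPU - that is what cset shield is for."""
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)
```

cset shield (run as root) is still needed to keep everything else off that CPU; affinity pinning only constrains the benchmark itself.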

Thanks,
Rafael

The HT siblings are disabled, right?

Correct.

It is probably a good idea to experiment with disabling swap and having
a single CPU in the shield group.

Yep. This is what I’m in the middle of.

So far I see that the scheduler seems to keep running things on the shielded cores no matter what,
even if there is only one core in the shield.

Thanks

Galina

Disabling swap and having a single CPU in the shield group didn't change much, besides cpu-migrations and context-switches, which are now obviously 0.
The clustering remains the same. It is also stable with respect to the number of runs (I changed the test to run 20 times in the middle of that range on the right).

http://lnt.llvm.org/db_default/v4/link/graph?highlight_run=426&plot.9=1.9.6

Unless somebody has a good idea of what else we should try, it seems we have it as good as it can be with the current approach.

Thanks

Galina

I am benchmarking a patch and just remembered one thing: cset will only
enable the shield if run as root, but it will still run the program
outside the shield otherwise.

Is the benchmark being run as root? If not, that might explain the large
variations from run to run.

Cheers,
Rafael

Hi Rafael,

Thanks for mentioning this.

It should be running as root, but I'll double-check anyway.

Thanks

Galina

Sudo was in place.

But somehow it still had multiple cores in the shield.

Now it should be just 1 core. That should make a difference in the variations, but it is slower to test.

Thanks

Galina