Following the Benchmarking BOF at the 2013 US LLVM Developers' Meeting, I'd like to propose some improvements to the LNT performance tracking software.
The most significant issue with the current implementation is that the report is filled with extremely noisy values, which makes it hard to spot real performance improvements or regressions.
After investigating LNT and the LLVM test suite, I propose the following methods. I've also attached prototype patches for each method.
- Increase the execution time of the benchmarks so they run long enough to avoid noisy results
Currently there are two options for running benchmarks: small and large problem size. I propose adding a third option, adaptive. In adaptive mode, each benchmark scales its problem size according to a pre-measured system performance value so that its running time stays at around 10 seconds, the sweet spot between total run time and accuracy. The downside is that correctness cannot be checked for some benchmarks. The solution is to check correctness separately, on another board, using the small problem size.
LNT: [PATCH 2/3] Add options to run test-suite in adaptive mode
Test suite: [PATCH 1/2] Add support for adaptive problem size
[PATCH 2/2] A subset of test suite programs modified for adaptive
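A minimal sketch of how the adaptive problem size could be computed, assuming run time grows linearly with problem size and is inversely proportional to the system performance score. The function name `adaptive_problem_size`, its parameters, and the linear model are hypothetical illustrations, not taken from the attached patches.

```python
import math

# Target run time from the proposal: roughly 10 seconds per benchmark.
TARGET_SECONDS = 10.0


def adaptive_problem_size(base_size: int,
                          base_seconds: float,
                          reference_score: float,
                          measured_score: float,
                          target_seconds: float = TARGET_SECONDS) -> int:
    """Pick a problem size so the benchmark runs for about target_seconds.

    Hypothetical model (assumption, not from the patches):
      * run time grows linearly with problem size;
      * run time is inversely proportional to the system performance score.
    base_seconds is the run time of base_size on the reference machine,
    whose performance score is reference_score.
    """
    if base_size <= 0 or base_seconds <= 0:
        raise ValueError("base_size and base_seconds must be positive")
    if reference_score <= 0 or measured_score <= 0:
        raise ValueError("performance scores must be positive")
    # Estimated run time of base_size on this machine.
    expected_seconds = base_seconds * reference_score / measured_score
    # Scale the problem size linearly towards the target run time.
    scale = target_seconds / expected_seconds
    return max(1, math.ceil(base_size * scale))
```

For example, a benchmark whose base size takes 0.5 s on the reference machine would be scaled up 20x on an equally fast board, and 40x on a board scoring twice as high.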
- Show and graph total compile time
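There is no obvious way to scale up the compile time of individual benchmarks, so tracking the total compile time across all benchmarks is the best we can do to minimize error: independent per-benchmark fluctuations partly cancel out in the sum, so the total has lower relative noise than a typical single benchmark.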
LNT: [PATCH 1/3] Add Total to run view and graph plot
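To illustrate why the total helps, the sketch below compares the relative noise (coefficient of variation) of the summed compile time with that of each individual test over repeated runs. The helper names and the input layout (one dict per run, mapping test name to compile time in seconds) are assumptions for illustration, not LNT's data model.

```python
from statistics import mean, stdev
from typing import Dict, List


def total_compile_time(run: Dict[str, float]) -> float:
    """Sum the per-test compile times (seconds) of one run."""
    return sum(run.values())


def coefficient_of_variation(samples: List[float]) -> float:
    """Relative noise: sample standard deviation divided by the mean."""
    return stdev(samples) / mean(samples)


def total_vs_per_test_noise(runs: List[Dict[str, float]]) -> Dict[str, float]:
    """Relative noise of each individual test and of the total.

    runs: repeated runs (at least two), each mapping test name -> seconds.
    Every run is assumed to contain the same set of tests.
    The total is reported under the hypothetical key "<total>".
    """
    result = {
        name: coefficient_of_variation([run[name] for run in runs])
        for name in runs[0]
    }
    result["<total>"] = coefficient_of_variation(
        [total_compile_time(run) for run in runs])
    return result
```

Fed with real repeated runs, the "<total>" entry would typically be noticeably smaller than most per-test entries when the per-test fluctuations are independent.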
- Only show performance changes with high confidence in the summary report
To investigate the correlation between a program's run time and its variance, I ran Dhrystone multiple times at several problem sizes. The results show that some fluctuation is expected and that shorter tests have much greater relative variance. By modelling run time as normally distributed, we can calculate the minimum difference required for statistical significance. Using this, we can hide low-confidence results from the summary report. They are still available, highlighted in colour, in the detailed report for anyone interested.
LNT: [PATCH 3/3] Ignore tests with very short run time
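A minimal sketch of such a significance check under the normal model, using a two-sample z-test with a pooled standard deviation. This is a generic illustration, not the logic of the attached patch; the function names and the default alpha = 0.05 are assumptions, and with only a few samples per run a t-test would be more accurate than the z approximation.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Sequence


def min_significant_difference(sigma: float, n1: int, n2: int,
                               alpha: float = 0.05) -> float:
    """Smallest difference of means a two-sided z-test flags at level alpha.

    Assumes run times are normally distributed with common standard
    deviation sigma; n1 and n2 are the sample counts of the two runs.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z * sigma * sqrt(1 / n1 + 1 / n2)


def pooled_stdev(a: Sequence[float], b: Sequence[float]) -> float:
    """Pooled sample standard deviation (each group needs >= 2 samples)."""
    n1, n2 = len(a), len(b)
    var = ((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2)
    return sqrt(var)


def show_in_summary(baseline: Sequence[float], current: Sequence[float],
                    alpha: float = 0.05) -> bool:
    """True if the change between two sets of run times is significant."""
    threshold = min_significant_difference(
        pooled_stdev(baseline, current), len(baseline), len(current), alpha)
    return abs(mean(current) - mean(baseline)) > threshold
```

Because the threshold scales with the measured standard deviation, tests with high relative variance, such as the short ones, need a proportionally larger change before they appear in the summary.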
- Make sure the board has low background noise
Run a system performance benchmark before each test run and compare the result with the reference value obtained during machine set-up. If the percentage difference is too large, abort or defer the run. In the prototype, this feature is implemented as a Bash script and is not integrated into LNT; I will rewrite it in Python.
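A minimal Python sketch of the pre-run check, treating "defer" as waiting and re-measuring a few times before giving up (abort). The `measure` callable stands in for whatever system benchmark is used; it, the 5% tolerance, and the retry parameters are hypothetical choices, since the acceptable difference is not specified above.

```python
import time
from typing import Callable


def percent_difference(measured: float, reference: float) -> float:
    """Percentage difference of a measured score from the reference score."""
    return abs(measured - reference) / reference * 100.0


def wait_for_quiet_board(measure: Callable[[], float],
                         reference: float,
                         tolerance_pct: float = 5.0,
                         attempts: int = 3,
                         delay_s: float = 60.0,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True if the board is quiet enough to start a run.

    measure() runs the (hypothetical) system benchmark and returns a score.
    If the score deviates from the reference by more than tolerance_pct,
    the run is deferred by delay_s seconds and the check is retried.
    After `attempts` failed checks it returns False and the caller aborts.
    """
    for attempt in range(attempts):
        if percent_difference(measure(), reference) <= tolerance_pct:
            return True
        if attempt + 1 < attempts:  # no point sleeping after the last try
            sleep(delay_s)
    return False
```

Injecting `sleep` keeps the function easy to test and lets a harness substitute its own scheduling for deferred runs.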
With my prototype implementation, the summary report becomes much more useful: there are almost no noisy readings, while small regressions are still detectable for long-running benchmarks. The implementation is backwards compatible with older databases.
Screenshots from a sample run are attached.
Thanks for reading!
ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590
ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782
patchset.tar.gz (14.4 KB)