I've implemented a test-suite patch and an LNT patch to calculate a hash
for each binary in the test-suite & to store it in the LNT database.
The test-suite patch is surprisingly simple. The only thing I had to do
to get stable hashes was to strip out the .comment and all .note sections.
The attached spreadsheet shows the hashes calculated by the patch across
the test-suite for a range of LLVM svn revisions from last week, each
roughly a day apart. It shows that on about half of the days the binaries
didn't change. The hashes were collected on a linux-x86_64 system.
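To make the stripping idea concrete, here's a toy sketch, not the actual test-suite patch (which strips real ELF files, presumably via objcopy/strip, before hashing): model a binary as a mapping from section names to contents, and hash everything except the .comment and .note* sections, whose contents vary with toolchain metadata rather than with the code.

```python
import hashlib

# Section-name prefixes to skip when hashing; mirrors stripping
# .comment and all .note* sections before computing the hash.
VOLATILE_PREFIXES = (".comment", ".note")

def stable_hash(sections):
    """sections: dict mapping section name -> bytes content."""
    h = hashlib.md5()
    for name in sorted(sections):
        if name.startswith(VOLATILE_PREFIXES):
            continue  # ignore sections that change without a code change
        h.update(name.encode())
        h.update(sections[name])
    return h.hexdigest()

# Two builds identical except for .comment hash the same...
a = {".text": b"\x90\x90", ".comment": b"clang 3.8"}
b = {".text": b"\x90\x90", ".comment": b"clang 3.9"}
print(stable_hash(a) == stable_hash(b))  # → True
# ...while an actual code change is detected.
c = {".text": b"\xc3", ".comment": b"clang 3.9"}
print(stable_hash(a) == stable_hash(c))  # → False
```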
The attached LNT patch is quite a bit bigger: it adds a new type of
sample field (hash) and adapts the rest of LNT to make LNT's regression
tests pass. I didn't attempt to make use of the hash values in any of
LNT's analyses or reports in this patch. I've got a vague idea that maybe
the first easy & useful additions could be to color-code the background
in the run-chart with the hash-value of the binary. That way, you could
see which sample points were produced by identical binaries. The same
could be done for the spark lines on the daily report page.
Bottom line: at least on linux platforms, it seems that useful hashes
can be computed from binaries pretty easily; see the attached
test-suite patch. I'm assuming that on Darwin platforms the exact same
patch, maybe with some tweaks on which sections to strip, should
work too, but I don't know enough about Darwin to know for sure.
The LNT changes are indeed more invasive. I've attached my current
version of that patch.
What do you think of this approach?
0001-Add-support-for-storing-hash-of-test-binaries.patch (59.1 KB)
test-suite-hash-binaries.patch (2.04 KB)
test-suite_hash_comparisons.xlsx (294 KB)
This is a big patch, it might take me a while to review it.
Is there a way to avoid running the perf test for binaries that haven’t changed? I guess a bit of redundancy might be useful, but for the analysis I was doing, which involved bisecting back through history to pinpoint the revisions at which the hashes changed, it would be useful to avoid wasting time benchmarking programs known to be the same binary. (If identical binaries produce different results, that indicates a bug in how the perf is being measured, or an unrelated system problem which, while it might be interesting to dive into, isn’t the focus here.)
It’s interesting that you had to strip out the .comment and .note sections. I didn’t have to do that on Mac. Do you know if there is a linker or compiler flag on linux that we can use to avoid emitting them in the first place?
– Sean Silva
Not running the LNT perf tests for binaries that haven’t changed: I don’t think there currently is a way to do that. If someone added it, I guess the most complex part would be implementing the format(s) in which to communicate to LNT what the hashes of the previous versions of the tests are. There is already logic to do the build and run steps separately (search for “config.build_threads” in lnt/tests/nt.py), and there is already logic to run only sub-parts of the test-suite. The logic that needs adding is filtering the tests to run by comparing the hash values from the build step against wherever the predefined uninteresting hash values come from.
I don’t know of compiler or linker flags to not produce or remove those .comment and .note sections. It’s easy to strip them out before producing a hash, so I think stripping them is better than requiring LNT to inject the necessary command line options for all possible compilers and linkers in use.
Sure - no problem.
To make the review easier, I've uploaded it to Phabricator at