I'm not sure I'll be at the US dev meeting this year, but we had a
performance BoF last year and I think we should have another, at least
to review the progress that has been made and to plan ahead. I'm sure
Kristof, Tobias and others will be very glad to see it, too.
If memory serves me well (it doesn't), this is the list of things we
agreed to work on, and their progress:
1. Performance-specific test-suite: a group of specific benchmarks
that should be tracked with the LNT infrastructure. Hal proposed to
look at this, and other people helped implement it. Last I heard there
was a way of running it, but I'm not sure how. I'd love to have this
as a buildbot, though, so we can track its progress.
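For reference, the documented way to drive the full test-suite through
LNT looks roughly like this (from the LNT quickstart; the sandbox and
paths below are placeholders, and I don't know whether the benchmark
subset has its own mode):

```shell
# Run the LLVM test-suite under LNT, collecting timings into a sandbox.
# All paths are placeholders for your own checkout/install locations.
lnt runtest nt \
  --sandbox /tmp/lnt-sandbox \
  --cc /path/to/llvm/bin/clang \
  --test-suite /path/to/llvm-test-suite
```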
2. Statistical analysis of the LNT data. A lot of work has been put
into this, and I believe it's much improved. Anton, Yi and others have
been discussing and submitting many patches to make the LNT reporting
infrastructure more stable, less prone to noise and more useful all
round. It's not perfect yet, but it's a lot better than last year's.
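To illustrate the kind of filtering involved (a toy sketch of mine,
not LNT's actual algorithm; the function name and threshold are made
up): comparing medians of several samples, instead of single runs,
already hides a lot of machine noise.

```python
import statistics

def compare_runs(baseline, current, threshold=0.05):
    """Compare two sets of benchmark timings (in seconds).

    Using medians rather than single runs damps machine noise; a
    change is reported only when the relative shift in medians
    exceeds the threshold. This mirrors the *kind* of filtering a
    reporting system needs, not LNT's exact algorithm.
    """
    base = statistics.median(baseline)
    curr = statistics.median(current)
    delta = (curr - base) / base
    if delta > threshold:
        return "regression"
    if delta < -threshold:
        return "improvement"
    return "no change"

# A single noisy outlier no longer triggers a report:
print(compare_runs([1.00, 1.02, 0.99], [1.01, 1.00, 1.45]))  # -> no change
# A consistent slowdown still does:
print(compare_runs([1.00, 1.02, 0.99], [1.20, 1.22, 1.19]))  # -> regression
```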
Some other things happened since then that are also worth mentioning...
3. The LNT website got really unstable (Internal Server Error every
other day). This is the reason I stopped submitting results to it,
since it would make my bot fail. And because I still don't have a
performance test-suite bot, I don't care much about the results yet.
With the noise reduction it would be really interesting to monitor
progress, even on the full test-suite, but right now I can't afford
random failures. This seriously needs looking into, and it would be a
good topic for the BoF.
4. Big Endian results got in, and the infrastructure is now able to
hold "golden standard" results for both endiannesses. That's done and
working (AFAIK).
5. Renovation of the tests/benchmarks. The tests and benchmarks in the
test-suite are getting really old. One good example is the ClamAV
anti-virus, which is not just old: its results are bogus and cooked,
so they don't really separate signal from noise. Other benchmarks have
such short run-times that they're almost pointless. Someone needs to
go through everything we test/benchmark and make sure it's valid and
meaningful. This is similar to item 1, but more extensive.
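On the short run-time point, a cheap first pass would be a script
that flags benchmarks whose elapsed time falls below some floor. A
rough sketch (the one-second threshold is my own guess, not an agreed
value, and the names are made up):

```python
import time

# Below this floor, timer resolution and startup noise dominate the
# measurement. The exact value is my guess, not an agreed number.
MIN_RUNTIME = 1.0  # seconds

def check_benchmark(fn, min_runtime=MIN_RUNTIME):
    """Time one run of `fn` and report whether it ran long enough
    for the timing to be meaningful."""
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    return elapsed >= min_runtime, elapsed

def too_short():
    # Stands in for a benchmark that finishes in microseconds.
    sum(range(1000))

ok, elapsed = check_benchmark(too_short)
print(f"long enough: {ok} (elapsed {elapsed:.6f}s)")
```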
About non-test-suite benchmarking...
I have been running some closed-source benchmarks, but since we can't
share any data on them, getting historical relative results is almost
pointless. I don't think we, as a community, should worry about
keeping open scores for them. Also, since almost everyone is running
them behind closed doors and fixing the bugs with reduced test cases,
I think that's the best deal we can get.
I've also tried a few other benchmarks, like running the ImageMagick
libraries, or Phoronix, and I have to say they're not really that
great at spotting regressions. ImageMagick will take a lot of work to
turn into a meaningful benchmark, and Phoronix is not really ready to
be a compiler benchmark (it only compiles once, with the system
compiler, so you have to heavily hack the scripts). If you're up to
it, maybe you could hack those into a nice package, but it won't be
easy. I know people have done it internally, like I did, but none of
those scripts are ready to be left out in the open, since they're
either very ugly (like mine) or contain private information...
Hope that helps...