[RFC] Performance tracking and benchmarking infrastructure BoF


Next week at the developers meeting, I'm chairing a BoF session on improving
our performance tracking and benchmarking infrastructure. I'd like to make
the most out of the 45 minute slot. Therefore, I'd like to start the discussion
a bit earlier here, giving everyone who can't come to the BoF a chance to put
in their 2 cents. At the same time, I hope this will also give me a chance to
collect and structure ideas and opinions, resulting in a better starting point
for the discussion during the BoF session.

The main motivation for organizing this BoF is my impression that it's harder
to collaborate on patches that improve the quality of generated code, compared
to collaborating on patches that fix bugs. I think that with enhancements to
the buildbot infrastructure, it should be possible to make it easier to
collaborate on performance-enhancing patches. I'd like to discuss here, and
during the BoF session, what the enhancements are that we'd need the most, why
we need them, and what the main expected difficulties are that we'd need to
overcome to get these enhancements implemented.

To kick off the discussion, let me propose a number of enhancements in
functionality that I think would enable easier collaboration on
performance-enhancing patches the most. I'd very much welcome feedback and
more ideas:

* Early and automated detection of significant performance regressions, with a
  low rate of false positives.

  Rationale: The buildbots currently do a great job at catching accidental
  correctness regressions in an automated fashion with a reasonably low false
  positive rate. It'd be great if they would also catch significant performance
  regressions automatically with a reasonable false positive rate.
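  To make the false-positive concern concrete, here is a minimal sketch of
  one possible detection heuristic. The function name and thresholds are
  purely illustrative, not part of any existing buildbot: a benchmark is
  flagged only when the observed slowdown exceeds both a minimum threshold
  and the run-to-run noise measured on the baseline itself.

  ```python
  # Hypothetical regression check: flag a benchmark as regressed only when
  # the slowdown clearly exceeds the noise observed in repeated runs.
  from statistics import median

  def is_regression(baseline, candidate, min_slowdown=0.05):
      """baseline, candidate: lists of run times (seconds) for one benchmark.

      Returns True only if the median candidate time is slower than the
      median baseline time by more than min_slowdown AND more than the
      baseline's own run-to-run spread, to keep false positives low.
      """
      base_med = median(baseline)
      cand_med = median(candidate)
      # Estimate noise as the relative spread of the baseline runs.
      noise = (max(baseline) - min(baseline)) / base_med
      slowdown = (cand_med - base_med) / base_med
      return slowdown > max(min_slowdown, noise)

  # A 10% slowdown against ~3% noise is flagged; a 2% slowdown is not.
  print(is_regression([1.00, 1.02, 0.99], [1.10, 1.11, 1.09]))  # True
  print(is_regression([1.00, 1.02, 0.99], [1.02, 1.01, 1.03]))  # False
  ```

  Anything along these lines would need tuning per benchmark and per board,
  of course; the point is only that the noise estimate has to come from
  repeated runs rather than a fixed global threshold.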

* Having common public performance data available before committing a patch,
  enabling everyone to review and evaluate the positive and negative effects
  of optimization patches.

  Rationale: Currently, very little performance data is typically provided when
  a patch is put up for review. Having a reasonable set of performance data
  would make it easier for reviewers to evaluate the value of a patch.

* Make it possible to evaluate the performance impact of a patch on
  architectures or platforms that the developer doesn't have access to, before
  committing the patch.

  Rationale: Most developers probably do not have access to all architectures
  or platforms that the community as a whole cares about. Being able to check
  that a patch doesn't regress performance on other platforms is probably as
  useful as testing basic correctness on platforms a developer doesn't have
  access to. The regression tests provide the functionality to check that there
  are no serious correctness regressions on other platforms. A way to verify
  that performance isn't negatively affected on platforms a developer doesn't
  have access to would be nice. One way to achieve it would be to allow a
  top-of-trunk+patch build to be run on all benchmarks in the test-suite, on
  all boards reserved for benchmarking in the buildbot setup.
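  As a rough illustration of what such a run could report back to the patch
  author, here is a sketch of a per-board comparison between top-of-trunk and
  top-of-trunk+patch timings. All board and benchmark names here are made up
  for the example; this is not an existing buildbot interface.

  ```python
  # Hypothetical comparison report: given per-board timings for trunk and
  # trunk+patch, compute the relative change per benchmark per board.

  def compare(trunk, patched):
      """trunk, patched: {board: {benchmark: seconds}}.

      Returns {board: {benchmark: relative_change}}, where +0.05 means
      the patched build is 5% slower on that board.
      """
      report = {}
      for board, benchmarks in trunk.items():
          report[board] = {
              name: (patched[board][name] - base) / base
              for name, base in benchmarks.items()
          }
      return report

  trunk = {"arm-board": {"sqlite": 10.0}, "x86-board": {"sqlite": 4.0}}
  patched = {"arm-board": {"sqlite": 10.5}, "x86-board": {"sqlite": 3.8}}
  for board, changes in compare(trunk, patched).items():
      for name, delta in changes.items():
          print(f"{board:10s} {name}: {delta:+.1%}")
  ```

  A report of this shape, attached to a review, would let a developer see at
  a glance where a patch helps and where it hurts, without owning any of the
  boards involved.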

I'm sure that to get the above functionality implemented, quite a few technical
and non-technical issues have to be resolved. In the interest of keeping this
email a bit more focussed, I've decided to not yet mention the issues I'm
expecting, but just the functional enhancements that I think are the most
important.