[RFC] Performance tracking and benchmarking infrastructure BoF

Hi,

Next week at the developers meeting, I'm chairing a BoF session on improving
our performance tracking and benchmarking infrastructure. I'd like to make
the most of the 45-minute slot, so I'd like to start the discussion a bit
earlier here, giving everyone who can't come to the BoF a chance to put in
their 2 cents. I also hope this will give me a chance to collect and
structure ideas and opinions, resulting in a better starting point for the
discussion during the BoF session.

The main motivation for organizing this BoF is my impression that it's
harder to collaborate on patches that improve the quality of generated code
than on patches that fix bugs. I think that with enhancements to the
buildbot infrastructure, it should be possible to make it easier to
collaborate on performance-enhancing patches. I'd like to discuss here, and
during the BoF session, which enhancements we need the most, why we need
them, and what main difficulties we'd expect to have to overcome to get
them implemented.

To kick off the discussion, let me propose a number of enhancements in
functionality that I think would enable easier collaboration on
performance-enhancing patches the most. I'd very much welcome feedback and
more ideas:

* Early and automated detection of significant performance regressions,
  with a low rate of false positives.

  Rationale: The buildbots currently do a great job at catching accidental
  correctness regressions in an automated fashion with a reasonably low
  false positive rate. It'd be great if they would also catch significant
  performance regressions automatically, with a similarly low false
  positive rate.

* Having common public performance data, before committing a patch, enabling
  everyone to review and evaluate the positive and negative effects of
  optimization patches.

  Rationale: Currently, very little performance data is typically provided
  when a patch is put up for review. Having a reasonable set of performance
  numbers would make it easier for reviewers to evaluate the value of a
  patch.

* Make it possible to evaluate the performance impact of a patch on
  architectures or platforms that the developer doesn't have access to,
  before committing the patch.

  Rationale: Most developers probably do not have access to all
  architectures or platforms that the community as a whole cares about.
  Being able to verify that a patch doesn't regress performance on other
  platforms is probably as useful as testing basic correctness on platforms
  a developer doesn't have access to. The regression tests already make it
  possible to check that there are no serious correctness regressions on
  other platforms. A similar way to verify that performance isn't
  negatively affected on those platforms would be nice. One way to achieve
  it would be to allow a top-of-trunk+patch build to be run on all
  benchmarks in the test-suite, on all boards reserved for benchmarking in
  the buildbot setup.
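
To make the first item a bit more concrete, here is a rough sketch of what
a noise-aware check could look like. This is only an illustration, not
existing buildbot functionality: the function name, the thresholds and the
choice of median plus median absolute deviation are all my own assumptions.
The idea is to flag a regression only when the slowdown is both large
enough to care about and clearly larger than the run-to-run noise seen on
the baseline.

```python
from statistics import median


def is_significant_regression(baseline, candidate,
                              min_slowdown=0.05, noise_factor=3.0):
    """Hypothetical check: is `candidate` significantly slower than `baseline`?

    baseline, candidate: run times in seconds from repeated runs of the
    same benchmark (before and after the patch).
    min_slowdown: smallest relative slowdown worth reporting (assumed 5%).
    noise_factor: how many times the baseline noise the slowdown must
    exceed (assumed 3x).
    """
    base_med = median(baseline)
    cand_med = median(candidate)
    # Median absolute deviation of the baseline: a noise estimate that is
    # robust against a few outlier runs.
    noise = median([abs(t - base_med) for t in baseline])
    delta = cand_med - base_med
    # Require both a meaningful slowdown and one that stands out from noise.
    return delta > min_slowdown * base_med and delta > noise_factor * noise
```

With stable timings around 1.00s, a candidate around 1.10s would be
flagged. With baseline timings that already vary by 10% or more, a 6%
shift in the median would not be, which is the sort of behaviour needed to
keep the false positive rate low.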

I'm sure that to get the above functionality implemented, quite a few
technical and non-technical issues have to be resolved. In the interest of
keeping this email a bit more focussed, I've decided not to mention the
issues I'm expecting yet, but just the functional enhancements that I think
are the most useful.

Thanks,

Kristof