Performance degradation

I recently went to update the performance timing page on the web site
to reflect the improvements in IRgen time, and noticed that performance
has regressed substantially in other places since I put the page up
about 3 weeks ago. This prompted me to start building parts of what I
hope will eventually become a general infrastructure for monitoring
performance over time.

One preliminary sample of the results is here:
  http://t1.minormatter.com/~ddunbar/utime-syntax-rev.pdf
This shows the user time to run -fsyntax-only over the Sketch
Objective-C app. The X axis is svn revision and the Y axis is time.
The red line tracks the minimum user time over all samples per
revision. The green line tracks the mean of the bottom 2/3 of samples;
this gives an indication of the reliability of the timing while still
discarding outliers. The number of samples per revision is adaptive,
and the resolution near TOT is low at the moment.
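
For anyone who wants to reproduce the summary statistics, here is a
rough sketch of what the two lines compute (Python; the function name
and the exact sample selection are only illustrative, my actual scripts
may differ in detail):

    # Summarize the timing samples collected for one svn revision.
    def summarize(samples):
        ordered = sorted(samples)
        best = ordered[0]                     # red line: minimum user time
        kept = ordered[:max(1, (2 * len(ordered)) // 3)]
        trimmed_mean = sum(kept) / len(kept)  # green line: mean of bottom 2/3
        return best, trimmed_mean

    # Example: summarize([1.02, 1.01, 1.04, 1.31, 1.02, 1.03]) returns
    # roughly (1.01, 1.02); the 1.31 outlier is dropped from the mean.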

The high bit is that although I gave our lexer a nudge back in the
right direction, clang at -fsyntax-only has still slowed down
recently. In my timings for TOT versus r57900:
(1) clang -parse-noop of Cocoa.h has slowed down by ~4.1%.
(2) clang -fsyntax-only of Cocoa.h has slowed down by ~6.3%.

Some of this is expected, as we gain features and improve correctness,
but I imagine there are a number of other places where the performance
loss is unexpected and can be recovered.
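
For reference, the numbers above are user times gathered from repeated
runs of the two commands; a minimal harness along the following lines
is enough to collect such samples (the header path and run count here
are illustrative, not my exact setup):

    import resource, subprocess

    def children_utime():
        return resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime

    def user_time(cmd):
        """Run cmd once and return the child user CPU time it consumed."""
        before = children_utime()
        subprocess.run(cmd, check=True)
        return children_utime() - before

    # Take several samples and report the best one.
    samples = [user_time(["clang", "-fsyntax-only", "Cocoa.h"])
               for _ in range(10)]
    print(min(samples))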

This email is meant just to serve as a "heads up". I will hopefully
get some more clues about where other regressions occurred as more
data comes in and will fix / file / ask for help as appropriate.

- Daniel

I saw an interesting idea for perf test automation on the Chromium blog:

http://blog.chromium.org/2008/11/putting-it-to-test.html

They add a reference build to discount variation in the test conditions. Maybe you could add one for the clang tests too (if you think it's worthwhile).

Thanks for the link, I hadn't seen that yet. In a way I already have a
reference run, because I also collect data on the stable gcc version,
and in a fully automated system I would have a few more reference runs
intended to check the current baseline performance of the system (i.e.,
general performance tests, not tests against a reference version).
However, I do not intend to add a reference run in the same vein as
Chromium's; it inflates the testing time significantly and has other
maintenance problems.

Conceptually I think a better approach is to (a) try hard to gather
consistent data and (b) be able to estimate the reliability of the
data (this is not the same as having a reference run). In the end I
*want* consistent data; it is better to spend the extra testing time
getting accurate samples or getting more samples so that you end up
with consistent data (IMHO, of course).
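
To make (b) a bit more concrete, the kind of reliability estimate I
have in mind is simply a dispersion measure over the retained samples
for a revision, something like the following (one possible statistic,
not necessarily the one I will end up using):

    # Relative spread of the retained (bottom 2/3) samples; values near
    # zero mean the timings for this revision can be trusted, larger
    # values mean more (or more careful) samples are needed.
    def reliability(samples):
        ordered = sorted(samples)
        kept = ordered[:max(1, (2 * len(ordered)) // 3)]
        return (kept[-1] - kept[0]) / kept[0]

    # Example: reliability([1.02, 1.01, 1.04, 1.31, 1.02, 1.03]) is
    # (1.03 - 1.01) / 1.01, i.e. about 2%.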

The Web 2.0 style viewing infrastructure, on the other hand, is
something I would love to steal. :)

- Daniel