Status of use-diet so far (NO API CHANGES)

Hi all,

in the last few days I have been busy gathering performance data
about the "class Use"-related changes.

I have nice measurements on an 8 GB MacPro with kimwitu++.
This is important to mention, because this machine has
plenty of memory, so swapping is unlikely, which
means that in more constrained setups (where swapping
does occur) the use-diet approach will probably produce
even better results.

So here are my values when doing a
  cd ~/test-suite/MultiSource/Applications/kimwitu++
  time make --always
in the regular trunk and with my changes merged in:

grep "^real" < kimwituRegular.scatter.backup2 | sort
real 1m34.002s
real 1m34.084s
real 1m34.092s
real 1m34.398s
real 1m34.468s
real 1m34.508s
real 1m34.733s
real 1m34.849s
real 1m35.057s
real 1m35.109s
real 1m35.160s
real 1m35.236s
real 1m36.005s
real 1m36.667s
real 1m38.071s
real 1m38.202s
real 1m38.500s
real 1m41.267s
real 1m41.868s
real 1m43.603s

grep "^real" < kimwituDiet.scatter.backup2 | sort
real 1m33.920s
real 1m33.991s
real 1m33.997s
real 1m34.027s
real 1m34.083s
real 1m34.109s
real 1m34.235s
real 1m34.255s
real 1m34.375s
real 1m34.431s
real 1m34.440s
real 1m34.585s
real 1m34.839s
real 1m35.481s
real 1m37.998s
real 1m38.184s
real 1m38.653s
real 1m38.906s
real 1m43.415s
real 1m43.490s

As you can see, the use-diet changes actually lower the build time
of kimwitu++! (This is as of yesterday's r50182.)
Parity is not only reached, but surpassed. I am pretty happy
with this :-)

The second thing I wanted to report is that, contrary to my previous
announcement, there will be *no* API change needed, so there is no
need for conversions in other projects any more.
(The confusion came from my erroneous belief, at some point,
that the defining value of a GlobalVariable cannot be changed and
must be present at construction.)
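To illustrate the point that made the API change unnecessary, here is a
small sketch written against today's header layout and signatures (which
differ from the 2008 tree): a GlobalVariable can be created without a
defining value and receive its initializer operand afterwards via
setInitializer().

#include "llvm/IR/Constants.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  LLVMContext Ctx;
  Module M("demo", Ctx);
  Type *I32 = Type::getInt32Ty(Ctx);

  // Create the global first, without a defining value...
  auto *GV = new GlobalVariable(M, I32, /*isConstant=*/false,
                                GlobalValue::ExternalLinkage,
                                /*Initializer=*/nullptr, "counter");

  // ...and attach the initializer (an operand, i.e. a Use) afterwards,
  // so the defining value does not have to be present at construction.
  GV->setInitializer(ConstantInt::get(I32, 0));

  M.print(outs(), nullptr);
  return 0;
}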

Bill asked for memory statistics; I shall get some once I have Shark
running.

If I find the time I can even make some pretty graphs.

All in all, I am on track to merge the branch into trunk
by the end of this week.

Cheers,

  Gabor

Thanks for these numbers. Do you know how much of this increase is due to
co-allocating Use arrays with their users, and how much is due to the
actual shrinking of the size of Use?

Using less memory is great, though the approach used by use-diet to
eliminate the User field makes the code significantly more complicated,
so I'm looking forward to some nice comforting data on what the
savings is :-).

Thanks,

Dan

> As you can see, the use-diet changes actually lower the build time
> of kimwitu++! (this is as of yesterday's r50182).
> Parity is not only reached, but surpassed.

> Thanks for these numbers. Do you know how much of this increase is due to
> co-allocating Use arrays with their users, and how much is due to the
> actual shrinking of the size of Use?

Hi Dan!

I cannot give you the results of extensive research, but I can give you
some strong indications.

I ran opt without optimization passes on the raw linked bitcode of
sqlite3, with and without my patches:

ggreif$ time ~/llvm/Release/binDiet/opt -disable-opt
sqlite3.linked.rbc -o sqlite3.linked.bc -f

real 0m0.556s
user 0m0.515s
sys 0m0.037s

ggreif$ time ~/llvm/Release/binReg/opt -disable-opt sqlite3.linked.rbc
-o sqlite3.linked.bc -f

real 0m0.564s
user 0m0.521s
sys 0m0.041s

There is a speedup with my patch. Comparing user times (diet vs. regular,
scaled by 1000):
expr 515000 / 521
988

For real times:
expr 556000 / 564
985

This is a 1.5% speedup. Of course this already includes the size change
from 16 to 12 bytes, so the extra effect of fewer bytes being allocated
is factored in.
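To make the 16 -> 12 byte change concrete, here is a rough sketch of the
two layouts. The field names and ordering are mine, not the exact ones
from the tree, and the byte counts assume 4-byte pointers, which is what
the 16/12 figures imply:

struct Value;
struct UserOld;

// Pre-diet Use (roughly): four pointers, i.e. 16 bytes with 4-byte pointers.
struct UseOld {
  Value   *Val;   // the Value being used
  UseOld  *Next;  // next Use on the same Value's use list
  UseOld **Prev;  // address of the previous link's Next slot
  UserOld *Usr;   // back-pointer to the User -- the field that goes away
};

// Use-diet Use: three pointers, i.e. 12 bytes. The User is recovered from
// the Use's position instead of being stored.
struct UseNew {
  Value   *Val;
  UseNew  *Next;
  UseNew **Prev;  // spare low bits here can carry the "waymark" tags
};

static_assert(sizeof(UseNew) < sizeof(UseOld), "one pointer saved per Use");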

When running the standard optimizations there is a 5% slowdown on this
test, which I chose because it is the most extreme example. The
majority of the opt runs incur a 2% penalty. Use::getUser() seems to
account for 1.8% of the samples in sqlite3 under Shark with
-std-compile-opts.
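For readers wondering why getUser() shows up in the profile at all: with
the use-diet scheme the Use no longer stores its User, so the User has to
be recomputed. Below is a deliberately simplified, self-contained sketch
of the recovery idea, with my own names throughout and a plain
end-of-array flag instead of the real patch's two-bit "waymark" tags
hidden in the Prev pointer. The Use array is co-allocated directly in
front of its User, and getUser() walks forward to the end of the array.

#include <cassert>
#include <iostream>
#include <new>

struct Value;   // stand-in for llvm::Value
struct User;

// Simplified Use with no User* back-pointer.
struct Use {
  Value *Val = nullptr;
  bool LastInArray = false;   // stand-in for the real patch's tag bits

  User *getUser() const;
};

// Memory layout: one block holding [ Use, Use, ..., Use ][ User ], so the
// operands sit directly in front of the User object that owns them.
struct User {
  unsigned NumOperands;

  explicit User(unsigned N) : NumOperands(N) {
    getOperandList()[N - 1].LastInArray = true;  // plant the end marker
  }

  Use *getOperandList() {
    return reinterpret_cast<Use *>(this) - NumOperands;
  }

  // Allocate the combined block and return the address at which the User
  // itself must be placement-constructed.
  static void *allocate(unsigned N) {
    void *Raw = ::operator new(N * sizeof(Use) + sizeof(User));
    Use *Ops = static_cast<Use *>(Raw);
    for (unsigned I = 0; I != N; ++I)
      new (Ops + I) Use();
    return Ops + N;
  }
};

User *Use::getUser() const {
  // Walk forward to the end of the Use array; the User follows it.
  const Use *Cur = this;
  while (!Cur->LastInArray)
    ++Cur;
  return reinterpret_cast<User *>(const_cast<Use *>(Cur + 1));
}

int main() {
  User *U = new (User::allocate(3)) User(3);
  Use *Ops = U->getOperandList();
  assert(Ops[0].getUser() == U && Ops[2].getUser() == U);
  std::cout << "sizeof(Use) without a User field: " << sizeof(Use) << '\n';
}

The walk is what costs the extra samples: recovering the User is O(number
of operands) in this naive form, which is why getUser() becomes visible
in Shark profiles.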

When not restricting ourselves to opt, but simply building the sqlite3
application, the times become much more reasonable:

grep "^real" < sqlite3Diet.scatter.backup2 | sort
real 1m35.643s
real 1m35.924s
real 1m35.941s
real 1m36.033s
real 1m36.140s
real 1m36.151s
real 1m36.198s
real 1m36.249s
real 1m36.309s
real 1m36.489s

grep "^real" < sqlite3Regular.scatter.backup2 | sort
real 1m35.417s
real 1m35.459s
real 1m35.497s
real 1m35.593s
real 1m35.632s
real 1m35.694s
real 1m35.703s
real 1m35.928s
real 1m35.986s
real 1m36.057s

expr 96489000 / 96057
1004

That is a 0.4% slowdown when comparing the 10th (slowest) values.

I attribute this to the fact that opt runs only once (paying the
penalty), while the other LLVM tools all reap the benefits.

On a side note, kimwitu++ spent 1.2% of its Shark samples in getUser;
accordingly, the opt penalty is around 2%:

normal.table:MultiSource/Applications/kimwitu++/kc 6.4902 3564968 8.1354
use-diet.table:MultiSource/Applications/kimwitu++/kc 6.6518 3564968 8.1064
ggreif$ expr 66518000 / 64902
1024

> Using less memory is great, though the approach used by use-diet to
> eliminate the User field makes the code significantly more complicated,
> so I'm looking forward to some nice comforting data on what the
> savings is :-).

Yes, Owen kindly provided Shark malloc statistics for the dealII SPEC
test. They showed a 13% memory saving.

Cheers,

   Gabor