[RFC] 3-bit Waymarking

Hi devs,

after my intentionally "playful" EuroLLVM presentation (*) I think it
would be time to get serious about merging to ToT. But we should
probably find out whether an optimized algorithm is desired at all.

So I'd solicit comments from the code owners (Use.{h,cpp}) and anybody
who is interested. For closer scrutiny, the code is here:
<http://llvm.org/viewvc/llvm-project/llvm/branches/ggreif/waymark-64-new/>

I do not have the equipment to perform a compile-time measurement. How
do folks benchmark for this nowadays? Is it a viable alternative to
bring the changes to ToT and compare speedups/slowdowns in the nightly
builds retrospectively?

Thanks for any input,

cheers,

    Gabor

(*) Some even say "bizarre"
(https://twitter.com/alexr/statuses/453486315083157504) :
<http://llvm.org/devmtg/2014-04/PDFs/LightningTalks/waymark.pdf>

I saw the slides, it looks very interesting. Have you actually measured any memory wins from this?

-Chris

Hi devs,

after my intentionally "playful" EuroLLVM presentation (*) I think it
would be time to get serious about merging to ToT. But we should
probably find out whether an optimized algorithm is desired at all.

So I'd solicit comments from the code owners (Use.{h,cpp}) and anybody
who is interested. For closer scrutiny, the code is here:
<http://llvm.org/viewvc/llvm-project/llvm/branches/ggreif/waymark-64-new/>

I do not have the equipment to perform a compile-time measurement. How
do folks benchmark for this nowadays? Is it a viable alternative to
bring the changes to ToT and compare speedups/slowdowns in the nightly
builds retrospectively?

I saw the slides, it looks very interesting. Have you actually measured any
memory wins from this?

Hi Chris,

there are no memory savings, Use has still 3 pointers (the 4->3
reduction happened back in 2008). What should be faster with the new
algorithm are the "value_use_iterator::operator->" operators (i.e.
finding all Users of a Value).

Cheers,

    Gabor

Ah, ok. If you're looking for performance benchmarks, you could look at LTO time of something large, in that a majority of the time is spent in the mid-level optimizer.

-Chris

I forgot to Cc this (and some other unimportant replies) to the list.

Jay.

Just to stimulate discussion, here's another playful/bizarre idea for
improving the current 2-bit waymarking scheme.

Use this encoding:

00 = s (stop)
01 = 1
10 = 2
11 = 3

Change the order of fields in a Use to be prev, next, value. I think we can
guarantee that the first word of a User has 00 in its low order bits, so if
you try to read just beyond the end of the Use array, it'll look like a
stop tag.

Then the sequence goes like this, where I've used a colon to separate the
real tags (in Uses) from the fake tag (in the first word of the User):

encoding: ...s121s112s32s22s12s3s1s:s
accesses: ...5876576546546545434332:

To decode this, skip digits until you reach a stop tag, and then read all
the digits up to the next stop tag. The digits tell you how far to skip
from the terminating stop tag to the start of the User, using a "3-adic
numeration system" (http://en.wikipedia.org/wiki/Bijective_numeration):

0 = (empty string)
1 = 1
2 = 2
3 = 3
4 = 11
5 = 12
6 = 13
7 = 21
...
13 = 111
14 = 112
15 = 113
16 = 121
...

This is asymptotically better than the current system because it's (more or
less) using base 3 instead of base 2 to encode the skip distances. However,
it's probably not very good in practice, because it needs more accesses
when there are only one or two Uses, which is probably the common case. Oh
well.

Jay.

In case you're looking for examples of "something large", clang
is a good option as long as you have enough RAM not to swap.