[PATCH] FoldingSetNodeID: use MurmurHash2 instead of SuperFastHash

Some additional info can be found at:

  http://murmurhash.googlepages.com/
  MurmurHash - Wikipedia
  Hash Functions: An Empirical Comparison - CodeProject

as well as in the patch description itself. Patch and benchmark attached.
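
For anyone who doesn't want to open the attachment: the core of MurmurHash2,
trimmed to the 'len % 4 == 0' case the patch relies on, looks roughly like
this (Appleby's published constants; a sketch, not the attached code verbatim):

  #include <stddef.h>
  #include <stdint.h>

  /* Sketch only -- see the attached patch for the real implementation. */
  static uint32_t murmur2_words(const uint32_t *data, size_t nwords,
                                uint32_t seed) {
    const uint32_t m = 0x5bd1e995;
    const int r = 24;
    uint32_t h = seed ^ (uint32_t)(nwords * 4);   /* length in bytes */

    for (size_t i = 0; i != nwords; ++i) {
      uint32_t k = data[i];      /* whole-word reads, no tail handling */
      k *= m;
      k ^= k >> r;
      k *= m;
      h *= m;
      h ^= k;
    }

    /* Final avalanche. */
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
    return h;
  }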

        Gregory

0001-FoldingSetNodeID-use-MurmurHash2-instead-of-SuperFas.patch (4.1 KB)

hash_bench.c (3.56 KB)

While I've not reviewed the patch in too much detail, it looks
promising. Can you run some end-to-end benchmarks to make sure that
cache pressure in the full program or other variables not accounted
for in a micro-benchmark don't dominate performance? Specifically the
nightly tester includes a number of real programs and machinery to
measure total compile time.

Ok, now with some kinda-hard numbers!

          murmurhash2                | superfasthash

- 6.6404 seconds (6.6697 wall clock) | 6.6204 seconds (6.8557 wall clock)
+ 2.6722 seconds (2.7064 wall clock) | 2.7962 seconds (2.7502 wall clock)
+ 8.6725 seconds (8.6662 wall clock) | 8.7526 seconds (8.7162 wall clock)
+ 2.7362 seconds (2.7729 wall clock) | 2.8242 seconds (2.8146 wall clock)
+ 1.4281 seconds (1.4068 wall clock) | 1.4761 seconds (1.4198 wall clock)
+ 4.3163 seconds (4.3392 wall clock) | 4.3683 seconds (4.3969 wall clock)

+ 6.6804 seconds (6.6916 wall clock) | 6.7804 seconds (6.6950 wall clock)
+ 2.7682 seconds (2.7342 wall clock) | 2.7802 seconds (2.7321 wall clock)
- 8.7365 seconds (8.9505 wall clock) | 8.6965 seconds (8.7124 wall clock)
+ 2.7962 seconds (2.7812 wall clock) | 2.8282 seconds (2.8049 wall clock)
+ 1.3881 seconds (1.4103 wall clock) | 1.4001 seconds (1.4103 wall clock)
- 4.3843 seconds (4.3509 wall clock) | 4.3723 seconds (4.3714 wall clock)

+ 6.5724 seconds (6.7971 wall clock) | 6.6684 seconds (6.6465 wall clock)
+ 2.6401 seconds (2.7378 wall clock) | 2.6682 seconds (2.7965 wall clock)
+ 8.7005 seconds (8.7068 wall clock) | 8.7766 seconds (8.8452 wall clock)
+ 2.7282 seconds (2.7680 wall clock) | 2.7762 seconds (2.7904 wall clock)
- 1.4561 seconds (1.4230 wall clock) | 1.4201 seconds (1.4184 wall clock)
- 4.4523 seconds (4.3754 wall clock) | 4.4003 seconds (4.3706 wall clock)

This table was obtained by running 'make TEST=nightly report.html -j2
&& l Output/ | grep info | xargs cat | grep 'Total Execution Time' >> timings
&& make clean' (or something close to that) three times, on
MultiSource/Applications/ClamAV. A '+' marks a row where MurmurHash2 came out
faster, a '-' one where it came out slower.

It looks like FoldingSetNodeID is not the most performance-critical part of
LLVM, but changing the hash algorithm helps a bit.

        Gregory

+/// This version additionally assumes that 'len % 4 == 0'. Below are the
+/// original notes.
+///
+/// Note - This code makes a few assumptions about how your machine behaves -
+///
+/// 1. We can read a 4-byte value from any address without crashing

Doesn't it only need the start of the data to be 4-byte aligned?
That seems to be the case already, since this is a SmallVector of unsigned.

+/// 2. sizeof(int) == 4

How about using uint32_t/int32_t instead of int then?
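
For example (a generic sketch, nothing taken from the patch), reading each
word through a fixed-width type with memcpy would make both assumptions
explicit:

  #include <stdint.h>
  #include <string.h>

  /* Hypothetical helper: uint32_t pins the width regardless of sizeof(int),
     and memcpy sidesteps the alignment question entirely (with optimization
     it typically compiles down to a single load anyway). */
  static uint32_t read_u32(const unsigned char *p) {
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
  }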

Best regards,
--Edwin

Hi Gregory,

These numbers are so noisy that they aren't particularly useful. Could you try instrumenting FoldingSet to keep track of the number of collisions and the number of hash table resizes and compare those? They should be much more stable and still correlate directly with performance.

-Chris
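
A rough illustration of the suggested instrumentation (generic chained-table
code, not FoldingSetImpl's actual internals): bump a counter whenever an
insert lands in a bucket that already holds a node.

  #include <stddef.h>

  /* Illustrative only. */
  struct node {
    struct node *next;
    unsigned hash;
    /* ...payload... */
  };

  static unsigned long num_collisions;
  static unsigned long num_resizes;    /* bump wherever the table grows */

  static void insert_node(struct node **buckets, size_t nbuckets,
                          struct node *n) {
    /* nbuckets is assumed to be a power of two here. */
    struct node **bucket = &buckets[n->hash & (nbuckets - 1)];
    if (*bucket)
      ++num_collisions;                /* bucket already occupied */
    n->next = *bucket;
    *bucket = n;
  }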

You are right in both cases -- these are notes (and source) from the original
implementation, I decided to leave them as-is for easier verification of the
hashing algorithm.

  Gregory

Yup -- they are extremely noisy. Very good idea -- will give it a try.

  Gregory

OK, now with real numbers :)

First, the main thing: SuperFastHash appears to be the hash with the best
distribution. Using MurmurHash instead generates 1.28% more collisions while
running the nightly test in MultiSource/Applications.

Second: I've also tested the lookup3 hash, and its use generated 0.1% more
collisions compared to SFH.

These results were a bit surprising to me!

The number of hash table resizes is independent of the hashing algorithm used,
because the hash table grows when 'nentries > nbuckets * 2'.
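
(Illustrative sketch, not the actual FoldingSet source: the growth test looks
only at the two counts, so every hash function sees exactly the same sequence
of resizes for the same insertions.)

  #include <stddef.h>

  /* Grow (e.g. double) the table once the element count exceeds twice the
     bucket count; nothing here depends on how the elements hashed. */
  static int should_grow(size_t nentries, size_t nbuckets) {
    return nentries > nbuckets * 2;
  }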

  Gregory

Thanks for doing the evaluation! It sounds like we should stay with SFH. In addition to producing fewer collisions, it is cheaper to compute.

-Chris

While I agree that staying with SFH makes sense, I should note that in
microbenchmarks, MMH wins by a big margin (see the original patch).

  Gregory

And now for something really surprising: the Bernstein hash

  #include <stddef.h>   /* size_t */

  /* Bernstein's hash, XOR variant: multiply by 33, then XOR in each byte. */
  unsigned bernstein_hash(const unsigned char *str, size_t len)
  {
    unsigned hash = 5381;
    unsigned c;

    while (len--) {
      c = *str++;
      hash = (hash * 33) ^ c;
    }

    return hash;
  }

gives 1.48% fewer (sic!) collisions than SFH! That means fewer than lookup3,
MMH and FNV, too. I have no idea why this simplest of hashes is so good for
LLVM. Maybe it's because LLVM hashes lots of pointers?
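
To make that guess concrete, the typical input here is a few unsigned words,
often including a pointer split into words. A hypothetical driver, just to
show the shape of the data (bernstein_hash is the function above):

  #include <stddef.h>
  #include <stdio.h>
  #include <string.h>

  unsigned bernstein_hash(const unsigned char *str, size_t len); /* as above */

  int main(void) {
    int dummy;
    void *p = &dummy;

    /* A made-up FoldingSetNodeID-style record: a small tag word followed by
       a pointer copied into unsigned words. Much of the variation between
       keys tends to sit in the pointer bytes. */
    unsigned words[1 + sizeof(void *) / sizeof(unsigned)] = { 42 };
    memcpy(&words[1], &p, sizeof p);

    printf("%08x\n",
           bernstein_hash((const unsigned char *)words, sizeof words));
    return 0;
  }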

But BH is ~2 times slower than SFH.

Overall, it looks like:
1) The number of collisions for the different hashes is within a 5% interval.
2) MMH is the fastest by a big margin; FNV / BH are the slowest by a big margin.
3) Any hash will do for FoldingSet.

  Gregory