Why not Jenkins' hashing?

The search time of Jenkins' hashing might become unacceptably long
for extraordinarily large label sets.

It is generally not linear time, making it hardly appropriate for the
default pass pipeline. That's not just a question of very large label
sets -- it happens even with much smaller sets in the area of 100+
elements.

Why CHM?

CHM is extraordinarily fast at generating the extra table with a
runtime of O(1).

Probabilistic O(n), not O(1).
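
For illustration, here is a minimal sketch of one construction attempt
(not cmph's actual code; key hashing, seed selection and the retry loop
are assumed to live elsewhere, and all names are made up). Each attempt
is linear: build the random graph with one edge per key, then walk it
once to assign the g values. Hitting a cycle means the caller picks new
seeds and retries; for m >= 2.09 * n an attempt succeeds with constant
probability, so the expected total time is O(n):

#include <stdlib.h>
#include <string.h>

struct edge { unsigned to, idx; struct edge *next; };

/* Assign g along a DFS so that (g[u] + g[v]) % n == idx holds for
 * every edge (u, v, idx). Returns 0 when a cycle is found. A real
 * implementation would use an explicit stack instead of recursion. */
static int assign(unsigned v, unsigned from_idx, struct edge **adj,
                  unsigned *g, char *seen, unsigned n)
{
    struct edge *e;

    seen[v] = 1;
    for (e = adj[v]; e != NULL; e = e->next) {
        if (e->idx == from_idx)
            continue;           /* don't walk back over the tree edge */
        if (seen[e->to])
            return 0;           /* cycle: this attempt fails */
        g[e->to] = (e->idx + n - g[v]) % n;
        if (!assign(e->to, e->idx, adj, g, seen, n))
            return 0;
    }
    return 1;
}

/* One attempt over n keys whose hash pairs are precomputed in h1[]
 * and h2[] (values < m). Allocation failure handling is omitted. */
int chm_attempt(const unsigned *h1, const unsigned *h2,
                unsigned n, unsigned m, unsigned *g)
{
    struct edge *pool = malloc(2 * n * sizeof(*pool));
    struct edge **adj = calloc(m, sizeof(*adj));
    char *seen = calloc(m, 1);
    unsigned i, v;
    int ok = 1;

    for (i = 0; i < n; i++) {   /* edge i connects h1[i] and h2[i] */
        pool[2*i].to = h2[i]; pool[2*i].idx = i;
        pool[2*i].next = adj[h1[i]]; adj[h1[i]] = &pool[2*i];
        pool[2*i+1].to = h1[i]; pool[2*i+1].idx = i;
        pool[2*i+1].next = adj[h2[i]]; adj[h2[i]] = &pool[2*i+1];
    }
    memset(g, 0, m * sizeof(*g));
    for (v = 0; v < m && ok; v++)   /* one DFS per connected component */
        if (!seen[v])
            ok = assign(v, (unsigned)-1, adj, g, seen, n);

    free(seen); free(adj); free(pool);
    return ok;
}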

Why not CHM?

CHM needs extra table space of 2.09 * n elements; this is larger
compared to Jenkins' method.

This is misleading. It requires 2 * n elements for the construction
using two independent hash functions, and ~1.24 * n elements for the
construction using three independent hash functions.

CHM (in the implementation of cmph-2.0) is much more complicated
than the code of Jenkins' hashing:

h1 = f1(x) % N;
h2 = f2(x) % N;
if (h1 == h2 && ++h2 >= N) h2 = 0;  /* move h2 on when it collides with h1 */
h = (Table[h1] + Table[h2]) % n;

The third line is strange -- that's normally just rejected during graph
construction. As such the normal output is:

table[M] = { ... };

h1 = f1(x) % M;
h2 = f2(x) % M;
return (table[h1] + table[h2]) % N;

with M and N being compile-time constants. As such, the modulo
operations will be replaced by two multiplications. Note that the two
table accesses are independent and parallelizable.
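
To make that concrete, the generated code for a hypothetical set of
100 labels could look like this (names, constants and the extern hash
functions are all made up for the example):

enum { M = 209, N = 100 };         /* baked in by the generator */

extern unsigned f1(const char *);  /* the two independent hash functions */
extern unsigned f2(const char *);

static const unsigned short table[M] = {
    0 /* ... the generator emits the actual contents here ... */
};

unsigned label_hash(const char *x)
{
    unsigned h1 = f1(x) % M;  /* % by a constant: multiply + shift, no div */
    unsigned h2 = f2(x) % M;
    return (table[h1] + table[h2]) % N;
}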

The third alternative is BPZ, which would give a hash function like:

h1 = f1(x) % M;
h2 = f2(x) % M;
idx = (g[h1] + g[h2]) % 2;
return idx == 0 ? h1 : h2;

The table load is a single-bit load here: g stores one bit per slot.
This version would require 2 bit/key for a perfect hash function. Bit
loads tend to be somewhat messier though, so this is less attractive.
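
A sketch of what that looks like, with g packed one bit per slot (all
names and constants are again made up); the shift-and-mask dance is
what makes the bit loads messier than the plain array indexing above:

enum { M = 209 };                  /* ~2.09 * n one-bit slots */

extern unsigned f1(const char *);
extern unsigned f2(const char *);
extern const unsigned char g_bits[(M + 7) / 8];  /* emitted by the generator */

static unsigned bit(unsigned i)    /* extract bit i of the packed table */
{
    return (g_bits[i >> 3] >> (i & 7)) & 1;
}

unsigned bpz_hash(const char *x)
{
    unsigned h1 = f1(x) % M;
    unsigned h2 = f2(x) % M;
    /* (g[h1] + g[h2]) % 2 collapses to XOR for one-bit entries */
    return (bit(h1) ^ bit(h2)) == 0 ? h1 : h2;
}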

Joerg