RFC: Using hashing for switch statements

Hi folks,

Here is a short RFC regarding hashing for the implementation of switch statements, together with my preliminary patch.
I posted this patch on 2014-01-16 21:58 to llvm-commits@cs.uiuc.edu. You can find a copy e.g. at http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140113/201782.html.

Best regards
Jasper

Why not Jenkins' hashing?

The search time of Jenkins' hashing might become unacceptably long
for extraordinarily large label sets.

It is generally not linear time, making it hardly appropriate for the
default pass pipeline. That's not just a question of very large label
sets -- it shows up even with much smaller sets in the range of 100+
elements.

Why CHM?

CHM is extraordinarily fast at generating the extra table with a
runtime of O(1).

Probabilistic O(n), not O(1): the construction has to retry with fresh
hash functions until the key graph comes out acyclic; a rough sketch of
that retry loop follows.
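
Here is a minimal sketch of one construction attempt (not the cmph-2.0
code; the helper names chm_try_seed/mix64 and the seeding scheme are
made up for illustration). Each key becomes an edge (h1, h2) in a graph
with M ~ 2.09 * n vertices; the attempt succeeds only if that graph is
acyclic, otherwise the caller draws new seeds and repeats, which is why
the total expected work is O(n), not O(1):

#include <stdint.h>
#include <stdlib.h>

static uint64_t mix64(uint64_t x, uint64_t seed) {
    x ^= seed;
    x *= 0x9e3779b97f4a7c15ULL;      /* arbitrary odd mixing constant */
    x ^= x >> 32;
    return x;
}

/* Union-find with path halving, used to detect cycles while adding edges. */
static unsigned uf_find(unsigned *parent, unsigned v) {
    while (parent[v] != v)
        v = parent[v] = parent[parent[v]];
    return v;
}

/* One construction attempt: O(n) work, fails if the edge set is cyclic. */
static int chm_try_seed(const uint64_t *keys, unsigned n, unsigned M,
                        uint64_t seed1, uint64_t seed2) {
    unsigned *parent = malloc(M * sizeof *parent);
    int ok = parent != NULL;
    for (unsigned v = 0; ok && v < M; v++)
        parent[v] = v;
    for (unsigned i = 0; ok && i < n; i++) {
        unsigned h1 = mix64(keys[i], seed1) % M;
        unsigned h2 = mix64(keys[i], seed2) % M;
        unsigned r1 = uf_find(parent, h1), r2 = uf_find(parent, h2);
        if (h1 == h2 || r1 == r2)    /* self-loop or cycle: reject this seed */
            ok = 0;
        else
            parent[r1] = r2;
    }
    free(parent);
    /* On success the g-table is filled by a single traversal of the acyclic
     * graph (not shown); on failure the caller retries with new seeds. */
    return ok;
}

With M around 2.09 * n the expected number of attempts is a small
constant, which is where the "probabilistic O(n)" comes from.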

Why not CHM?

CHM needs extra table space of 2.09 * n elements; this is larger than
what Jenkins' method needs.

This is misleading. The construction using two independent hash
functions requires 2 * n elements; the construction using three
independent hash functions needs only ~1.24 * n elements.

CHM (in the implementation of cmph-2.0) is much more complicated
than the code of Jenkins' hashing:
h1 = f1(x) % N;                      /* first hash into the table */
h2 = f2(x) % N;                      /* second, independent hash */
if (h1 == h2 && ++h2 >= N) h2 = 0;   /* fallback when both hashes collide */
h = (Table[h1] + Table[h2]) % n;     /* combine the two table entries */

The third line is strange -- that's normally just rejected during graph
construction. As such the normal output is:

table[M] = { ... }
h1 = f1(x) % M;
h2 = f2(x) % M;

return (table[h1] + table[h2]) % N;

with M and N being compile-time constants. As such, the modulo
operations will be replaced by two multiplications. Note that the two
table accesses are independent and parallelizable; a short sketch of
the resulting lookup follows.
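
For concreteness, here is a hedged sketch of that lowering in C, with
made-up constants and hash mixers (f1/f2); the real table contents
would be emitted by the compiler after CHM construction. Because M and
N are compile-time constants the compiler can strength-reduce the '%'
operations, and the two table loads do not depend on each other:

#include <stdint.h>

enum { N = 5, M = 11 };              /* 5 cases, M ~ 2.09 * N vertices */

/* Values produced by the CHM construction (dummy zeros here). */
static const uint8_t table[M] = { 0 };

static unsigned f1(uint64_t x) { return (unsigned)((x ^ (x >> 33)) * 0xff51afd7u); }
static unsigned f2(uint64_t x) { return (unsigned)((x ^ (x >> 29)) * 0xc4ceb9feu); }

/* Maps a case value to a dense index in [0, N); after checking that the
 * original case value really matches, the switch branches through a jump
 * table indexed by this value. */
static unsigned chm_index(uint64_t x) {
    unsigned h1 = f1(x) % M;
    unsigned h2 = f2(x) % M;
    return (table[h1] + table[h2]) % N;
}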

The third alternative is BPZ, which would give a hash function like:
h1 = f1(x) % M;
h2 = f2(x) % M;
idx = (g[h1] + g[h2]) % 2;
return idx == 0 ? h1 : h2;

The table load here is a bit load. This version would require about
2 bits per key for a perfect hash function. Bit loads tend to be
somewhat messier though (see the sketch below), so this is less
attractive.
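
A hedged sketch of the BPZ-style lookup (the names g_bits/bit_at/
bpz_index are illustrative, and the table contents would come from the
construction): with roughly M ~ 2 * n one-bit entries packed into
words, storage is about 2 bits per key, but every lookup pays for the
shift-and-mask bit extraction, which is the messiness referred to
above:

#include <stdint.h>

enum { M = 11 };                     /* illustrative table size */

/* One bit per vertex, packed into 64-bit words; filled by construction. */
static const uint64_t g_bits[(M + 63) / 64] = { 0 };

static unsigned bit_at(unsigned i) {
    return (unsigned)((g_bits[i / 64] >> (i % 64)) & 1u);
}

/* Picks one of the two candidate slots; that slot itself is the index. */
static unsigned bpz_index(unsigned h1, unsigned h2) {
    unsigned idx = (bit_at(h1) + bit_at(h2)) % 2;
    return idx == 0 ? h1 : h2;
}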

Joerg

Hi Jasper,

As was stated before, these are all fine theoretical arguments.
However, without real benchmarking numbers it's hard to say anything
definite about whether this approach is preferable.

There are plenty of switches in LLVM and clang itself. Could you
provide some statistics on how they were lowered before and after?

Sounds like it's worth implementing in-tree, off by default, and then
doing some benchmarking.

FWIW, I don't think sparse switches are very common in LLVM, so it's not
clear how often this optimization would fire.