[libc++] Initial attempt to make a salted hash

Hi,

Modern programming languages with built-in hash table
support usually has some mechanism to fight against
hash collision attack (DoS, use massive prepared input
with same hash code to slow down the service; the
problem is more severe in C++ since the unordered
containers use seperate chaining instead of open
addressing). For example, Perl uses universal hashing,
Python3 uses salted hash.

Here is an initial attempt to implement a salted hash in
libc++, by using an unpredictable seed in
__murmur2_or_cityhash across the lifetime of a program,
and an simple demonstration -- if you run it multiple
times, the order of the elements within an unordered
container may change.

salted_hash.patch (2.35 KB)

demo.cc (184 Bytes)

Hi,

Modern programming languages with built-in hash table
support usually has some mechanism to fight against
hash collision attack (DoS, use massive prepared input
with same hash code to slow down the service; the
problem is more severe in C++ since the unordered
containers use seperate chaining instead of open
addressing). For example, Perl uses universal hashing,
Python3 uses salted hash.

Here is an initial attempt to implement a salted hash in
libc++, by using an unpredictable seed in
__murmur2_or_cityhash across the lifetime of a program,
and an simple demonstration -- if you run it multiple
times, the order of the elements within an unordered
container may change.

I think we need to be cautious here.

[hash.requirements]/ Table 26 says:

The value returned shall depend only on the argument k. [ Note: Thus all evaluations of the expression h(k) with the same value for k yield the same result. — end note ]

Now it isn't clear, at least to me, that if this is talking about all evaluation within the process, or across different invocations of the same process, or even across invocations of different processes. Without guidance from the LWG, I think we need to assume the most conservative answer.

I faintly recall this subject was brought up at the Portland meeting, Fall 2012. I do not believe anything was definitively resolved at that time. I do not know if this issue was discussed this Spring in Bristol. I do not see an LWG issue on this subject.

Howard

[hash.requirements]/ Table 26 says:

The value returned shall depend only on the argument k. [ Note: Thus all evaluations of the expression h(k) with the same value for k yield the same result. — end note ]

I know. But the wording here is not clear: does that mean two
different standard library implementations have to share the
same hash function? Obviously not.

Now it isn't clear, at least to me, that if this is talking about all evaluation within the process, or across different invocations of the same process, or even across invocations of different processes. Without guidance from the LWG, I think we need to assume the most conservative answer.

Tend to agree.

I faintly recall this subject was brought up at the Portland meeting, Fall 2012. I do not believe anything was definitively resolved at that time. I do not know if this issue was discussed this Spring in Bristol. I do not see an LWG issue on this subject.

I guess it's inside

  Hashing User-Defined Types in C++1y

Thanks to the labor day, the deadline is extended. Can I submit
an issue like

"The value returned shall depend only on the argument k<ins> for
the duration of the program</ins> [ Note: Thus all evaluations of the
expression h(k) with the same value for k yield the same result
<ins> for each execution of the program(, some guideline here?)</ins>.
— end note ]"

The hashing can't be per-process BTW. First the standard knows
nothing about OS process; second, practically that makes you unable
to share an unordered container to a child process :frowning:

It is unclear to me whether the problem you wish to solve is provide:

  • resilient hash functions

  • resilient hash containers

For the latter another solution, rather than modifying the hash function, would be to modify the hash container.

For example, one could initialize std::unordered_set (and co) with a random seed and simply XOR this seed with the result of h(k). Said seed can even change each time the container is rehashed to prevent an observer from analyzing the collisions and cause undue chaining.

And this would not require a salted hash so there would be no issue with the standard.

– Matthieu

This doesn't help much as it is often trivial to construct input that
creates a hash collision on h(k) (and not just the reduced range of
h(k)%n).

Joerg