If the hash function h(p) is properly chosen, then the hash values
will be approximately uniformly distributed over their range.
If the hash value is n bits long,
then this range is .
If there are N items in the hash table, then the probability of
randomly generating a hash value that corresponds to an item in the
table is
.
In particular, if a probe p does not match any
item in the table, then it will accidentally match (``collide'') with
the hash value of an item in the table with
probability
.
This can be made arbitrarily small by choosing n large enough.
For example, if N is 67 million
(i.e.,
)
and if n is 40,
then the probability of a collision
is
.
The probability model being used here is called the finite
occupancy process. The remaining results are concerned with the
distribution of a set of independent hash values distributed uniformly
among a set of buckets. Let P be the number of hash values (i.e.,
the number of probes), and let B be the number of buckets.
Suppose that one can only process up to N probes per bucket.
Let be
the number of probes that are not processed (and hence
discarded), and
let
be
the fraction of all probes that are discarded.
Both of these are random variables.
We would like to compute the expectation (average) of these random variables:
The expectation is computed as follows.
First approximate the finite occupancy process with the Poisson process.
In Poisson process, the probability that there are k probes in a
given bucket is .
If
,
then no probes are discarded;
if
, then one probe
is discarded; and so on.
Thus the expected number of probes discarded in a single bucket is
The total number of probes
discarded, ,
is the sum of the
number of probes discarded at each bucket. There are B buckets, and
they are stochastically independent (unlike the case of the finite
occupancy process). Hence
Finally, since ,
the result follows.
The following table gives values of N such that
the expectation of
is less than 0.01, 0.001 and 0.0001, for various values of the
ratio P/B:
