If the hash function h(p) is properly chosen, then the hash values will be approximately uniformly distributed over their range. If the hash value is n bits long, then this range is . If there are N items in the hash table, then the probability of randomly generating a hash value that corresponds to an item in the table is . In particular, if a probe p does not match any item in the table, then it will accidentally match (``collide'') with the hash value of an item in the table with probability . This can be made arbitrarily small by choosing n large enough. For example, if N is 67 million (i.e., ) and if n is 40, then the probability of a collision is .
The probability model being used here is called the finite occupancy process. The remaining results are concerned with the distribution of a set of independent hash values distributed uniformly among a set of buckets. Let P be the number of hash values (i.e., the number of probes), and let B be the number of buckets. Suppose that one can only process up to N probes per bucket. Let be the number of probes that are not processed (and hence discarded), and let be the fraction of all probes that are discarded. Both of these are random variables. We would like to compute the expectation (average) of these random variables:
The expectation is computed as follows. First approximate the finite occupancy process with the Poisson process. In Poisson process, the probability that there are k probes in a given bucket is . If , then no probes are discarded; if , then one probe is discarded; and so on. Thus the expected number of probes discarded in a single bucket is
The total number of probes discarded, , is the sum of the number of probes discarded at each bucket. There are B buckets, and they are stochastically independent (unlike the case of the finite occupancy process). Hence
Finally, since , the result follows.
The following table gives values of N such that the expectation of is less than 0.01, 0.001 and 0.0001, for various values of the ratio P/B: