/* Csu213, Monday 4/02/07 Today we discuss hashCode() and show some examples and a simple implementation of a Hash table. Big-Oh and sorting things will be in the next set of notes * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * We will write our own simple Hashtable in a little bit, but first we'll describe some of the ideas of hashes, collision, and resolution. For starters, lets define our hash function for a simple ArrayList of size 11: h(x) = x % 11 We'll add some integers to the table: 33, 56, 47, 73 index [ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 ] data [ 33 | 56 | | 47 | | | | 73 | | | ] Now say we try to add 58... 58 % 11 == 3, but slot [3] is already taken by 47!! We call this a Collision when two Objects hash to the same value/slot. What can we do about this? Well, we could just put it into the next open slot in the ArrayList... yielding: index [ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 ] data [ 33 | 56 | | 47 | 58 | | | 73 | | | ] and if we try to add 80... 80 % 11 == 3 so we get another collision: index [ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 ] data [ 33 | 56 | | 47 | 58 | 80 | | 73 | | | ] This (in general) is called collision resolution. This type specifically is called linear probing. Given this situation, we can generalize our hash function to a 'family' of hash functions {h0, h1, h2, h3, h4, ...}, or just add another argument, i: h(i, x) = (x + i) % 11 Now to add an element, we keep trying our hash function, starting with i = 0 until we fund an empty spot. What's the problem with this scheme? 1 - As the table fills up, elements get farther and farther away from thier original hash spot 2 - This makes searching for elements very slow since there are many collisions 3 - Checking if an element is/isn't there could take a long (not constant) time 4 - Removing an element could invalidate our *entire* table!!! Possible solutions: A) use lists for each hash value, then searches are greatly shortened B) use different (none linear) replacement hashes (for i > 0) like a second 'real' hash function or quadratic probing (see Wikipedia). Once we have found an item with a matching hash value, how can we tell if it's the right one? With equals(...) of course! Java defines two (overridable) methods for all Objects: public int hashCode(); public boolean equals(Object o); Unfortunately when we create our own class the default implementation is not usually what we want. But we have to be careful... here's the rule: If you implement hashCode(), you should also implement equals() and the other way around too. Java assumes a few things: 1) If o1.equals(o2) then it *must* be true that o1.hashCode() == o2.hashCode() 2) If o1.hashCode() == o2.hashCode() it is *not* necissarily true that o1.equals(o2) So, equals() implies hashCode(), but hashCode() doesn't imply equals(). In particular we need to be carful with mutation since it may change the hashCode() and/or equals()... we should avoid these types of errors Lastly (before a simple implementation) there are mathematicians and computer scientists who study how to design good hash functions and how these effect running times and hash performance. Anyway, after the implementation we'll review how performance of algorithms is generally measured... */ import java.util.ArrayList; // A very simple Hash Table class Hash{ ArrayList hash; int size; // Constructors, default size is 11 Hash(){ this(11); } Hash(int size){ hash = new ArrayList(size); for(int i = 0; i < size; i++)hash.add(null); this.size = size; } // Compute the Hash of a given element int hashFunction(int i, X x){ return ((x.hashCode() + i) % size); } // Add an element to this Hash table void add(X x){ // Try adding it int i; for(i = 0; i < 11; i++){ if(hash.get(hashFunction(i, x)) == null){ hash.set(hashFunction(i, x), x); return; } } // No Room Left if(i == 11) throw new RuntimeException("Hash Table Full"); // Otherwise we're good } // Get the corresponding X, if it's there, otherwise 'null' X get(X x){ int i = 0; X check; do{ // See if there's anything in the expected slot check = hash.get(hashFunction(i, x)); // Something's there... check for equality if(check != null && x.equals(check)) return check; // Next round i++; }while(i < 11 && check != null); return null; } // See if it's in this Hash table boolean contains(X x){ return get(x) != null; } public String toString(){ String acc = "[ "; for(int i = 0; i < hash.size(); i++){ X x = hash.get(i); if(i > 0) acc += " | "; if(x != null) acc += x; else acc += " "; } return acc+" ]"; } } /* To make this simpler I've only made the Hash store one generic type of element. In real life we would have different classes for the Key (the thing which gives a hashCode() to be hashed) and a Value (what we would like to store and access quickly) Hashtable and HashMap are both parameterized by two classes, so we use them like: Hashtable and HashMap and the methods used are 'put' 'containsKey' and 'get'... but to keep things simple we mimic the situation where Key=Value so ours is more like: HashMap with 'add', and 'get' just like ArrayList. Here's some Examples */ // Tests and Examples public class notes_4_02_07{ // Wrapper for System.out.println public static void print(String s){ System.out.println(s); } // Normal main(...) for running Tests public static void main(String[] args) { Hash hash = new Hash(); print(" Hash example from before..."); print(" mt: "+hash); // Repeat our example from before hash.add(33); hash.add(56); hash.add(47); hash.add(73); print(" after adding : "+hash); // These are the Collisions hash.add(58); print(" after collsn 1 : "+hash); hash.add(80); print(" after collsn 2 : "+hash); // Search for all Keys [0..100] and print out the results print("\n Looking for all the Keys..."); for(int i = 0; i < 100; i++){ if(hash.contains(i)) print(" found: "+hash.get(i)); } } } /* See the next set of Notes for more Big-Oh stuff and some sorting algorithm examples, tests, and discussion */