Assignment 7a
This is the second part of assignment 7. You will submit it separately form the second part, as second part requires that you provide complete Javadoc documentation and WebCAT will check your compliance.
Work with Java Collections Framework classes HashMap, TreeMap.
Learn to implement the heap-based priority queue and heapsort.
Instructions
The names of the projects and some of the project files must be exactly the same as specified in the assignment. Failure to do so makes it impossible for the graders to run your submission and results in immediate loss of at least 50% of the homework credit.
Make sure you follow the style guidelines for code indentation.
You will submit this assignment by the deadline using the Web-CAT submission system.
With each homework you will also submit your log file named pairxxx.txt where you replace xxx with your pair number.
On top of every file you submit you will have the names of both partners, and the pair number.
The .txt file will be the log of your work on this assignment. Each log entry will have data and time, who was present (one or both of the partners) and a short comment decribing what you were working on.
Submission Details:
Make sure the the names of all classes and interfaces in all three problems are different.
Make sure every file with name Xyyy.java contains as the first class or interface the definition of class Xyyy or interface Xyyy.
Make sure that the three classes with tests for the two problems are named Examples... as appropriate.
For this assignment you are required to provide comments in the Javadoc style. WebCAT will highlight any places where you fail to do so correctly. The sample code provided gives you an example of what is required.
Due Date: Saturday, November 16, 10:00 pm.
Problem 3: Shakespeare
Introduction
Have you ever wondered about the size of Shakespeare’s vocabulary? For this assignment you will write a program that reads its input from a text file and lists the words that occur most frequently, together with a count of how many different words occur in the file.
If this program were to run on a file that contains all of Shakespeare’s works, it would tell you the approximate size of his vocabulary, and how often he uses the most common words.
Hamlet, for example, contains about 4542 distinct words, and the word king occurs 202 times, while the play Macbeth contains about 3201 distinct words, and the word macbeth occurs 288 times.
Researchers use this kind of techniques to verify authenticity of some disputed texts.
The Problem
Create a project with the name Macbeth. Download the file Assignment7a.zip unzip it and add the files to the project. Add the file Macbeth.txt to the project. Run the project, to make sure you have all pieces in place. The ExamplesWords class uses the tester package as we have done before.
The text file Macbeth.txt contains the entire text of the play Macbeth and a file StringIterator.java contains code that generates istances of the class Words from a file (e.g., Macbeth.txt) one at a time.
Save the file Macbeth.txt in the Eclipse project directory (where you find the subdirectories src and bin). The Examples class includes a code that invokes the processing of the complete text of the play Macbeth.
Note: Here you will use the imperative Iterator interface that is a part of Java Standard Library. Make sure to look up the documentation for this interface and understand how it works.
The files that are provided form a skeleton of your project - you only have to add the missing parts.
Note: You may use any Java Collections Framework classes that may help you solve this problem – and we encourage you to do so.
So, here is what you need to do:
Design the class Word that represents one word of Shakespeare’s vocabulary together with its frequency counter. The constructor takes only the String (for example the word "king") and starts the counter at 1 (one).
Two Word instances are equal to each other if they represent the same String, regardless of their frequency counters. That means that you have to override the equals() and hashCode() methods.
Implement a toString method for Word that returns the word String and its frequency.
Implement the method increment() that increments the Word’s frequency.
Design a class WordsByFreq that implements the Comparator interface, so that the words can be sorted by frequencies. (Be careful!) When you are done, place this class definition as the last part of the class definition of the class Word. This is called an inner class.
Note: In this program there will be two ways of comparing the instances of the Word class - by the String that it represents and by the frequency counter for the word that this instance represents.
Design the class WordCounter that keeps track of all the words we have seen so far. It should include the following methods:
// Record the Word objects generated by the given Iterator |
// and update the number of ocurrences |
void countWords(Iterator it) { ... } |
|
// How many different Words has this WordCounter recorded? |
int words(){ ... } |
|
// Prints the n most common words and their frequencies. |
void printWords(int n) { ... } |
Here are additional details:
countWords consumes a Word iterator that generates the words and builds the collection of the appropriate Word instances, with the correct frequencies. This collection is then used by the next two method to show the results of our text analysis.
words produces the number of different words that have been counted.
printWords consumes an integer n and prints the top n words with the highest frequencies (using the toString method defined in the class Word).
Note: The given code expects that you implement the classes as given, with the same names and methods. It will then check whether your program works correctly. That does not mean you do not need to design tests.
Testing of the Shakespeare Project
Of course, you need to test all methods as you are designing them.
Design the tests in two stages:
For the class Word and the the class WordCounter use a technique similar to what was done in the past assignments, i.e. design a class ExamplesWords with the necessary sample data and all tests. We’ve astarted you off... just keep going.
Convert all tests into JUnit tests. Hand in both versions.
Documentation
The projects should contain Javadoc documentation that should produce the documentation pages without any warnings. You do not need to submit the documentation pages with your assignment.
Problem 4: Priority Queue and HeapSort
Your job is to design a priority queue using the heap algorithm. You then use this to implement the heapsort algorithm and add it to a collection of algorithms to be evaluated for time-complexity.
The descrioption of this assignment is written in a tutorial style - so you can recall the lecture details.
A heap is a complete binary tree. It means that every level is filled, except for the last level. The last level is filled from the left.
So a heap with five nodes will have the following shape:
node-1
/ \
node-2 node-3
/ \
node-4 node-5
The nodes are labeled by levels from left to right, starting at 1.
The value in each node is greater-than or equal-to (for some comparison definition) the values in both of its children. So the item with the highest value (highest priority) is at the root of the tree.
The label of the parent of node i is i/2
The label of the left-child of node i is 2*i
The label of the right-child of node i is 2*i+1
If it helps you, draw the picture of the tree and manipulate stickies or pieces of paper to see how the algorithms for inserting and deleting nodes work.
Typically, we represent this heap as an ArrayList<T> with the first item (at index 0) unused (maybe null).
Add a HeapExamples class to your project.
Make three examples of the following heaps by defining an ArrayList<Integer> for each case in your examples class and an initHeaps method to add the values in the appropriate order.
To build the following heap
node-1
/ 70 \
/ \
node-2 node-3
/ 60 \ 40
/ \
node 4 node 5
35 50
we would proceed as follows:
ArrayList<T> myheap = new ArrayList<T>();
void initHeap(){
this.myheap.add(null); // the unused first item
this.myheap.add(70);
this.myheap.add(60);
this.myheap.add(40);
this.myheap.add(35);
this.myheap.add(50);
}
Here are the three heaps to define:
node-1
/ 80 \
/ \
node-2 node-3
/ 50 \ / 40
/ \ /
node-4 node-5 node-6
45 20 30
node-1
/ 50 \
/ \
node-2 node-3
/ 45 \ 40
/ \
node-4 node-5
30 20
node-1
/ 70 \
/ \
node-2 node-3
/ 45 \ / 50
/ \ /
node-4 node-5 node-6
30 20 40
Define the class ExamplesHeaps that will contain the tests for the methods you now design.
You can use the data defined in HeapExamples class for the tests of your methods.
Define the class PriorityQueue<T> that will represent a heap-based priority queue. It has two fields: an ArrayList<T> and a Comparator<T> that determines the ordering of priorities.
Design the method isLeaf that consumes a node label (an int) and returns true if the node has no children (remember the numberings...).
Design the method higherPriorityChild that consumes the index of a node that is not a leaf and produces the index of its child with the highest priority.
Note: If the node has only one child, then this will be the one with the higher priority, of course.
Design the method insert that inserts a new node into the heap.
Here is what we covered in the lectures:
Insert the new item at the next position in the bottom level (the end of the list representation). Say, this is position k.
Upheap from k:
While (k > 1 and heap(k) > heap(k/2)) {
~~~~~~swap heap(k) and heap(k/2)
~~~~~~set k to k/2
}
Design the method remove that removes the node with the highest priority from the heap.
Here is what we covered in the lectures:
save the root item as a temporary
move the last item into the root position
Downheap from k = 1:
While (k is not a leaf) {
~~~~~~find ck the node with the larger child of the node k
~~~~~~if heap(k) < heap(ck) {
~~~~~~~~~~~~swap heap(k) and heap(ck)
~~~~~~~~~~~~set k to ck }
~~~~~~else stop
}