Instructions
Problem 3:   Shakespeare
Problem 4:   Priority Queue and Heap  Sort
5.3.5

Assignment 7a

This is the second part of assignment 7. You will submit it separately form the second part, as second part requires that you provide complete Javadoc documentation and WebCAT will check your compliance.

Goals:
  • Work with Java Collections Framework classes HashMap, TreeMap.

  • Learn to implement the heap-based priority queue and heapsort.

Instructions

The names of the projects and some of the project files must be exactly the same as specified in the assignment. Failure to do so makes it impossible for the graders to run your submission and results in immediate loss of at least 50% of the homework credit.

Make sure you follow the style guidelines for code indentation.

You will submit this assignment by the deadline using the Web-CAT submission system.

With each homework you will also submit your log file named pairxxx.txt where you replace xxx with your pair number.

On top of every file you submit you will have the names of both partners, and the pair number.

The .txt file will be the log of your work on this assignment. Each log entry will have data and time, who was present (one or both of the partners) and a short comment decribing what you were working on.

Submission Details:

Due Date: Saturday, November 16, 10:00 pm.

Problem 3: Shakespeare

Introduction

Have you ever wondered about the size of Shakespeare’s vocabulary? For this assignment you will write a program that reads its input from a text file and lists the words that occur most frequently, together with a count of how many different words occur in the file.

If this program were to run on a file that contains all of Shakespeare’s works, it would tell you the approximate size of his vocabulary, and how often he uses the most common words.

Hamlet, for example, contains about 4542 distinct words, and the word king occurs 202 times, while the play Macbeth contains about 3201 distinct words, and the word macbeth occurs 288 times.

Researchers use this kind of techniques to verify authenticity of some disputed texts.

The Problem

Create a project with the name Macbeth. Download the file Assignment7a.zip unzip it and add the files to the project. Add the file Macbeth.txt to the project. Run the project, to make sure you have all pieces in place. The ExamplesWords class uses the tester package as we have done before.

The text file Macbeth.txt contains the entire text of the play Macbeth and a file StringIterator.java contains code that generates istances of the class Words from a file (e.g., Macbeth.txt) one at a time.

Save the file Macbeth.txt in the Eclipse project directory (where you find the subdirectories src and bin). The Examples class includes a code that invokes the processing of the complete text of the play Macbeth.

Note: Here you will use the imperative Iterator interface that is a part of Java Standard Library. Make sure to look up the documentation for this interface and understand how it works.

The files that are provided form a skeleton of your project - you only have to add the missing parts.

Note: You may use any Java Collections Framework classes that may help you solve this problem – and we encourage you to do so.

So, here is what you need to do:

Design the class Word that represents one word of Shakespeare’s vocabulary together with its frequency counter. The constructor takes only the String (for example the word "king") and starts the counter at 1 (one).

Two Word instances are equal to each other if they represent the same String, regardless of their frequency counters. That means that you have to override the equals() and hashCode() methods.

Implement a toString method for Word that returns the word String and its frequency.

Implement the method increment() that increments the Word’s frequency.

Design a class WordsByFreq that implements the Comparator interface, so that the words can be sorted by frequencies. (Be careful!) When you are done, place this class definition as the last part of the class definition of the class Word. This is called an inner class.

Note: In this program there will be two ways of comparing the instances of the Word class - by the String that it represents and by the frequency counter for the word that this instance represents.

Design the class WordCounter that keeps track of all the words we have seen so far. It should include the following methods:

// Record the Word objects generated by the given Iterator

//   and update the number of ocurrences

void countWords(Iterator it) { ... }

 

// How many different Words has this WordCounter recorded?

int words(){ ... }

 

// Prints the n most common words and their frequencies.

void printWords(int n) { ... }

Here are additional details:

Note: The given code expects that you implement the classes as given, with the same names and methods. It will then check whether your program works correctly. That does not mean you do not need to design tests.

Testing of the Shakespeare Project

Of course, you need to test all methods as you are designing them.

Design the tests in two stages:

For the class Word and the the class WordCounter use a technique similar to what was done in the past assignments, i.e. design a class ExamplesWords with the necessary sample data and all tests. We’ve astarted you off... just keep going.

Convert all tests into JUnit tests. Hand in both versions.

Documentation

The projects should contain Javadoc documentation that should produce the documentation pages without any warnings. You do not need to submit the documentation pages with your assignment.

Problem 4: Priority Queue and HeapSort

Your job is to design a priority queue using the heap algorithm. You then use this to implement the heapsort algorithm and add it to a collection of algorithms to be evaluated for time-complexity.

The descrioption of this assignment is written in a tutorial style - so you can recall the lecture details.

If it helps you, draw the picture of the tree and manipulate stickies or pieces of paper to see how the algorithms for inserting and deleting nodes work.

Typically, we represent this heap as an ArrayList<T> with the first item (at index 0) unused (maybe null).