copyright 2012 Felleisen, Proulx, et. al.

CS 2510 Spring 2012:Assignment 11 - HashMap; Equals; Images; Text Compression; JUnit

Practice Problems

Work out as the following exercises from the textbook. Do not add them to your assignment submission, though you should keep them in your electronic portfolio in your own svn repository.

Problem: Color Processing

Start a new project for this portfolio, name it Images.

Download the file ImageProcessor.zip and import all .java files into it. Save all image files (.png files) in the same directory in your EclipseWorkspace where your src and bin files for this project are located.

Examine the files that are given to understand what they do. Here is a brief overview of their design.

The class ImageReader allows you to read any .bmp or .png file and analyze the individual pixels. The constructor expects the name of the file to be read. It reads the file and initializes the value of the width and height fields for the given image.

The method Color getColorPixel(int x, int y) returns the color value of the pixel at the given location. You can extract the red, blue, and green component of the color as integers using the methods

   c.getRed()
   c.getGreen()
   c.getBlue()

Try the following tasks that will allow you to manipulate color images:

Create a negative of the given Flowers.png image. Explore other ways of manipulating the images and document your exploration.

To create new images and save then as .png files, use the given class ImageBuilder. It works as follows:

The constructor also initializes the field BufferedImage image that is ready to receive the data needed to represent the resulting image. You need to supply the height and the width of the image.
The method setColorPixel (x, y, r, g, b) sets the color of the given pixel in the image to the specified RGB shade.
The method public void saveImage(String filename) saves the image you have created in the .png format --- it adds the .png to the filename you specify.

Note: The class ImageBuilder2 works the same way as ImageBuilder, but in addition it displays the image as it is created in a Canvas. To use this class you need to import the world libraries: impworld.jar, geometry.jar, and colors.jar.

Pair Programming Assignment

These problems should be checked into your pair's SVN repository by the due date.

Project naming requirements

The names of the projects and some of the project files must be exactly the same as specified in the assignment. Failure to do so makes it impossible for the graders to run your submission and results in immediate loss of at least 50% of the homework credit.

Every file you submit must have a header that identifies you and the problem it solves. So, for the first problem the header should look like this:

// Assignment 1 Problem 1
// Partner Name1
// partner1username
// Partner Name2
// partner2username
// 2 April 2012

Problem 1 Shakespeare

Introduction

Have you ever wondered about the size of Shakespeare's vocabulary? For this assignment you will write a program that reads its input from a text file and lists the words that occur most frequently, together with a count of how many different words occur in the file. If this program were to run on a file that contains all of Shakespeare's works, it would tell you the approximate size of his vocabulary, and how often he uses the most common words.

Hamlet, for example, contains about 4542 distinct words, and the word king occurs 202 times, while the play Macbeth contains about 3201 distinct words, and the word macbeth occurs 288 times.

Researchers use this kind of techniques to verify authenticity of some disputed texts.

The Problem

Create a project with the name HW11Problem1. Download the file Assignment11.zip unzip it and add the files to the project. Run the project, to make sure you have all pieces in place. The ExamplesWords class uses the tester package as we have done before.

The text file Macbeth.txt contains the entire text of the play Macbeth and a file StringIterator.java contains code that generates Words from a file (e.g., Macbeth.txt) one at a time. Save the file Macbeth.txt in the Eclipse project directory (where you find the subdirectories src and bin). The Examples class includes a code that invokes the processing of the complete text of the play Macbeth.

Note: Here you will use the imperative Iterator interface that is a part of Java Standard Library. Make sure to look up the documentation for this interface and understand how it works.

The files that are provided form a skeleton of your project - you only have to add the missing parts.

Note: You may use any Java Collections Framework classes that may help you solve this problem -- and we encourage you to do so.

So, here is what you need to do:

Design the class Word that represents one word of Shakespeare's vocabulary together with its frequency counter. The constructor takes only the String (for example the word "king") and starts the counter at 1 (one).

Two Word instances are equal to each other if they represent the same String, regardless of their frequency counters. That means that you have to override the equals() and hashCode() methods.
Implement a toString method for Word that returns the word String and its frequency.
Implement the method increment() that increments the Words frequency.
Design a class WordsByFreq that implements the Comparator interface, so that the words can be sorted by frequencies. (Be careful!) When you are done, place this class definition as the last part of the class definition of the class Word. This is called an inner class.

Note: In this program there will be two ways of comparing the instances of the Word class - by the String that it represents and by the counter for the word that this instance represents.

Design the class WordCounter that keeps track of all the words we have seen so far. It should include the following methods:

   // Record the Word objects generated by the given Iterator
   //   and update the number of ocurrences
   void countWords(Iterator it) { ... }

   // How many different Words has this WordCounter recorded?
   int words(){ ... }

   // Prints the n most common words and their frequencies.
   void printWords(int n) { ... }

Here are additional details:

countWords consumes a Word iterator that generates the words and builds the collection of the appropriate Word instances, with the correct frequencies. This collection is then used by the next two method to show the results of our text analysis.
words produces the number of different words that have been counted.
printWords consumes an integer n and prints the top n words with the highest frequencies (using the toString method defined in the class Word).

Note: The given code expects that you implement the classes as given, with the same names and methods. It will then check whether your program works correctly. That does not mean you do not need to design tests.

Testing of the Shakespeare Project

Of course, you need to test all methods as you are designing them. Design the tests in two stages:

For the class Word and the the class WordCounter use a technique similar to what was done in the past assignments, i.e. design a class ExamplesWords with the necessary sample data and all tests. \textit{We've astarted you off... just keep going.
Convert all tests into JUnit tests. Hand in both versions.

Documentation

The projects should contain Javadoc documentation that should produce the documentation pages \textbf{without any warnings}. You do not need to submit the documentation pages to the repository.

Problem 2

Create a project with the name HW011Problem2. Define the class Algorithms and the class ExamplesHuffman.

Imagine that we want to define the most efficient way for encoding the letters of alphabet using only sequences of bits (values 0 and 1). David Huffman designed an algorithm that maake this possible.

We start by looking at the text we want to encode. Assume it is represented as a single String. For example, the text may be the following 45 characters:

In the midst of the word he was trying to say
....x....x....x....x....x....x....x....x....x

Or, you can use a smaller (20 character) String from a well known song:

oh, say, can you see
....x....x....x....x

Your first task is to produce a histogram that records the frequencies of each letter (including space) that occurs in the given String.

For each letter the histogram records how many times the letter occurred in the typical text we plan to encode and decode. We use instances of the class LF to encode the frequency of each letter. Our histogram is then an ArrayList< LF>.

This is very similar the the Shakespeare problem in your current homework assignment.
```
       +---------------+
       | LF            |
       +---------------+
       | String letter |
       | int freq      |
       +---------------+
```
In the class ExamplesHuffman make an example of the histogram you would get from the shorter text given previously.
IN the class Algorithms design the method computeHisto that computes the histogram for a given String of text.

You can use any type of Java loop to implement this method. See if for-each loop woud work for you.

Given the letter frequencies, we can build a binary-tree that represents the optimal encoding for each letter in our String. We first show examples of the trees that represent such encodings. The KeyTree class hierarchy is defined as:

              +-------------------------+
              |  +--------------------+ |
              v  v                    | |
          +----------+                | |
          | KeyTree  |                | |
          +----------+                | |
          | int freq |                | |
          +----+-----+                | |
              / \                     | |
             +---+                    | |
               |                      | |
           +---+-----------+          | |
           |               |          | |
 +---------------+  +---------------+ | |
 |  Leaf         |  |     Node      | | |
 +---------------+  +---------------+ | |
 | String letter |  | KeyTree left  |-+ |
 +---------------+  | KeyTree right |---+
                    +---------------+

The code that defines these classes is given below:

  // to represent a Huffman Tree
  abstract class KeyTree{
     // the frequency of this character or collection
     int freq;

     KeyTree(int freq){
        this.freq = freq;
     }
  }

  // to represent a single character
  class Leaf extends KeyTree{
     String letter;
    
     Leaf(String letter, int freq){
        super(freq);
        this.letter = letter;
     }
  }
  // to represent a splitting node in the KeyTree
  class Node extends KeyTree{
     KeyTree left;
     KeyTree right;
  
     Node(KeyTree left, KeyTree right){
        super(left.freq + right.freq);
        this.left = left;
        this.right = right;
     }
  }

In the class ExamplesHuffmna make examples of the following two trees. The first one represents the optimal encoding for the String cat at bat, the second one represents the optimal encoding for the String here there (the number next to the each Leaf is it's frequency):

                10                   10
              /    \               /    \
             6      4             4      6
            / \    / \           / \    / \
           a3 t3  2  sp2        h2 r2  e4  2
                 / \                      / \
                c1 b1                    t1 sp1

To construct our encoding we can save the histogram data in a priority queue of KeyTrees, where the highest priority item is the one with the lowest frequency. This means we first need a Comparator that compares using freq.

Design the comparator ByFreq.
The encoding for the character "t" in our first example is the String "01", but in the second example it is "110". Notice that the numbers 0 and 1 tell us whether we should choose the left or right tree.

Design the method findPath for the KeyTree classes that consumes a character and produces a String representing this encoding, ending with the given character.

So, for the input "t" the first tree would produce "01t", and the second tree would produce "110t".
In the class Algorithms design the method encodeString that consumes a KeyTree and a String that consists only of the characters encoded in the given KeyTree and produces the String of 0s and 1s that represents the encoding of the given String.

Note: In a real application each character 0 or 1 in the encoding would be represented as a single bit. Considering that the encoding of every character in a String usually requires 8 or 16 bits, this is typically a serious improvement.
Design the method nextLetter for the KeyTree classes that consumes a String of 0s and 1s and produces the letter that the encoding represents (in the form of a String of length 1). The method should throw an exception if the given String is not a valid encoding.

So, in the first tree the input "01", would produce "t", in the second tree the input "110" would produce "t".
Design the method decode for the KeyTree classes that consumes a String that represents an encoding of a sequence of characters and produces a String that the encoding represents. The method should throw an exception if the given String is not a valid encoding.

So, in the first tree the input "1000001", would produce the "cat", and in the second tree "1100010" would produce "the". The second tree would fail for the "001".
Design the method buildTree as follows:

Use the method buildPQ shown below to build the initial priority queue (of Leafs) from the LF data.
```
   // insert the frequency data into a priority queue
   // produce an priority queue of KeyTree data
   public PriorityQueue< KeyTree> buildPQ(ArrayList< LF> lfList){
      PriorityQueue< KeyTree> pq =
           new PriorityQueue< KeyTree>(lfList.size(), new ByFreq());    
      for(LF lf : lfList){
          pq.offer(new Leaf(lf.c, ch.occurs));
      }   
      return pq;
   }
```
Now design the method buildTree that consumes a priority queue you built and produces the KeyTree that represents the encoding. It works as follows:
- if the priority queue is empty, no tree can be built and we should throw an exception.
- if the size of the priority queue is 1, it contains a single KeyTree, so the tree is removed and returned.
- otherwise, the method removes the two KeyTrees with the highest priority (lowest frequencies) and adds a new KeyTree to the queue that has the removed items as its left and right subtrees. It then recurs, consuming the updated priority queue.
Look up the documentation for the methods for the Java PriorityQueue in the Java Collections Library.

Last modified: Thu Mar 29 17:03:46 EDT 2012