CS 3500 Assignment #5
Due: Saturday, October 19, 2013
The purposes of this assignment are:
* to design code to stress test BTree implementation
* to stress test your BTree implementation
* to understand the performance of your BTree implementation
Collaboration between students is forbidden on this assignment. You are
responsible for keeping your code hidden from all other students. You
will submit this assignment within Web-CAT.
Your file of Java code should begin with a block comment that lists
1. Your name, as you want the instructor to write it.
2. Your email address.
3. Any remarks that you wish to make to the instructor.
Part of your grade will depend on the quality and correctness of your
code, part will assess the tests you design - especially the test
coverage, part will depend on the readability of your code (comments and
indentation), and part will depend on how well you follow the procedure
above for submitting your work. Assignments submitted between 12:00 am
and 11:59 pm the day after the due date will receive a 20 percentage
point penalty on the assignment.
-----------------------------------------------------
| |
| Files for this assignment can be found at: |
| |
| http://www.ccs.neu.edu/home/vkp/3500-fl13/A5/ |
| |
-----------------------------------------------------
For this assignment, you will be stress testing your implementation of
BTree from Assignment 4. (Note: If your BTree implementation had errors,
you will need to correct them before you can complete this assignment.)
You will also need to add the following two public methods to your BTree
implementation:
/**
* Modifies:
* this binary search tree by inserting the
* first numStrings String
s from
* the given Iterable
collection
* The tree will not have any duplicates
* - if an item to be added equals an item
* that is already in the tree, it will not be added.
*
* @param iter the given Iterable
collection
* @param numStrings number of String
s
* to iterate through and add to BTree
* If numStrings is negative or larger than the number of
* String
s in iter then all String
s
* in iter should be inserted into the tree
*/
public void build(Iterable iter, int numStrings)
/**
* Effect:
* Checks to see if this BTree contains s
* @param s String
to look for in this
* @return whether this contains s
*/
public boolean contains(String s)
You will submit your timing test program, your updated BTree
implementation that you tested, and a report of your timing test output.
Format of report is given below.
--------------------------------------------------
Your timing program will contains two sets of test---testing on list of
words and testing on classical literature.
Testing on list of words:
Your timing program should test all three different operations on the four
different data sets (files) with six varying number of Strings inserted
using three different Comparators to define the ordering of the data. When
you run the program, the time that each of these operations took to
complete the task should be output.
Operations:
- bt.build(new StringIterator(filename), numStrings);
- while(it.hasNext()){ it.next(); }
With this operation, along with outputting the time, you should
also output the number of Strings that were iterated through.
- bt.contains(s) on each String s in contains.txt - 100 Strings
Files:
- Words lexicographically ordered - lexicographically_ordered.txt
- Words ordered according to StringReverseByLex - reverse_ordered.txt
- Words ordered according to StringWithOutPrefixByLex -prefix_ordered.txt
- Words in random order - random_order.txt
Number of words to iterate through:
2000 words
4000 words
8000 words
16000 words
20000 words
24000 words
Comparators:
- StringByLex
- StringReverseByLex (Given within MyComparators.java)
- StringWithOutPrefixByLex (Given within MyComparators.java)
Testing on classical literature:
Your timing program should test all three different operations on the two
complete data sets (files) using three different Comparators to define the
ordering of the data. When you run the program, the time that each of these
operations took to complete the task should be output.
Operations:
- bt.build(new StringIterator(filename));
- while(it.hasNext()){ it.next(); }
With this operation, along with outputting the time, you should
also output the number of Strings that were iterated through.
- bt.contains(s) on each String s in contains.txt - 100 Strings
Files:
- Randomized text - iliad.txt
- Text with duplicates - hippooath_DUPLICATES.txt
Comparators:
- StringByLex
- StringReverseByLex (Given within MyComparators.java)
- StringWithOutPrefixByLex (Given within MyComparators.java)
--------------------------------------------------
Your report of timing output should be submitted as an Excel file.
Your file should have two main worksheets---(1) testing on list of words
and (2) testing on classical literature. These worksheets should contain
the following headers:
Testing on list of words:
Comparator, File, Num Strings, Size (#), Build (ms), Iterate (ms), Contains (ms)
Testing on classical literature:
Comparator, File, Size (#), Build (ms), Iterate (ms), Contains (ms)
You should be able to format your output using commas or tabs so that
you can easily open/put the data in Excel.
Along with the two worksheets with the raw data, you should also create
charts for all of the data. These charts should be similar to the
example figure in the course directory
(http://www.ccs.neu.edu/home/vkp/3500-fl13/A5/StringReverseByLex_on_reverse_ordered_txt.jpg).
--------------------------------------------------
You can download the input files, contains file, and additional
comparators from the course directory
(http://www.ccs.neu.edu/home/vkp/3500-fl13/A5/).
--------------------------------------------------
System.currentTimeMillis() will be helpful you when completing the timings.
http://docs.oracle.com/javase/6/docs/api/java/lang/System.html