CS 3500 Assignment #5 Due: Saturday, October 19, 2013 The purposes of this assignment are: * to design code to stress test BTree implementation * to stress test your BTree implementation * to understand the performance of your BTree implementation Collaboration between students is forbidden on this assignment. You are responsible for keeping your code hidden from all other students. You will submit this assignment within Web-CAT. Your file of Java code should begin with a block comment that lists 1. Your name, as you want the instructor to write it. 2. Your email address. 3. Any remarks that you wish to make to the instructor. Part of your grade will depend on the quality and correctness of your code, part will assess the tests you design - especially the test coverage, part will depend on the readability of your code (comments and indentation), and part will depend on how well you follow the procedure above for submitting your work. Assignments submitted between 12:00 am and 11:59 pm the day after the due date will receive a 20 percentage point penalty on the assignment. ----------------------------------------------------- | | | Files for this assignment can be found at: | | | | http://www.ccs.neu.edu/home/vkp/3500-fl13/A5/ | | | ----------------------------------------------------- For this assignment, you will be stress testing your implementation of BTree from Assignment 4. (Note: If your BTree implementation had errors, you will need to correct them before you can complete this assignment.) You will also need to add the following two public methods to your BTree implementation: /** * Modifies: * this binary search tree by inserting the * first numStrings Strings from * the given Iterable collection * The tree will not have any duplicates * - if an item to be added equals an item * that is already in the tree, it will not be added. * * @param iter the given Iterable collection * @param numStrings number of Strings * to iterate through and add to BTree * If numStrings is negative or larger than the number of * Strings in iter then all Strings * in iter should be inserted into the tree */ public void build(Iterable iter, int numStrings) /** * Effect: * Checks to see if this BTree contains s * @param s String to look for in this * @return whether this contains s */ public boolean contains(String s) You will submit your timing test program, your updated BTree implementation that you tested, and a report of your timing test output. Format of report is given below. -------------------------------------------------- Your timing program will contains two sets of test---testing on list of words and testing on classical literature. Testing on list of words: Your timing program should test all three different operations on the four different data sets (files) with six varying number of Strings inserted using three different Comparators to define the ordering of the data. When you run the program, the time that each of these operations took to complete the task should be output. Operations: - bt.build(new StringIterator(filename), numStrings); - while(it.hasNext()){ it.next(); } With this operation, along with outputting the time, you should also output the number of Strings that were iterated through. - bt.contains(s) on each String s in contains.txt - 100 Strings Files: - Words lexicographically ordered - lexicographically_ordered.txt - Words ordered according to StringReverseByLex - reverse_ordered.txt - Words ordered according to StringWithOutPrefixByLex -prefix_ordered.txt - Words in random order - random_order.txt Number of words to iterate through: 2000 words 4000 words 8000 words 16000 words 20000 words 24000 words Comparators: - StringByLex - StringReverseByLex (Given within MyComparators.java) - StringWithOutPrefixByLex (Given within MyComparators.java) Testing on classical literature: Your timing program should test all three different operations on the two complete data sets (files) using three different Comparators to define the ordering of the data. When you run the program, the time that each of these operations took to complete the task should be output. Operations: - bt.build(new StringIterator(filename)); - while(it.hasNext()){ it.next(); } With this operation, along with outputting the time, you should also output the number of Strings that were iterated through. - bt.contains(s) on each String s in contains.txt - 100 Strings Files: - Randomized text - iliad.txt - Text with duplicates - hippooath_DUPLICATES.txt Comparators: - StringByLex - StringReverseByLex (Given within MyComparators.java) - StringWithOutPrefixByLex (Given within MyComparators.java) -------------------------------------------------- Your report of timing output should be submitted as an Excel file. Your file should have two main worksheets---(1) testing on list of words and (2) testing on classical literature. These worksheets should contain the following headers: Testing on list of words: Comparator, File, Num Strings, Size (#), Build (ms), Iterate (ms), Contains (ms) Testing on classical literature: Comparator, File, Size (#), Build (ms), Iterate (ms), Contains (ms) You should be able to format your output using commas or tabs so that you can easily open/put the data in Excel. Along with the two worksheets with the raw data, you should also create charts for all of the data. These charts should be similar to the example figure in the course directory (http://www.ccs.neu.edu/home/vkp/3500-fl13/A5/StringReverseByLex_on_reverse_ordered_txt.jpg). -------------------------------------------------- You can download the input files, contains file, and additional comparators from the course directory (http://www.ccs.neu.edu/home/vkp/3500-fl13/A5/). -------------------------------------------------- System.currentTimeMillis() will be helpful you when completing the timings. http://docs.oracle.com/javase/6/docs/api/java/lang/System.html