In this module we start exploring how we can design programs that interact with the operating system. We introduce Java's Input and Output (I/O) standard libraries, and walk through an example Java program that consumes inputs from the command line. We use our example to introduce regular expressions, and Java's support for regular expressions.

We conclude this module with an introduction to Maven and how to use it to structure your Java project, integrate it with your IDE, introduce library dependencies, and generate code analysis reports.

  1. Given a search specification, write down its corresponding regular expression.
  2. Write a Java program that accepts input parameters on the command line.
  3. Write a Java program that uses Java's regular expressions to search for patterns in a String.
  4. Write a Java program that reads data from a file.
  5. Write a Java program that writes data to a file.
  6. Given a Java application, use Maven to build the source code.
  7. Given a Java application, use Maven to configure code coverage reports.
  8. Given a Java class with JUnit tests, generate code coverage reports with Cobertura.

  • DUE DATE: March 29th at 12:00pm (NOON)

Simple Text Document Processor

The team you are working on is trying to create a document processor for text documents. They are at the stage where they are figuring out some of the features they want, and they would like to create a prototype.

The prototype should deal with headers, the numbering of headers, and numbered lists. Here is an example that the design team has put together:

# Header at Level 1

Headers appear on a line of their own. A header starts with the one or more occurrences of the symbol `#` followed
by one space followed by text and ends with a newline. 
Headers are section titles. The number of `#` symbols indicate the nesting of headers. 


## This is a header at level 2 

# Paragraphs

Paragraphs are free form text. Paragraphs are separated by new lines. 

This is the second paragraph in this section


# Numbered Lists 

Numbered lists appear on a separate line and start with two special characters `1.` followed by a space. 
For example here is a numbered lists 

1. This is the first item 
1. the second item 
1. the third item

We end an ordered list with an empty line.
We can nest ordered lists by adding 2 spaces followed by the same special character for numbered lists, e.g., 

1. This is the first item of the outer list 
1. This is the second item of the outer list 
  1. This is the first item of the inner list 
  1. This is the second item of the inner list 
1. This is the third item of the outer list
                    
                

Your program should let the user specify the name of a file to process. The file contains text like the example above. The output of your program is another text file in which the `#` symbols of each header are replaced by the header's section number, e.g., 1, 1.1, 1.2, etc., and each numbered list item carries its position in its list, e.g., 1, 2, 3.
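The file handling described above could be sketched as follows. This is a minimal sketch, not a required design: it assumes the input file name is the first command-line argument and the output file name is the second (the assignment does not say how the output name is chosen), and the names `DocumentFiles`, `process`, and `transform` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class DocumentFiles {

    // Placeholder for the real header and list rewriting; here it returns the line as is.
    static String transform(String line) {
        return line;
    }

    // Reads every line of the input file, transforms it, and writes the result.
    static void process(Path input, Path output) throws IOException {
        List<String> in = Files.readAllLines(input, StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        for (String line : in) {
            out.add(transform(line));
        }
        Files.write(output, out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: two arguments, input file first and output file second.
        if (args.length != 2) {
            System.err.println("usage: java DocumentFiles <input-file> <output-file>");
            System.exit(1);
        }
        process(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Reading and writing line by line keeps the content of each line intact, but it writes the platform's line separator after every line, so a file that uses a different line terminator would come out changed in that respect.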

Given the example input above, your program should output the following text file:

1 Header at Level 1

Headers appear on a line of their own. A header starts with the one or more occurrences of the symbol `#` followed
by one space followed by text and ends with a newline. 
Headers are section titles. The number of `#` symbols indicate the nesting of headers. 

1.1 This is a header at level 2 

2 Paragraphs

Paragraphs are free form text. Paragraphs are separated by new lines. 

This is the second paragraph in this section


3 Numbered Lists 

Numbered lists appear on a separate line and start with two special characters `1.` followed by a space. 
For example here is a numbered lists 

1. This is the first item 
2. the second item 
3. the third item

We end an ordered list with an empty line.
We can nest ordered lists by adding 2 spaces followed by the same special character for numbered lists, e.g., 

1. This is the first item of the outer list 
2. This is the second item of the outer list 
  1. This is the first item of the inner list 
  2. This is the second item of the inner list 
3. This is the third item of the outer list
                

The team would like you to develop this program and support the following features:

  1. The program should allow for arbitrary nesting of sections
  2. The program should allow for arbitrary nesting of numbered lists
  3. Paragraphs and newlines should not be altered in any way
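One way to support arbitrary nesting of sections is to keep one counter per nesting level. The sketch below assumes a header is one or more `#` symbols, exactly one space, and then the text; the class name `HeaderNumberer` and the method `rewrite` are hypothetical, and the treatment of skipped levels is a guess because the assignment does not cover that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeaderNumberer {
    // One or more '#', exactly one space, then the header text.
    private static final Pattern HEADER = Pattern.compile("^(#+) (.*)$");

    // counters.get(i) holds the current section number at nesting level i + 1.
    private final List<Integer> counters = new ArrayList<>();

    // Returns the line with its '#' prefix replaced by a section number,
    // or the line unchanged when it is not a header.
    String rewrite(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.matches()) {
            return line;
        }
        int level = m.group(1).length();
        // A new header at this level closes all deeper subsections.
        while (counters.size() > level) {
            counters.remove(counters.size() - 1);
        }
        // Assumption: a skipped level (e.g. "###" right after "#") shows up as 0.
        while (counters.size() < level) {
            counters.add(0);
        }
        counters.set(level - 1, counters.get(level - 1) + 1);

        StringBuilder number = new StringBuilder();
        for (int i = 0; i < level; i++) {
            if (i > 0) {
                number.append('.');
            }
            number.append(counters.get(i));
        }
        return number + " " + m.group(2);
    }
}
```

Because the object keeps state between calls, the lines of a document have to be passed to `rewrite` in order, using one `HeaderNumberer` per document.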

The following features are not required but will earn you extra credit:

  1. Nesting of numbered lists should alter the numbering style. The outermost numbered list should use integers (1, 2, 3, etc.), the next nesting level should use letters (a, b, c, etc.), the third nesting level should go back to integers, the fourth back to letters, and so on: odd nesting levels use integers, even nesting levels use letters.
  2. Generate a table of contents at the start of the file that has a numbered list of all the sections. Here is the expected output for the example file with both extra credit features implemented:
    1. Header at Level 1
    1.1. This is a header at level 2
    2. Paragraphs
    3. Numbered Lists 
    
    1 Header at Level 1
    
    Headers appear on a line of their own. A header starts with the one or more occurrences of the symbol `#` followed
    by one space followed by text and ends with a newline. 
    Headers are section titles. The number of `#` symbols indicate the nesting of headers. 
    
    1.1 This is a header at level 2 
    
    2 Paragraphs
    
    Paragraphs are free form text. Paragraphs are separated by new lines. 
    
    This is the second paragraph in this section
    
    
    3 Numbered Lists 
    
    Numbered lists appear on a separate line and start with two special characters `1.` followed by a space. 
    For example here is a numbered lists 
    
    1. This is the first item 
    2. the second item 
    3. the third item
    
    We end an ordered list with an empty line.
    We can nest ordered lists by adding 2 spaces followed by the same special character for numbered lists, e.g., 
    
    1. This is the first item of the outer list 
    2. This is the second item of the outer list 
      a. This is the first item of the inner list 
      b. This is the second item of the inner list 
    3. This is the third item of the outer list
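
The list numbering, including the alternating style from the first extra credit feature, could be sketched with one counter per nesting depth. The sketch assumes that list items are not indented at the outermost level, that each nesting level adds exactly two spaces, and that an empty line ends the list; the names `ListNumberer`, `rewrite`, and `label` are hypothetical, and continuing past `z` with `aa`, `ab`, and so on is an assumption the assignment does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ListNumberer {
    // Indentation (two spaces per nesting level), "1.", one space, item text.
    private static final Pattern ITEM = Pattern.compile("^((?:  )*)1\\. (.*)$");

    // counters.get(i) holds the number of the last item seen at depth i + 1.
    private final List<Integer> counters = new ArrayList<>();

    String rewrite(String line) {
        Matcher m = ITEM.matcher(line);
        if (!m.matches()) {
            if (line.trim().isEmpty()) {
                counters.clear(); // an empty line ends the current list
            }
            return line;
        }
        String indent = m.group(1);
        int depth = indent.length() / 2 + 1;
        // Returning to a shallower depth closes the deeper lists.
        while (counters.size() > depth) {
            counters.remove(counters.size() - 1);
        }
        while (counters.size() < depth) {
            counters.add(0);
        }
        int n = counters.get(depth - 1) + 1;
        counters.set(depth - 1, n);
        return indent + label(n, depth) + ". " + m.group(2);
    }

    // Odd depths use integers, even depths use lowercase letters.
    static String label(int n, int depth) {
        if (depth % 2 == 1) {
            return Integer.toString(n);
        }
        // Assumption: after 'z' the labels continue with "aa", "ab", ...
        StringBuilder letters = new StringBuilder();
        while (n > 0) {
            n--;
            letters.insert(0, (char) ('a' + n % 26));
            n /= 26;
        }
        return letters.toString();
    }
}
```

For the required features alone, `label` can simply return `Integer.toString(n)` at every depth.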
    
                           
  • DUE DATE: April 5th at 12:00pm (NOON)

Simple Text Document Processor Part II

The word processor team is satisfied with the first prototype and would like to keep adding features.

  • The prototype is to be extended so that, given a text file that contains the document, it generates HTML. If we call the program with the file doc.txt, the output should be placed in the same folder with the name doc.html. The output must be valid HTML. The generation should adhere to the following rules:
    • the name of the file should become the value of the <title> tag, e.g.,
      <title> doc.txt  </title>
      
    • each paragraph is surrounded by <p> </p> e.g.,
      <p>
      This is a paragraph that can have one 
      or more 
      lines
      </p>
                                      
    • each numbered list is wrapped with <ol> </ol>, and each item of a numbered list is wrapped with <li> </li>; a nested list goes inside the <li> of the item that contains it, e.g.,
      <ol>
          <li>this is line one</li>
          <li>this is line two
              <ol>
                  <li>Nested list line one</li>
                  <li>Nested list line two</li>
              </ol>
          </li>
          <li>this is line three</li>
      </ol>
      
    • each section header is wrapped with <hN> </hN> where N is the header nesting level, e.g.,
      <h1>level 1 header</h1>
      <h2>level 2 header</h2>
      <h3>level 3 header</h3>
      
      HTML only defines h1 through h6, so headers at nesting level 6 and deeper should all use h6.
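
The header rule, including the cap at h6, could be sketched like this. The class name `HeadingTags` and the method `toHtml` are hypothetical, and trimming the whitespace around the header text is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeadingTags {
    // One or more '#', exactly one space, then the header text.
    private static final Pattern HEADER = Pattern.compile("^(#+) (.*)$");

    // Converts a header line to <hN>text</hN>; other lines are returned unchanged.
    static String toHtml(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.matches()) {
            return line;
        }
        // HTML stops at h6, so deeper headers are clamped to level 6.
        int level = Math.min(m.group(1).length(), 6);
        String text = m.group(2).trim(); // assumption: surrounding whitespace is dropped
        return "<h" + level + ">" + text + "</h" + level + ">";
    }
}
```

For instance, `HeadingTags.toHtml("## This is a header at level 2 ")` returns `<h2>This is a header at level 2</h2>`.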
So, given the same input file as last time
# Header at Level 1

Headers appear on a line of their own. A header starts with the one or more occurrences of the symbol `#` followed
by one space followed by text and ends with a newline. 
Headers are section titles. The number of `#` symbols indicate the nesting of headers. 


## This is a header at level 2 

# Paragraphs

Paragraphs are free form text. Paragraphs are separated by new lines. 

This is the second paragraph in this section


# Numbered Lists 

Numbered lists appear on a separate line and start with two special characters `1.` followed by a space. 
For example here is a numbered lists 

1. This is the first item 
1. the second item 
1. the third item

We end an ordered list with an empty line.
We can nest ordered lists by adding 2 spaces followed by the same special character for numbered lists, e.g., 

1. This is the first item of the outer list 
1. This is the second item of the outer list 
  1. This is the first item of the inner list 
  1. This is the second item of the inner list 
1. This is the third item of the outer list
                    
                

the program should generate the following HTML file


<!DOCTYPE html>
<html>
<head>
  <title>doc.txt</title>
</head>
<body>
<h1>Header at Level 1</h1>

<p>Headers appear on a line of their own. A header starts with the one or more occurrences of the symbol # followed
by one space followed by text and ends with a newline. 
Headers are section titles. The number of # symbols indicate the nesting of headers. </p>

<h2> This is a header at level 2</h2>

<h1> Paragraphs</h1>

<p>Paragraphs are free form text. Paragraphs are separated by new lines. </p>

<p>This is the second paragraph in this section</p>

<h1> Numbered Lists</h1>

<p>Numbered lists appear on a separate line and start with two special characters `1.` followed by a space. 
For example here is a numbered lists </p>

<ol>
<li>This is the first item </li>
<li>the second item </li>
<li>the third item</li>
</ol>

<p>We end an ordered list with an empty line.
We can nest ordered lists by adding 2 spaces followed by the same special character for numbered lists, e.g., </p>

<ol>
<li>This is the first item of the outer list </li>
<li>This is the second item of the outer list 

<ol>
<li>This is the first item of the inner list </li>
<li>This is the second item of the inner list </li>
</ol>
</li>
<li>This is the third item of the outer list</li>
</ol>
</body>
</html>
                

Opening the file using your browser should render the document with headers and numbered lists.
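
Deriving the output file name and the fixed part of the page could be sketched as below. The names `HtmlOutput`, `htmlPathFor`, and `wrap` are hypothetical; the sketch assumes the input path names a file and does not escape special characters in the title.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class HtmlOutput {

    // Maps e.g. docs/doc.txt to docs/doc.html, keeping the same folder.
    static Path htmlPathFor(Path input) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Keep the whole name when there is no extension (or only a leading dot).
        String base = dot > 0 ? name.substring(0, dot) : name;
        return input.resolveSibling(base + ".html");
    }

    // Surrounds the generated body with the document skeleton;
    // the title is the input file name, as the rules require.
    static String wrap(String title, String body) {
        return "<!DOCTYPE html>\n"
                + "<html>\n"
                + "<head>\n"
                + "  <title>" + title + "</title>\n"
                + "</head>\n"
                + "<body>\n"
                + body
                + "</body>\n"
                + "</html>\n";
    }

    public static void main(String[] args) {
        Path input = Paths.get("doc.txt");
        System.out.println(htmlPathFor(input));
        System.out.print(wrap(input.getFileName().toString(), "<p>Hello</p>\n"));
    }
}
```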

The team would like to also add the following features to your prototype

  1. Allow for unordered lists. Unordered lists are defined by using two spaces and the * character. Like numbered lists, unordered lists can be nested, e.g.,
     
      * This is the first line 
      * This is the second line 
        * This is the nested list first line 
        * This is the nested list second line 
      * This is the third line 
                            
    In HTML, unordered lists are wrapped with <ul> </ul> and each item of the list is wrapped with <li> </li>; a nested list goes inside the <li> of the item that contains it, e.g.,
    <ul>
      <li>This is the first line</li>
      <li>This is the second line
        <ul>
          <li>This is the nested list first line</li>
          <li>This is the nested list second line</li>
        </ul>
      </li>
      <li>This is the third line</li>
    </ul>
                            
  2. Allow for emphasis in your text. Any word or series of words on the same line can be emphasized by wrapping it in * characters, e.g.,
    This is normal text *but this should be bolded* since I wrapped 
    it with stars and they are on the same line. 
                            
    In HTML we wrap text with <b> </b> to make it bold, e.g.,
    This is normal text <b>but this should be bolded</b> since I wrapped 
    it with stars and they are on the same line. 
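
The emphasis rule maps naturally onto a regular expression applied one line at a time, which keeps a match from spanning lines. A minimal sketch, with the hypothetical names `Emphasis` and `toHtml`:

```java
import java.util.regex.Pattern;

final class Emphasis {
    // A star, one or more characters that are not stars, and a closing star.
    private static final Pattern STARRED = Pattern.compile("\\*([^*]+)\\*");

    // Replaces every *text* on the given line with <b>text</b>.
    static String toHtml(String line) {
        return STARRED.matcher(line).replaceAll("<b>$1</b>");
    }
}
```

A line that starts with the unordered list marker would confuse this pattern, since the marker itself is a star. One option is to strip the list marker first and apply `toHtml` only to the item text.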
                            

Your output file must be valid HTML; in particular, every opening tag must have a corresponding closing tag. You can also use the W3C Markup Validation Service to check your output files.
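
For nested lists, one way to keep opening and closing tags balanced is to track the currently open lists on a stack. The sketch below is illustrative only: the names `ListHtml` and `toHtml` are hypothetical, and it assumes that outermost items are not indented, that each nesting level adds exactly two spaces, that nesting deepens one level at a time, and that all items at one depth of a list use the same marker.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ListHtml {
    // Indentation (two spaces per level), then "1." or "*", one space, item text.
    private static final Pattern ITEM = Pattern.compile("^((?:  )*)(1\\.|\\*) (.*)$");

    // Converts consecutive list-item lines into nested <ol>/<ul> markup.
    static String toHtml(List<String> lines) {
        StringBuilder html = new StringBuilder();
        // Tags of the lists that are currently open, innermost first.
        // Each open list always has one <li> that is still open.
        Deque<String> open = new ArrayDeque<>();
        for (String line : lines) {
            Matcher m = ITEM.matcher(line);
            if (!m.matches()) {
                continue; // not a list item; a full converter handles it elsewhere
            }
            int depth = m.group(1).length() / 2 + 1;
            String tag = m.group(2).equals("*") ? "ul" : "ol";
            // Returning to a shallower depth: close inner items and lists.
            while (open.size() > depth) {
                html.append("</li>\n</").append(open.pop()).append(">\n");
            }
            if (open.size() == depth) {
                html.append("</li>\n"); // close the previous sibling item
            } else {
                // One level deeper: the new list goes inside the still-open <li>.
                if (!open.isEmpty()) {
                    html.append('\n');
                }
                html.append('<').append(tag).append(">\n");
                open.push(tag);
            }
            html.append("<li>").append(m.group(3));
        }
        // Close whatever is still open at the end of the list block.
        while (!open.isEmpty()) {
            html.append("</li>\n</").append(open.pop()).append(">\n");
        }
        return html.toString();
    }
}
```

Because every tag that is pushed is later popped and closed, the generated fragment has a closing tag for each opening tag, and nested lists end up inside the `<li>` of their parent item.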