2  Input and Output: Plain Text

Help Desk: eof-object?, read-line, read-char, printf, write-char, reading, writing



Unfortunately it isn't always possible to use S-expressions for storing data in a file. Some people just haven't heard that S-expressions are superior to plain, unstructured text. Some people are paren-phobic. Some programs -- well, quite a few -- evolved in pure ignorance of Lisp's S-expression format and their storage format is now a part of the standard repertoire. Our programs must interact with the data files of these standard programs and, hence, we must learn to deal with these formats.

2.1  Reading Lines

As mentioned, you will -- in general -- need a parser (whatever that is) to read information from a file or the keyboard and turn it into an internal form of data. Fortunately though, there are many common cases for which it is easy to use or write functions like read. This section presents an introduction to this topic.

;;  -->  (union String eof)
;; to read a line from the default input device 
(define (read-line) ...)

;;  -->  (union Char eof)
;; to read a character from the default input device 
(define (read-char) ...)

;;  -->  (union Char eof)
;; to peek at the next character from the default input device 
(define (peek-char) ...)

;; AnyValue  -->  Boolean
;; to test whether v is the special end-of-file object
;; (eof-object? v) is true if and only if v is eof
(define (eof-object? v) ...)

;; printf : String S-expression ...  -->  Void 
;; to print a series of Scheme values according to the format string f
(define (printf f . args) ...)

;; format : String S-expression ...  -->  String 
;; to create a string from args according to the format string f
(define (format f . args) ...)

Format Strings
  ~n, ~%   represents a newline
  ~a, ~A   use display style printing
  ~s, ~S   use write style printing
  ~v, ~V   use print style
  ~e, ~E   use the current error value conventions
  ~c, ~C   use write-char's convention for the value; if the value is not a character, the exn:application:mismatch exception is raised
  ~b, ~B   present an exact number in its binary representation; if the value is not an exact number, the exn:application:mismatch exception is raised
  ~o, ~O   present an exact number in its octal representation; if the value is not an exact number, the exn:application:mismatch exception is raised
  ~x, ~X   present an exact number in its hexadecimal representation; if the value is not an exact number, the exn:application:mismatch exception is raised
  ~~       represents a tilde (~)

Figure 30:  Scheme's text and character oriented I/O primitives

Let us return to our running example, managing student grades. People may keep grade records in files like this:

  Adam 78 88 69 66
  Brad 88 87 86 22
  Cath 99 88 88 90
  Dave 77 78 77 78
  Fawn 90 89 81 60
  Gege 67 78 81 85
  ...
  Zoro 33 44 55 66

They may even have a README file around that specifies the exact format:

  The program grade-txt.ss expects grades-txt.dat to 
  contain a TxtDB:

  A TxtDB file is 
   a sequence of Lines.
  A Line is 
   Name Grade ...   ;; exactly one space between tokens 
  A Grade is a 
   number between 0 and 100 (inclusive)

And then they may even want you to write a program that computes the gpas for this class.

Ideally a magician could just wave his magic wand and turn this form of data into something we can just read. Or, you could write a Scheme program that reads this text format and turns it into the kind of format that gpas in figure 27 expects. To do that, your program needs to read all the lines, turn each one into a GradeRecord, and collect them in a list.

Let's start with reading the lines. Scheme provides a number of primitives for reading and writing lines and characters from a file. Figure 30 provides an overview, but to understand it properly, we need a mental model of what files really are when they are not just containers for S-expressions.

Roughly speaking, a file is a sequence of lines; each line is a sequence of characters with a termination character. That is, a file is either empty or a line followed by a file. After we have stepped through all the lines, we reach the end of the file, also known as eof. Read primitives get one item at a time. If a primitive can't find any characters before the end of the file, its result is eof, the special end-of-file object. Thus, the expression

(with-input-from-file "grades-txt.dat"
  (lambda ()
    (list
      (read-line)
      (read-line)
      (read-line))))      

reads three lines from "grades-txt.dat" and combines them in a list. Assuming this file refers to the line-oriented version of our grade file, we get this list:

(list "Adam 78 88 69 66"
      "Brad 88 87 86 22"
      "Cath 99 88 88 90")      

That is, we get a list of three strings.
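To experiment with these primitives without a file on disk, you can point them at a string instead. The following is a minimal sketch, assuming a current Racket, where with-input-from-string (from racket/port) plays the role of with-input-from-file; the two-line input is made up for illustration. It also shows what happens when read-line runs out of lines:

```racket
#lang racket
;; Read three "lines" from a string that contains only two.
;; The third read-line finds no characters and therefore produces eof.
(define results
  (with-input-from-string "Adam 78 88\nBrad 88 87\n"
    (lambda ()
      (list (read-line) (read-line) (read-line)))))

(car results)                  ; "Adam 78 88"
(cadr results)                 ; "Brad 88 87"
(eof-object? (caddr results))  ; #t
```

The first two results are strings without the terminating newline; the third is the end-of-file object.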

;;  -->  (Listof (Listof String))
;; create a list of lines, where a line is a list of strings 
(define (read-all-lines)
  (let loop ()
    (let ([next (read-line)])
      (cond
        [(eof-object? next) '()] 
        [else (cons (regexp-split " " next) (loop))]))))

;; (Listof (Listof String))  -->  (Listof GradeRecord)
;; turn the lines into list-based grade records 
(define (lines->records file)
  (map (lambda (ln) (cons (car ln) (map string->number (cdr ln)))) file))

;; run, program, run:
(lines->records (with-input-from-file "grades-txt.dat" read-all-lines))

Figure 31:  Reading a plain-text grade file

Let's see how the design style of ``How to Design Programs'' (or part I) works here, using read-line. Here is a template for functions that process files:

;; fun-for-file :  -->  ???
;; to read and process each line in a file 
(define (fun-for-file)
  (let ([a-line (read-line)])
    (cond
      [(eof-object? a-line) ...]
      [else ... a-line ... (fun-for-file) ...])))

The template reads one line and names the resulting value a-line. The let-expression is necessary because read-line reads a different part of the file every time it is used. Hence, we need to name its result.

Since read-line produces either a string or eof, we need to distinguish the two cases with two clauses. If a-line is eof, we have no additional information. If it is a string, we have the string and we know that it makes sense to recur because we haven't found the end of the file yet.

Working with this template as a starting point, we can easily write a function that reads all lines of a file into a list:

;;  -->  (Listof String)
;; create a list of lines
(define (read-all-lines)
  (let loop ()
    (let ([next (read-line)])
      (cond
        [(eof-object? next) '()] 
        [else (cons next (loop))]))))

While this doesn't give us exactly what we want, we now have the entire file as a list, and we do know how to process lists.
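The same template can be filled in differently. As a small sketch, here is a hypothetical function count-lines (the name is ours, not part of any library) that produces a number instead of a list; it assumes a current Racket and reads from a string via with-input-from-string so that no file is needed:

```racket
#lang racket
;; count-lines :  -->  Number
;; to count the lines available on the default input device
;; (hypothetical helper, derived from the fun-for-file template)
(define (count-lines)
  (let ([a-line (read-line)])
    (cond
      [(eof-object? a-line) 0]
      [else (+ 1 (count-lines))])))

(with-input-from-string "Adam 78\nBrad 88\nCath 99\n" count-lines) ; 3
```

The eof clause supplies the base value, and the else clause combines the current line with the result of the recursion, just as in read-all-lines.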

Our biggest problem is that a record in text format is exactly one line and that these lines are read as strings. The gpa program consumes lists of lists instead, where each of the latter lists consists of a name and a bunch of grades. To create this list from a string, we must first TOKENIZE the string, that is, extract its pieces, and then convert a bunch of these pieces into numbers.

In general, dissecting a string and extracting its pieces is a complex task.[12] For many cases, however, it just means separating the string at some points, usually, but not always, at a single character. Because these situations occur frequently, Scheme's library for strings provides the function

;; String String  -->  (Listof String)
(define (regexp-split a-separator a-string) ...)

The function consumes two strings; the first one is interpreted as a regular expression pattern. It produces a list of substrings of the second argument. More precisely, it removes all occurrences of the first argument from the second argument and splits the second string at those places:

(regexp-split "a" "bcd") ;; produces 
(list "bcd")
and
(regexp-split "-" "-bcd-efg") ;; produces
(list "" "bcd" "efg")

The empty string in the list is the substring to the left of the first occurrence of "-".
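Applied to a line from our grade file, the function yields exactly the tokens we are after. The sketch below, assuming a current Racket, also shows why the "exactly one space" rule of the README matters: two adjacent separators produce an empty string between them.

```racket
#lang racket
;; one separator between tokens: every piece is a token
(regexp-split " " "Adam 78 88 69 66")  ; (list "Adam" "78" "88" "69" "66")

;; two adjacent separators: an empty string shows up between them
(regexp-split " " "Adam 78  88")       ; (list "Adam" "78" "" "88")
```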

Now we can write the part of the text-based gpa program that is missing. We can add the function call for splitting the line in two places: in read-all-lines or in lines->records. The code in figure 31 shows the first solution. As read-all-lines reads the lines in the given file, it also splits each of them at blank spaces. Hence, reading the grades-txt.dat file produces this list:

(list (list "Adam" "78" "88" "69" "66")
      (list "Brad" "88" "87" "86" "22")
      (list "Cath" "99" "88" "88" "90")
      (list "Dave" "77" "78" "77" "78")
      (list "Fawn" "90" "89" "81" "60")
      (list "Gege" "67" "78" "81" "85")
      ...
      (list "Zoro" "33" "44" "55" "66"))      

The translation of this list of lists of strings into a list of grade records is straightforward; see figure 31 for the details.
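To see the translation at work, here is a small sketch that applies lines->records to the first two lines of the list above (assuming a current Racket). Note that string->number produces #f for a string that is not a number, which is how an empty token from a stray space would surface later in the program.

```racket
#lang racket
;; (Listof (Listof String))  -->  (Listof GradeRecord)
;; turn the lines into list-based grade records
(define (lines->records file)
  (map (lambda (ln) (cons (car ln) (map string->number (cdr ln)))) file))

(lines->records (list (list "Adam" "78" "88" "69" "66")
                      (list "Brad" "88" "87" "86" "22")))
;; (list (list "Adam" 78 88 69 66) (list "Brad" 88 87 86 22))

(string->number "")  ; #f
```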

2.2  Reading Comma-Separated Values

Some programs allow users to save tabular data as comma-separated text. So one of your friends or colleagues without programming experience may have used one such program to create the grades file and now it looks like this:

  Adam,78,88,69,66
  Brad,88,87,86,22
  Cath,99,88,88,90
  Dave,77,78,77,78
  Fawn,90,89,81,60
  Gege,67,78,81,85
  ...
  Zoro,33,44,55,66

That is, each line still represents a grade record, but in each line, the name is separated from the grades with exactly one comma and each grade is separated from the next one with another comma.

To accommodate this new file format, we just modify read-all-lines:

;;  -->  (Listof (Listof String))
;; create a list of lines, where a line is a list of strings 
(define (read-all-csv)
  (let loop ()
    (let ([next (read-line)])
      (cond
        [(eof-object? next) '()] 
        [else (cons (regexp-split "," next) (loop))]))))

Specifically, we split the strings where they contain "," instead of single blank-spaces.

Comparing the two definitions naturally calls for an abstraction that reads lines and separates them where needed. Figure 32 displays the definition of such a function. In addition, the figure shows how to re-define read-all-lines and read-all-csv in terms of this abstraction and how to define read-all-tab, a function that reads tab-separated values. The latter is as important as the comma-separated value version; many existing programs from popular software producers use it to export data.

;; String  -->  ( -->  (Listof (Listof String)))
;; create a list of lines, where a line is a list of strings separated via s
(define (read-separated-lines s)
  (letrec ([loop (lambda ()
                   (let ([next (read-line)])
                     (cond
                       [(eof-object? next) '()] 
                       [else (cons (regexp-split s next) (loop))])))])
    loop))

;;  -->  (Listof (Listof String))
;; create a list from lines with space-separated tokens 
(define read-all-lines (read-separated-lines " "))

;;  -->  (Listof (Listof String))
;; create a list from lines with comma-separated tokens 
(define read-all-csv (read-separated-lines ","))

;;  -->  (Listof (Listof String))
;; create a list from lines with TAB-separated tokens 
(define read-all-tab (read-separated-lines "\t"))

Figure 32:  Reading comma-separated values
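The readers of figure 32 are ordinary thunks, so they can be handed to any function that sets up the default input device. Here is a short sketch, assuming a current Racket and using with-input-from-string with made-up input in place of a real file:

```racket
#lang racket
;; String  -->  ( -->  (Listof (Listof String)))
(define (read-separated-lines s)
  (letrec ([loop (lambda ()
                   (let ([next (read-line)])
                     (cond
                       [(eof-object? next) '()]
                       [else (cons (regexp-split s next) (loop))])))])
    loop))

(define read-all-csv (read-separated-lines ","))
(define read-all-tab (read-separated-lines "\t"))

(with-input-from-string "Adam,78,88\nBrad,88,87\n" read-all-csv)
;; (list (list "Adam" "78" "88") (list "Brad" "88" "87"))

(with-input-from-string "Adam\t78\t88\n" read-all-tab)
;; (list (list "Adam" "78" "88"))
```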

2.3  Writing Plain Text

People who like to keep grades in plain-text files also like to see the final grade-point averages in fancy formats. Say your professor wants to print the grades from our standard example in a tabular form like this:

  Name |  GPA
  ------------
  Adam | 75.25
  Brad | 70.75
  Cath | 91.25
  Dave | 77.50
  Fawn | 80.00
  Gege | 77.75
  Zoro | 49.50

Let's assume for now that all names have four letters; we can always refine the program later.

;; (Listof (list String Number))  -->  Void
(define (print-gpas gpas)
  (printf "Name |  GPA~n")
  (printf "------------~n")
  (for-each (lambda (x) (printf "~a | ~a~n" (car x) (number->fstring (cadr x))))
            gpas))

;; Number  -->  String
(define (number->fstring x)
  (let* ([x (inexact->exact (round (* x 100)))]
         [i (quotient x 100)]
         [d (remainder x 100)])
    (if (<= 0 d 9) (format "~a.0~a" i d) (format "~a.~a" i d))))

Figure 33:  Printing gpa tables

Even a cursory look at the problem statement and its one example tells you that the first two lines are special and that the other lines are uniformly generated from our database. So, your main output procedure -- print-gpas in figure 33 -- prints the first two lines and then iterates over the gpas. The first two lines are just plain strings so that an ordinary display and newline would work fine. The iteration, however, needs to format a name, a vertical line with some space, and a number in a special shape as a single, printed line.

Even though formatting several values into a single line is feasible with display, Scheme provides format strings and functions on format strings for just such situations. Figure 30 contains the specification of format strings and two functions on format strings: printf and format. Both consume a format string and a series of arbitrary values. The format string contains special sequences, which are interpreted according to the table at the bottom of figure 30. Thus, the format string "~n" turns into a platform-appropriate newline. If the format string contains "~a", it turns the corresponding value in the argument sequence into a string according to the conventions that govern display. In particular,

(format "~a :: ~a~n" "hello" "world")

turns into a single line like this:

"hello :: world\n"

where \n represents a new line.
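A few more applications of format may help to connect the table in figure 30 with actual results. This is a sketch for a current Racket; the argument values are chosen for illustration only:

```racket
#lang racket
;; ~a uses display conventions: strings appear without quotes
(format "~a | ~a" "Adam" 75.25)          ; "Adam | 75.25"

;; ~s uses write conventions: strings keep their quotes
(format "~s | ~s" "Adam" 75.25)          ; "\"Adam\" | 75.25"

;; ~n adds a newline at the end
(format "~a has ~a grades~n" "Adam" 4)   ; "Adam has 4 grades\n"

;; ~b, ~o, ~x show an exact number in base 2, 8, and 16
(format "~b ~o ~x" 10 10 10)             ; "1010 12 a"
```

The function printf accepts the same format strings but prints the result instead of returning it.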

Figure 33 uses both functions to print a list of gpas as a table. Specifically, the auxiliary function in print-gpas that is iterated over the list prints the name of the student and the corresponding gpa separated with

" | " 

It also turns the gpa into a decimal number with exactly two digits to the right of the decimal point. This second task is the responsibility of number->fstring, which accomplishes it with format. The function multiplies the given number by 100, rounds the result, uses quotient and remainder to split the last two digits from the rest of the number, and then creates the string from these pieces with format. If remainder produces a single digit, the format string supplies the required 0.
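The following sketch applies number->fstring to a few averages, assuming a current Racket. The sample values are ours; they are chosen to exercise both branches of the if-expression:

```racket
#lang racket
;; Number  -->  String
;; to format a non-negative number with exactly two decimal digits
(define (number->fstring x)
  (let* ([x (inexact->exact (round (* x 100)))]
         [i (quotient x 100)]
         [d (remainder x 100)])
    (if (<= 0 d 9) (format "~a.0~a" i d) (format "~a.~a" i d))))

(number->fstring 75.25)  ; "75.25"  two-digit remainder
(number->fstring 77.5)   ; "77.50"  remainder 50
(number->fstring 80.0)   ; "80.00"  remainder 0 needs the extra 0
(number->fstring 66.05)  ; "66.05"  remainder 5 needs the extra 0
```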

;;  -->  Void
;; read grade records, compute gpas, print averages 
(define (gpas) 
  (print-gpas 
   (compute-gpas
    (lines->records (with-input-from-file DB read-all-lines)))))

;; (Listof (list String Number))  -->  Void
(define (print-gpas gpas)
  (printf "Name |  GPA~n")
  (printf "------------~n")
  (for-each (lambda (x) (printf "~a | ~a~n" (car x) (number->fstring (cadr x))))
            gpas))

;; Number  -->  String
(define (number->fstring x)
  (let* ([x (inexact->exact (round (* x 100)))]
         [i (quotient x 100)]
         [d (remainder x 100)])
    (if (<= 0 d 9) (format "~a.0~a" i d) (format "~a.~a" i d))))

;; (Listof (cons String (Listof Number)))  -->  (Listof (list String Number))
(define (compute-gpas g)
  (map (lambda (a-record) (list (car a-record) (average (cdr a-record)))) g))

;; (Listof Number)  -->  Number
(define (average alon) (exact->inexact (/ (apply + alon) (length alon))))

;;  -->  (Listof (Listof String))
(define (read-all-lines)
  (let loop ()
    (let ([next (read-line)])
      (if (eof-object? next) '() (cons (regexp-split " " next) (loop))))))

;; (Listof (Listof String))  -->  (Listof GradeRecord)
(define (lines->records file)
  (map (lambda (ln) (cons (car ln) (map string->number (cdr ln)))) file))

;; Constants: 
(define DB "grades-txt.dat")

;; run program run:
(gpas)

Figure 34:  A full gpa calculator

;; Number  -->  String
(define (number->fstring x)
  (let* ([x (inexact->exact (round (* x 100)))]
         [i (quotient x 100)]
         [d (remainder x 100)])
    (if (<= 0 d 9) (format "~a.0~a" i d) (format "~a.~a" i d))))

;; (Listof Number)  -->  Number
(define (average alon)
  (exact->inexact (/ (apply + alon) (length alon))))

;; OUTPUT:
(printf "Name |  GPA~n")
(printf "------------~n")
(for-each
  (lambda (x) (printf "~a | ~a~n" (car x) (number->fstring (average (cdr x)))))
  ;; COMPUTE: 
  (map (lambda (a-record) (list (car a-record) (average (cdr a-record))))
    (map (lambda (ln) (cons (car ln) (map string->number (cdr ln))))
      (with-input-from-file "grades-txt.dat" ;; INPUT:
	(rec loop
	  (lambda ()
	    (let ([next (read-line)])
	      (if (eof-object? next) '() (cons (regexp-split " " next) (loop))))))))))

Figure 35:  A short yet unmanageable gpa calculator



Programming Style:  Figure 34 collects all the function definitions for a program that reads a space-separated file of grades and prints a table. The entire program consists of seven function definitions and one constant definition plus one expression, which calls gpas, the main function. The function definitions break the program into comprehensible chunks, and the names of the functions are suggestive. It is therefore easy to read and understand the program, even without purpose statements for the individual functions. Furthermore, it would be easy to modify this program in many ways, e.g.,

  1. accommodate names in print-gpas that are longer than four letters;

  2. accommodate two names (first and last) in compute-gpas and lines->records; or

  3. accommodate a file format with two lines per record in read-all-lines.

If you wish to practice programming skills, this list contains excellent examples of tasks on program maintenance.

Now compare the code in figure 34 with that in figure 35. This second figure presents the same program from a ``scripting perspective,'' that is, as a program that is probably used once and thrown away afterwards. Instead of separating the program into tasks for reading, computing, and printing, the programmer uses loops in the main expression to compute almost everything. Only average and number->fstring are factored out because they clearly perform distinct tasks. Such code is difficult to read, comprehend, and modify. Hence, if there is even the remotest chance that code survives a single computation, i.e., when it is saved in a file, we recommend using functions and organizing it at least to some extent. 

2.4  Character for Character

Here is the file format for our grade database (DB) again:

  The program grade-txt.ss expects grades-txt.dat to 
  contain a TxtDB:

  A TxtDB file is 
   a sequence of Lines.
  A Line is 
   Name Grade ...   ;; exactly one space between tokens 
  A Grade is a 
   number between 0 and 100 (inclusive)

The description contains two tricky aspects. First, it assumes that every grade record is on a single line. Second, it specifies that there is exactly one space between the tokens (name, individual grade) in a record.

A quick test with any of the appropriate grading programs (say the one in figure 34) shows that these assumptions are critical for the programs to function properly. If a human editing action creates a file like this:

  Adam 78 88 69    66
  Brad 88 87 86 22
  Cath 99 88 88  90
  Dave 77 78 
   77 78
  Fawn 90 89 81 60
  Gege 67 78 81 85
  ...
  Zoro 33 44 55 66

in which one or even both of the constraints are violated, then the program signals an error and fails to produce any useful output.

The two kinds of problems require two radically different kinds of solutions. To eliminate superfluous spaces around a number, we just adjust the pattern argument for regexp-split so that any number of spaces (but at least one) divides two tokens:

(define (read-all-lines)
  (let loop ()
    (let ([next (read-line)])
      (if (eof-object? next) '() (cons (regexp-split "[\\ ]+" next) (loop))))))

Since regular expressions are covered in the part on strings, we omit a detailed explanation here. The second problem, however, calls for a special function that reads text-based records from a file.
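Without going into the details of regular expressions, a quick experiment shows the effect of a "one or more spaces" pattern. The sketch below assumes a current Racket and uses the plain pattern " +", which also matches one or more spaces:

```racket
#lang racket
;; a run of several spaces now counts as a single separator
(regexp-split " +" "Adam 78 88 69    66")  ; (list "Adam" "78" "88" "69" "66")
(regexp-split " +" "Cath 99 88 88  90")    ; (list "Cath" "99" "88" "88" "90")

;; leading spaces still produce an empty first token
(regexp-split " +" " 77 78")               ; (list "" "77" "78")
```

The last example is a reminder that a line starting with spaces needs separate treatment.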

Naturally you can solve a problem only if you understand it. Hence the first task is to write down what the file may look like without the constraints:

  The program grade-txt.ss expects grades-txt.dat to 
  contain a TxtDB:

  A TxtDB file is 
   a sequence of Lines, each followed by Continuation Lines (possibly none): 
  A Line is 
   Name Grade ...   
   ;; with spaces (at least one) between tokens 
  A Continuation Line is 
   at least one space, followed by Grade ...   
   ;; with spaces (at least one) between tokens 
  A Grade is a 
   number between 0 and 100 (inclusive)

The intention is that a grade record corresponds to an arbitrary number of lines. It begins with a line whose first token is a name, followed by grades. All subsequent lines that belong to the same record start with at least one space and contain only grades, again separated by spaces.

Take a look at this sample file:

Adam 
 78 88 100  66
Brad 
 88 87  86  22
Cath 
 99 88  88  90
Dave 
 77 78  77 105
Fawn 
 90 89  81  60
Gege 
 67 78  81  85
Zoro 
 33 44  55  66

Here your professor placed all grades on a separate line, with leading spaces, and aligned the grades in columns, using extra spaces to accommodate three-digit grades.

For the read-all-lines function, this new specification means that it cannot read the lines for one record with a single read-line instruction. Instead, it must look ahead until it finds a line that starts with something other than a blank space. Since the number of continuation lines is arbitrary (zero or hundreds), it is natural to design an auxiliary recursive function that accumulates the continuation lines until then:

;;  -->  (Listof (Listof String))
;; read all lines with continuation lines, split into space-separated tokens
(define (read-all-lines)
  (let loop ()
    (let ([next (read-line)])
      (if (eof-object? next)
          '() 
          (cons (regexp-split "[\\ ]+" (read-rest-of-line next)) (loop))))))

;; String  -->  String
;; given pre, read rest of a potential continuation line 
(define (read-rest-of-line pre)
  (let ([next (peek-char)])
    (cond
      [(eof-object? next) pre]
      [(eq? next #\space) (read-rest-of-line (string-append pre (read-line)))]
      [else pre])))

The two definitions show just the two changes. The first one, inside of read-all-lines, is that next is no longer handed directly to regexp-split. Instead, it is handed to the new read-rest-of-line function. This latter function employs a new primitive: peek-char (see figure 30). With this primitive, the function can look at the next character in the input device without actually reading it. Depending on the result of this look-ahead, read-rest-of-line returns the accumulated string or continues with looking at the next line.
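To watch the look-ahead at work, here is a sketch that runs the two functions on a small made-up input with continuation lines. It assumes a current Racket, reads from a string via with-input-from-string, and uses the plain pattern " +" for "one or more spaces":

```racket
#lang racket
;;  -->  (Listof (Listof String))
(define (read-all-lines)
  (let loop ()
    (let ([next (read-line)])
      (if (eof-object? next)
          '()
          (cons (regexp-split " +" (read-rest-of-line next)) (loop))))))

;; String  -->  String
;; append all following lines that start with a space to pre
(define (read-rest-of-line pre)
  (let ([next (peek-char)])
    (cond
      [(eof-object? next) pre]
      [(eq? next #\space) (read-rest-of-line (string-append pre (read-line)))]
      [else pre])))

;; Adam's grades sit on one continuation line, Brad's are split in two
(with-input-from-string "Adam \n 78 88 100  66\nBrad 88 87\n 86 22\n"
  read-all-lines)
;; (list (list "Adam" "78" "88" "100" "66") (list "Brad" "88" "87" "86" "22"))
```

Because every continuation line starts with a space, appending it to the accumulated string keeps the tokens separated.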

In general, you will encounter situations in which it is best to think of a file as a stream of characters followed by an end-of-file object. For those situations, Scheme provides read-char and write-char for reading and writing characters in addition to peek-char, which just looks at the next character in the file without actually reading it. Let's just look at some examples together.

Consider the problem of just counting the number of characters in a file:

;;  -->  Number
(define (count)
  (let ([next (read-char)])
    (if (eof-object? next) 0 (+ (count) 1))))

This function reads a character at a time until it encounters the end-of-file object. At that point, it returns 0, because that's how many characters are left in the file. Otherwise, it just adds 1 to the result of the recursion -- just like length, the function that counts the number of items on a list.

Another common action on files is to copy them:

;;  -->  Void
(define (copy)
  (let ([next (read-char)])
    (unless (eof-object? next)
      (write-char next)
      (copy)))) 

This function does nothing when it encounters the end of the file; otherwise it writes the character that it read and recurs.

Suppose your favorite application software produces files like this:

  { {5 , 12} 
    {3 , 4} 
    {0 , 0} }

We could trivially process the data in this file with a Scheme function if it didn't contain the commas. A single (read) would create an S-expression, and you could process the lists of numbers as usual in Scheme. Since your goal is to change all commas into spaces, you need a program that reads and inspects each character in the entire file. Clearly, this function is just another instance of the fun-for-file template, though this time we need to read and write characters, not lines:

;;  -->  Void
;; to change all commas in the input stream into spaces for the output stream
(define (replace-commas)
  (let ([next (read-char)])
    (unless (eof-object? next)
      (write-char (if (eq? #\, next) #\space next))
      (replace-commas))))

The function reads characters until it finds the end of the file, and writes these characters to the default output stream, except for commas, which are replaced with spaces. Now you can write

(with-input-from-file "output.dat" replace-commas)

to get rid of the commas in the file "output.dat". As expected, the program produces

  { {5   12} 
    {3   4} 
    {0   0} }

on the standard output device.
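If you would rather keep the result inside your program than print it, you can capture the output in a string and hand that string to read. The sketch below assumes a current Racket, where with-output-to-string and with-input-from-string come from racket/port and the reader treats braces like parentheses by default; the input is written on one line for brevity:

```racket
#lang racket
;;  -->  Void
;; to change all commas in the input stream into spaces for the output stream
(define (replace-commas)
  (let ([next (read-char)])
    (unless (eof-object? next)
      (write-char (if (eq? #\, next) #\space next))
      (replace-commas))))

;; capture the output in a string instead of printing it
(define cleaned
  (with-output-to-string
    (lambda ()
      (with-input-from-string "{ {5 , 12} {3 , 4} {0 , 0} }" replace-commas))))

cleaned                                ; "{ {5   12} {3   4} {0   0} }"
(with-input-from-string cleaned read)  ; (list (list 5 12) (list 3 4) (list 0 0))
```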

Of course, this function is just an instance of a function that replaces one character with another:

;; Char Char  -->  Void
;; replace old with new while copying the input to the output
(define (replace old new)
  (let ([next (read-char)])
    (unless (eof-object? next)
      (write-char (if (eq? next old) new next))
      (replace old new))))

And this function is an instance of a function that translates an arbitrary number of characters as it copies the input file:

;; (Char  -->  Char)  -->  Void
;; copy the input to the output and translate char by char 
(define (encode translate)
  (let ([next (read-char)])
    (unless (eof-object? next) 
      (write-char (translate next))
      (encode translate))))

In particular, you can use the function to replace the braces in some file with parentheses, even though Scheme can deal with braces just fine:

(with-input-from-file "output.dat"
  (lambda ()
    (encode (lambda (x) (case x [(#\{) #\(][(#\}) #\)][else x])))))

After all, by now you should love parentheses and understand that braces are only for those who want to use the shift key a lot.
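Since encode accepts any function from characters to characters, it is easy to try with different translations. Here is a sketch for a current Racket; encode-string is a hypothetical helper of ours that runs encode on a string and collects the output in a string, so that no files are involved:

```racket
#lang racket
;; (Char  -->  Char)  -->  Void
;; copy the input to the output and translate char by char
(define (encode translate)
  (let ([next (read-char)])
    (unless (eof-object? next)
      (write-char (translate next))
      (encode translate))))

;; encode-string : (Char  -->  Char) String  -->  String
;; hypothetical helper: run encode on s and collect the output
(define (encode-string translate s)
  (with-output-to-string
    (lambda ()
      (with-input-from-string s (lambda () (encode translate))))))

(encode-string (lambda (x) (case x [(#\{) #\(] [(#\}) #\)] [else x]))
               "{ {5 12} {3 4} }")      ; "( (5 12) (3 4) )"
(encode-string char-upcase "Adam 78")   ; "ADAM 78"
```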


[12] For the informed reader, it involves parsing and possibly parsing at the context-sensitive level.