3  Input and Output: The XML Way

Help Desk: XML

Collection: xml.ss (xml)



Due to the emergence of the Web and HTML, people have recently rediscovered the effectiveness of parenthesized forms of information.13 More precisely, they have developed XML, the eXtensible Markup Language. Its goal is to standardize the representation of information so that programs can easily read, write, and transmit data over the Web to other programs. XML is quickly gaining popularity, and most languages now provide at least one interface for dealing with XML data.

Roughly speaking, an XML expression is a fully parenthesized form of data.14 The major, yet purely visual difference between an XML expression and an external S-expression is that there are many kinds of ``parentheses'' in XML, not just parentheses, brackets, and braces. These parentheses are made up of words, called tags. For example, when we write

  (1 2 3)

for the list of 1, 2, and 3, an XMLer might write

  <parenthesis>1 2 3</parenthesis>

or

  <paren>1 2 3</paren>

or something similar. The tokens <paren> and </paren> are called the start tag and the end tag, respectively. If you think of them as parentheses, you'll do fine most of the time.

You can turn almost any sequence of characters into a pair of XML start and end tags. The pair of tags and everything in between is called an element. The sequence of characters for the tag is the name of the element. The rest is called the content. In addition to name and content, an XML element may also have attributes. For example,

  <paren title="nat nums" date="oct 22, 2000">
  1 2 3
  </paren>

has two attributes: title and date. The values of the attributes are the two strings "nat nums" and "oct 22, 2000".


(Comp210 "Fall 2001"
  (Adam 78 88 69)
  (Brad 88 87 86)
  (Cath 99 88 88)
  (Dave 77 78 77)
  (Fawn 90 89 81)
  (Gege 67 78 81))

  

        
<course title="Comp210" 
        semester="Fall 2001">
 <student name="Adam">
  <g>78</g><g>88</g><g>69</g>
 </student>
 <student name="Brad">
  <g>88</g><g>87</g><g>86</g>
 </student>
 <student name="Cath">
  <g>99</g><g>88</g><g>88</g>
 </student>
 <student name="Dave">
  <g>77</g><g>78</g><g>77</g>
 </student>
 <student name="Fawn">
  <g>90</g><g>89</g><g>81</g>
 </student>
 <student name="Gege">
  <g>67</g><g>78</g><g>81</g>
 </student>
</course>

Figure 36:  An XML representation of a grade file

Figure 36 shows one of many ways of representing an S-expression for tracking grades with XML. A comparison shows how an XML data designer might use attributes. For example, the course title and the course semester are attributes of the <course> element. Similarly, the name of the student is an attribute of the <student> element. Each grade is surrounded by an additional pair of parentheses, the <g> tags, to set it visually apart from the others.

Clearly XML is a generalization of the data language of S-expressions. The parentheses are named; each parenthesized element may have attributes. At the same time, it is also clear that we can naturally represent XML with S-expressions. Indeed, there are many different ways of translating XML expressions into S-expressions.

The data definition in figure 37 sketches PLT Scheme's choice of mapping XML into S-expressions. We refer to this subset of S-expressions as X-expressions. The figure also specifies a collection of functions that allows us to read XML, to convert XML into X-expressions, and to print X-expressions.

Xexpr ::= String
       |  (cons Symbol (cons (Listof (list Symbol String)) (Listof Xexpr)))   an element
       |  (cons Symbol (Listof Xexpr))   an element without attributes
       |  Symbol   a symbolic entity such as &nbsp;
       |  Number   a numeric entity such as &#20;
       |  ...see Help Desk
 
XML ::= a structure

;; read-xml/element :  -->  XML
;; to read a single XML expression from standard input

;; xml->xexpr : XML  -->  Xexpr
;; to convert an XML element into an X-expression

;; xexpr->xml : Xexpr  -->  XML
;; to convert an X-expression into an XML element

;; write-xml/content : XML  -->  Void
;; to write XML to the standard output

;; display-xml/content : XML  -->  Void
;; to pretty-display XML to the standard output

;; eliminate-whitespace : (Listof Symbol) (Boolean  -->  Boolean)  -->  XML  -->  XML
;; to eliminate whitespaces from XML elements that contain XML elements 

Figure 37:  Reading XML and X-expressions
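The two element clauses of the Xexpr data definition can be told apart by inspecting the second item of the list. The following is a minimal sketch of that idea, assuming a current Racket installation; the helpers attribute-list?, has-attributes?, xexpr-tag, xexpr-attributes, and xexpr-content are hypothetical names introduced for illustration and are not part of the XML library.

```racket
#lang racket
;; Sketch only: all helpers below are hypothetical, not library functions.

;; attribute-list? : Any  -->  Boolean
;; is x a list of (list Symbol String) items?
(define (attribute-list? x)
  (and (list? x)
       (andmap (lambda (a)
                 (and (list? a)
                      (= (length a) 2)
                      (symbol? (car a))
                      (string? (cadr a))))
               x)))

;; has-attributes? : Xexpr  -->  Boolean
;; does the element x use the clause with an attribute list?
(define (has-attributes? x)
  (and (pair? (cdr x)) (attribute-list? (cadr x))))

;; selectors that work for both element clauses
(define (xexpr-tag x) (car x))
(define (xexpr-attributes x)
  (if (has-attributes? x) (cadr x) '()))
(define (xexpr-content x)
  (if (has-attributes? x) (cddr x) (cdr x)))

;; the same element, with and without attributes
(define with-attrs '(paren ((title "nat nums")) "1 2 3"))
(define without-attrs '(paren "1 2 3"))

(xexpr-attributes with-attrs)    ; => ((title "nat nums"))
(xexpr-content with-attrs)       ; => ("1 2 3")
(xexpr-attributes without-attrs) ; => ()
(xexpr-content without-attrs)    ; => ("1 2 3")
```

An empty attribute list, as in (g () "88"), counts as an attribute list here, so its content is ("88").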

With the functions in figure 37 we can read XML expressions from files almost as easily as S-expressions. Reading an XML expression with read-xml/element yields an XML element, and it is this element that xml->xexpr then converts into an X-expression. Consider the example in figure 38. The left column is the textual representation of an XML expression. Assume this text is stored in a file called "sample.xml". Then the evaluation of the expression

(xml->xexpr
  (with-input-from-file "sample.xml" read-xml/element))

yields the X-expression in the right column of figure 38.
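The same conversion can be tried without creating a file. The sketch below assumes a current Racket installation, where the XML collection is loaded with (require xml), and it reads from a string port instead of "sample.xml"; the helper xml-string->xexpr is a hypothetical name used only for this illustration.

```racket
#lang racket
(require xml)

;; xml-string->xexpr : String  -->  Xexpr
;; hypothetical helper: read one XML element from the string s
;; and convert it into an X-expression
(define (xml-string->xexpr s)
  (xml->xexpr (read-xml/element (open-input-string s))))

;; the small element from the beginning of this section
(xml-string->xexpr "<paren title=\"nat nums\">1 2 3</paren>")
;; => (paren ((title "nat nums")) "1 2 3")
```

Because the element contains no line breaks, the resulting X-expression contains no whitespace-only strings.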

<course title="Comp210">
  <student name="Adam">
    <g>88</g>
  </student>
  <student name="Beth">
    <g>96</g>
  </student>
  <student name="Cath">
    <g>70</g>
  </student> 
  <student name="Dave">
    <g>68</g>
  </student>
  <student name="Fawn">
    <g>99</g>
  </student>
  <student name="Gege">
    <g>100</g>
  </student>
</course>

        
(course ((title "Comp210")) "
  " (student ((name "Adam")) "
    " (g () "88") "
  ") "
  " (student ((name "Beth")) "
    " (g () "96") "
  ") "
  " (student ((name "Cath")) "
    " (g () "70") "
  ") "
  " (student ((name "Dave")) "
    " (g () "68") "
  ") "
  " (student ((name "Fawn")) "
    " (g () "99") "
  ") "
  " (student ((name "Gege")) "
    " (g () "100") "
  ") "
")

Figure 38:  Reading XML: a first example

Figure 38 shows that read-xml/element preserves whitespace (blanks, tabs, newlines) in the file and turns it into strings. Although preserving whitespace is important for text processing within XML elements, it is a nuisance for other applications. This X-expression is clearly not what we want; it contains every stretch of whitespace in the file as an additional string.

;; Xexpr  -->  Record
;; convert an XML record for a course into an S-expression for gpas
(define (xexpr->record x)
  (map (lambda (per-student) 
         (cons (student-name per-student)
               (map grade-number (student-grades per-student))))
       (course-students x)))

;; selectors and conversions 
(define course-students cddr)
(define (student-name s) (cadar (cadr s)))
(define (student-grades s) (cddr s))
(define (grade-number g) (string->number (caddr g)))

Figure 39:  Converting XML to an S-expression

;; String Record  -->  Xexpr
(define (record->xexpr title r)
  `(course ((title ,title)) ,@(map student->xexpr r)))

;; (cons String (Listof Number))  -->  Xexpr
(define (student->xexpr s)
  `(student ((name ,(car s))) ,@(map number->grade (cdr s))))

;; Number  -->  Xexpr
(define (number->grade x) `(g () ,(number->string x)))

;; xx is assumed to be bound to a Record
(display-xml/content (xexpr->xml (record->xexpr "Comp210" xx)))

Figure 40:  Printing an S-expression as XML

We can eliminate (most of) these useless whitespaces with eliminate-whitespace from the XML library:

(xml->xexpr
  ((eliminate-whitespace '(course grades student) identity)
   (with-input-from-file "sample.xml" read-xml/element)))

It produces the following output, which is close to what we want:

(course ((title "Comp210"))
  (student ((name "Adam")) (g () "88"))
  (student ((name "Beth")) (g () "96"))
  (student ((name "Cath")) (g () "70"))
  (student ((name "Dave")) (g () "68"))
  (student ((name "Fawn")) (g () "99"))
  (student ((name "Gege")) (g () "100")))

From here, we can process the grade list with plain old Scheme functions; see figure 39 for a small program that translates from this Xexpr to an S-expression that is a legal input for the gpas program in figure 34.
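To see the translation at work, here is a small worked example. It is only a sketch: the definitions are restated from figure 39 so that the block runs on its own in a current Racket installation, and sample-course is a made-up two-student excerpt in the whitespace-free format.

```racket
#lang racket
;; selectors and conversions, restated from figure 39
(define course-students cddr)
(define (student-name s) (cadar (cadr s)))
(define (student-grades s) (cddr s))
(define (grade-number g) (string->number (caddr g)))

;; xexpr->record : Xexpr  -->  Record
(define (xexpr->record x)
  (map (lambda (per-student)
         (cons (student-name per-student)
               (map grade-number (student-grades per-student))))
       (course-students x)))

;; hypothetical sample data: a course element without whitespace strings
(define sample-course
  '(course ((title "Comp210"))
     (student ((name "Adam")) (g () "88"))
     (student ((name "Beth")) (g () "96"))))

(xexpr->record sample-course)
;; => (("Adam" 88) ("Beth" 96))
```

The selectors rely purely on list positions, so they work only after the whitespace strings have been removed; applied to the X-expression of figure 38 they would trip over the extra strings.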

Note: eliminate-whitespace consumes a list of XML tags (symbols) and a function; for now we just use identity or (lambda (x) x) for this second argument. The result is a function that traverses an XML element and that systematically eliminates whitespaces from those elements whose tags are included in the given list. Of course, the function cannot eliminate whitespace from elements that must contain text. 
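The effect of eliminate-whitespace can be observed on a small input. The following sketch assumes a current Racket installation with (require xml); the string sample and the helper read-sample are hypothetical stand-ins for the file "sample.xml".

```racket
#lang racket
(require xml)

;; a one-student excerpt of the course file, with line breaks and indentation
(define sample
  (string-append
   "<course title=\"Comp210\">\n"
   "  <student name=\"Adam\">\n"
   "    <g>88</g>\n"
   "  </student>\n"
   "</course>"))

;; read-sample :  -->  XML
;; hypothetical helper: read the element from a fresh string port
(define (read-sample)
  (read-xml/element (open-input-string sample)))

;; without cleanup, the whitespace between tags shows up as strings
(define raw (xml->xexpr (read-sample)))

;; with cleanup for the course and student elements
(define clean
  (xml->xexpr
   ((eliminate-whitespace '(course student) identity) (read-sample))))

clean
;; => (course ((title "Comp210")) (student ((name "Adam")) (g () "88")))
```

The tag g is deliberately left out of the list, because a <g> element must contain text.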

Your Scheme programs can also print XML as easily as they can read it. Say you need to print an S-expression. The process consists of two steps. First, the program must translate the S-expression into an appropriate X-expression. Second, the program can then use the primitives from the XML library to create and print a true piece of XML data.

Figure 40 shows how this all works for our running example. The first step is accomplished with a collection of conventional Scheme functions: one for creating class rosters from the title of the course and the actual roster; another one for translating each item in the roster into Xexpr format; and a third for creating a grade Xexpr from a number. The expression at the bottom of the figure performs the step-by-step translation and printing process.
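The two steps can be traced on a concrete record. This is a sketch under stated assumptions: the three functions are restated from figure 40 so that the block runs on its own in a current Racket installation with (require xml), and sample-record is made-up data in the (cons String (Listof Number)) format.

```racket
#lang racket
(require xml)

;; restated from figure 40
(define (number->grade x) `(g () ,(number->string x)))
(define (student->xexpr s)
  `(student ((name ,(car s))) ,@(map number->grade (cdr s))))
(define (record->xexpr title r)
  `(course ((title ,title)) ,@(map student->xexpr r)))

;; hypothetical sample data: two students with their grades
(define sample-record '(("Adam" 78 88 69) ("Brad" 88 87 86)))

;; step 1: translate the S-expression into an X-expression
(define sample-xexpr (record->xexpr "Comp210" sample-record))
sample-xexpr
;; => (course ((title "Comp210"))
;;      (student ((name "Adam")) (g () "78") (g () "88") (g () "69"))
;;      (student ((name "Brad")) (g () "88") (g () "87") (g () "86")))

;; step 2: create XML from the X-expression and print it
(display-xml/content (xexpr->xml sample-xexpr))
```

Keeping the first step free of library calls means the intermediate X-expression can be inspected and tested like any other list before anything is printed.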




13 Hey, it took fewer than 50 years to recognize that LISP had it right in the first place. Let's not complain.

14 There is more to XML than parentheses, but we ignore this for now.