Help Desk: XML
Collection: xml.ss (xml)
Due to the emergence of the Web and HTML, people have recently rediscovered the effectiveness of parenthesized forms of information. More precisely, they have developed XML, the eXtensible Markup Language. Its goal is to standardize the representation of information so that programs can easily read, write, and transmit data over the Web to other programs. XML is quickly gaining popularity, and most languages now provide at least one interface for dealing with XML data.
Roughly speaking, an XML expression is a fully parenthesized form of data. The major, yet purely visual, difference between an XML expression and an external S-expression is that there are many kinds of ``parentheses'' in XML, not just parentheses, brackets, and braces. These parentheses are made up of words, called tags. For example, when we write
(1 2 3)
for the list of 1, 2, and 3, an XMLer might write
<parenthesis>1 2 3</parenthesis>
<paren>1 2 3</paren>
or something similar. The tokens <paren> and </paren> are called the start tag and the end tag, respectively. If you think of them as parentheses, you'll do fine most of the time.
You can turn almost any sequence of characters into a pair of XML start and end tags. The pair of tags and everything in between is called an element. The sequence of characters for the tag is the name of the element. The rest is called the content. In addition to name and content, an XML element may also have attributes. For example,
<paren title="nat nums" date="oct 22, 2000"> 1 2 3 </paren>
has two attributes: title and date. The values of the attributes are the two strings "nat nums" and "oct 22, 2000".
(Comp210 "Fall 2001"
  (Adam 78 88 69)
  (Brad 88 87 86)
  (Cath 99 88 88)
  (Dave 77 78 77)
  (Fawn 90 89 81)
  (Gege 67 78 81))

<course title="Comp210" semester="Fall 2001">
  <student name="Adam"> <g>78</g><g>88</g><g>69</g> </student>
  <student name="Brad"> <g>88</g><g>87</g><g>86</g> </student>
  <student name="Cath"> <g>99</g><g>88</g><g>88</g> </student>
  <student name="Dave"> <g>77</g><g>78</g><g>77</g> </student>
  <student name="Fawn"> <g>90</g><g>89</g><g>81</g> </student>
  <student name="Gege"> <g>67</g><g>78</g><g>81</g> </student>
</course>
Figure 36: An XML representation of a grade file
Figure 36 shows one of many ways of representing an S-expression for tracking grades with XML. A comparison shows how an XML data designer might use attributes. For example, the course title and the course semester are attributes of the <course> element. Similarly, the name of the student is an attribute of the <student> element. Each grade is surrounded by an additional pair of parentheses to set it visually apart from others.
Clearly XML is a generalization of the data language of S-expressions. The parentheses are named; each parenthesized element may have attributes. At the same time, it is also clear that we can naturally represent XML with S-expressions. Indeed, there are many different ways of translating XML expressions into S-expressions.
The data definition in figure 37 sketches PLT Scheme's choice of mapping XML into S-expressions. We refer to this subset of S-expressions as X-expressions. The figure also specifies a collection of functions that allows us to read XML, to convert XML into X-expressions, and to print X-expressions.
Xexpr ::= String
        | (cons Symbol (cons (Listof (list Symbol String)) (Listof Xexpr)))    an element
        | (cons Symbol (Listof Xexpr))    an element without attributes
        | Symbol    a symbolic entity
        | Number    a numeric entity
        | ...    see Help Desk

XML ::= a structure
;; read-xml/element : --> XML
;; to read a single XML expression from standard input

;; xml->xexpr : XML --> Xexpr
;; to convert an XML element into an X-expression

;; xexpr->xml : Xexpr --> XML
;; to convert an X-expression into an XML element

;; write-xml/content : XML --> Void
;; to write XML to the standard output

;; display-xml/content : XML --> Void
;; to pretty-display XML to the standard output

;; eliminate-whitespace : (Listof Symbol) (Boolean --> Boolean) --> (XML --> XML)
;; to eliminate whitespaces from XML elements that contain XML elements
Figure 37: Reading XML and X-expressions
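To make the data definition concrete, here is a minimal sketch of the earlier <paren> element written as an X-expression. The selector names (element-name, element-attributes, element-content, attribute-value) are hypothetical helpers, not part of the XML library, and they assume the attribute list is present. The #lang line assumes a current Racket installation.

```racket
#lang racket
;; The element <paren title="nat nums" date="oct 22, 2000"> 1 2 3 </paren>
;; as an X-expression: the name, the attribute list, then the content.
(define paren-x
  '(paren ((title "nat nums") (date "oct 22, 2000")) " 1 2 3 "))

;; Hypothetical selectors; they assume the attribute list is present.
(define (element-name x) (car x))
(define (element-attributes x) (cadr x))
(define (element-content x) (cddr x))

;; attribute-value : Xexpr Symbol --> String or #f
;; look up the value of one attribute, #f if the element lacks it
(define (attribute-value x key)
  (let ((hit (assq key (element-attributes x))))
    (if hit (cadr hit) #f)))

(element-name paren-x)            ; 'paren
(attribute-value paren-x 'title)  ; "nat nums"
(element-content paren-x)         ; '(" 1 2 3 ")
```

The content is a single string here because the numbers are plain text inside the element; XML itself does not know that they are numbers.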
With the functions in figure 37 we can read XML expressions from files almost as easily as S-expressions. Reading an XML expression yields a document from which we extract the element. It is this element that xml->xexpr then converts into an X-expression. Consider the example in figure 38. The left column is the textual representation of an XML expression. Assume this text is stored in a file called "sample.xml". Then the evaluation of the expression
(xml->xexpr (with-input-from-file "sample.xml" read-xml/element))
yields the X-expression in the right column of figure 38.
<course title="Comp210"> <grades name="Adam"> <g>88</g> </grades> <grades name="Beth"> <g>96</g> </grades> <grades name="Cath"> <g>70</g> </grades> <grades name="Dave"> <g>68</g> </grades> <grades name="Fawn"> <g>99</g> </grades> <grades name="Gege"> <g>100</g> </grades> </course>
(course ((title "Comp210")) " "
  (grades ((name "Adam")) " " (g () "88") " ") " "
  (grades ((name "Beth")) " " (g () "96") " ") " "
  (grades ((name "Cath")) " " (g () "70") " ") " "
  (grades ((name "Dave")) " " (g () "68") " ") " "
  (grades ((name "Fawn")) " " (g () "99") " ") " "
  (grades ((name "Gege")) " " (g () "100") " ") " ")
Figure 38: Reading XML: a first example
Figure 38 shows that reading XML preserves whitespaces (blanks, tabs, newlines) in the file and turns them into strings. Although this whitespace preservation is important for text-processing within XML elements, it is a nuisance for other applications. This X-expression is clearly not what we want; it contains every whitespace that the file contains as an additional string.
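The whitespace behavior can be observed without a file. The following is a minimal sketch that assumes a current Racket installation, where the library is loaded with (require xml), and that relies on read-xml/element accepting an input port as an optional argument. The one-student sample string is made up for illustration.

```racket
#lang racket
(require xml) ; assumption: this is how the XML library loads in current Racket

;; A one-student version of the sample file, kept in a string (illustrative data).
(define sample
  "<course title=\"Comp210\"> <grades name=\"Adam\"> <g>88</g> </grades> </course>")

;; Read from a string port instead of standard input.
(define raw
  (xml->xexpr (read-xml/element (open-input-string sample))))

;; Every blank between two tags appears as a separate " " string.
raw
```

Counting the strings in raw gives a quick sense of how much of the X-expression is formatting rather than data.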
;; xexpr->record : Xexpr --> Record
;; convert an XML record for a course into an S-expression for gpas
(define (xexpr->record x)
  (map (lambda (per-student)
         (cons (student-name per-student)
               (map grade-number (student-grades per-student))))
       (course-students x)))

;; selectors and conversions
(define course-students cddr)
(define (student-name s) (cadar (cadr s)))
(define (student-grades s) (cddr s))
(define (grade-number g) (string->number (caddr g)))
Figure 39: Converting XML to an S-expression
;; record->xexpr : String Record --> Xexpr
(define (record->xexpr title r)
  `(course ((title ,title)) ,@(map student->xexpr r)))

;; student->xexpr : (cons String (Listof Number)) --> Xexpr
(define (student->xexpr s)
  `(student ((name ,(car s))) ,@(map number->grade (cdr s))))

;; number->grade : Number --> Xexpr
(define (number->grade x)
  `(g () ,(number->string x)))

;; print the record xx (assumed to be defined elsewhere) as XML
(display-xml/content (xexpr->xml (record->xexpr "Comp210" xx)))
Figure 40: Printing an S-expression as XML
We can eliminate (most of) these useless whitespaces with
eliminate-whitespace from the XML library:
(xml->xexpr
  ((eliminate-whitespace '(course grades student) identity)
   (with-input-from-file "sample.xml" read-xml/element)))
It produces the following output, which is close to what we want:
(course ((title "Comp210"))
  (grades ((name "Adam")) (g () "88"))
  (grades ((name "Beth")) (g () "96"))
  (grades ((name "Cath")) (g () "70"))
  (grades ((name "Dave")) (g () "68"))
  (grades ((name "Fawn")) (g () "99"))
  (grades ((name "Gege")) (g () "100")))
From here, we can process the grade list with plain old Scheme functions; see
figure 39 for a small program that translates from this
Xexpr to an S-expression that is a legal input for the
program in figure 34.
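As a worked run of that translation, the sketch below repeats the definitions of figure 39 and applies them to a shortened, whitespace-free X-expression. The two-student input is illustrative; the #lang line assumes a current Racket installation.

```racket
#lang racket
;; definitions as in figure 39
(define course-students cddr)
(define (student-name s) (cadar (cadr s)))
(define (student-grades s) (cddr s))
(define (grade-number g) (string->number (caddr g)))

(define (xexpr->record x)
  (map (lambda (per-student)
         (cons (student-name per-student)
               (map grade-number (student-grades per-student))))
       (course-students x)))

;; a shortened version of the cleaned X-expression (illustrative data)
(define cleaned
  '(course ((title "Comp210"))
     (grades ((name "Adam")) (g () "88"))
     (grades ((name "Beth")) (g () "96"))))

(xexpr->record cleaned)
;; '(("Adam" 88) ("Beth" 96))
```

Note that the selectors rely only on positions, so they work regardless of the tag names, but they would misread an element that still contains whitespace strings.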
eliminate-whitespace consumes a list of XML tags (symbols) and a function; for now we just use identity, that is, (lambda (x) x), for this second argument. The result is a function that traverses an XML element and that systematically eliminates whitespaces from those elements whose tags are included in the given list. Of course, the function cannot eliminate whitespace from elements that must contain text.
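The two-stage use of eliminate-whitespace can be sketched as follows: first build the cleaning function, then apply it to an element. The sketch assumes a current Racket installation with (require xml) and reads from a string port; the sample string is illustrative. The tag g is deliberately left out of the list because its content is text.

```racket
#lang racket
(require xml) ; assumption: this is how the XML library loads in current Racket

(define sample
  "<course title=\"Comp210\"> <grades name=\"Adam\"> <g>88</g> </grades> </course>")

(define element (read-xml/element (open-input-string sample)))

;; Stage 1: build a function that cleans course and grades elements.
(define clean (eliminate-whitespace '(course grades) identity))

;; Stage 2: apply it, then convert the cleaned element.
(define cleaned (xml->xexpr (clean element)))

cleaned
;; '(course ((title "Comp210")) (grades ((name "Adam")) (g () "88")))
```

Building the function once and reusing it is convenient when several files share the same set of tags.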
Your Scheme programs can also print XML as easily as they can read it. Say you need to print an S-expression. The process consists of two steps. First, the program must translate the S-expression into an appropriate X-expression. Second, the program can then use the primitives from the XML library to create and print a true piece of XML data.
Figure 40 shows how this all works for our running example. The first step is accomplished with a collection of conventional Scheme functions: one for creating class rosters from the title of the course and the actual roster; another one for translating each item in the roster into Xexpr format; and a third for creating a grade Xexpr from a number. The expression at the bottom of the figure performs the step-by-step translation and printing process.
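The two steps can be tried end to end with a small roster. This sketch repeats the functions of figure 40, captures the printed XML in a string, and reads it back to check that nothing was lost. The roster data is illustrative, and the use of with-output-to-string and a string port, as well as (require xml), assumes a current Racket installation.

```racket
#lang racket
(require xml) ; assumption: this is how the XML library loads in current Racket

;; functions as in figure 40
(define (number->grade x) `(g () ,(number->string x)))
(define (student->xexpr s)
  `(student ((name ,(car s))) ,@(map number->grade (cdr s))))
(define (record->xexpr title r)
  `(course ((title ,title)) ,@(map student->xexpr r)))

;; an illustrative roster
(define roster '(("Adam" 78 88) ("Brad" 88 87)))

;; Step 1: S-expression to X-expression.
(define course-x (record->xexpr "Comp210" roster))

;; Step 2: X-expression to XML, written into a string instead of standard output.
(define printed
  (with-output-to-string
    (lambda () (write-xml/content (xexpr->xml course-x)))))

;; Reading the printed text back yields the X-expression we started with.
(define round-trip
  (xml->xexpr (read-xml/element (open-input-string printed))))

(equal? round-trip course-x)
```

write-xml/content is used here because it adds no layout whitespace, so the round trip is exact; display-xml/content produces a more readable, indented form.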