January 12, 2003
Computers can represent a variety of information: text, sound, color, the speed of an airplane, the direction of the wind, the prices of shares on the stock market, the results of a pre-election poll, the location of a GPS receiver. Inside the computer, this information is represented as data. Data typically does not represent all available information about some object or phenomenon. It represents only that part of the information that is relevant to the problem the computer program is designed to solve.
The designer of a computer program needs to analyze the problem, decide what information is relevant to the problem, and decide how to represent this information as data. Once this is done, the programmer designs the actual computation that transforms the given data and produces a result. The result data represents the desired outcome information. A programmer must follow a strict discipline to ensure that the program works correctly and performs the desired tasks.
What is the difference between information and data? When counting the pieces of fruit in a fruit basket, one only needs to know the total count - a number. So, the number 7 may represent three oranges, two bananas, and two apples. However, if someone asks whether there is a banana in the fruit basket, the number 7 cannot represent the information we seek. If we expect this type of question, we may encode the contents of the fruit basket as a sequence of pairs: (3 orange) (2 banana) (2 apple). This representation of information allows us to ask more questions about the fruit basket. Still, we do not know what kind of apples are in the basket (Macintosh, Cortland, Golden Delicious, or other), nor do we know how much they weigh or what their price is.
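The two encodings of the fruit basket can be sketched in Java as follows. This is only an illustration: the class names `FruitCount` and `FruitBasketData` are hypothetical and are not prescribed by these notes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical class: one (count kind) pair, such as (3 orange).
class FruitCount {
    final int count;
    final String kind;

    FruitCount(int count, String kind) {
        this.count = count;
        this.kind = kind;
    }
}

class FruitBasketData {
    // Encoding 1: a single number. It tells us how many pieces of fruit
    // there are, and nothing else.
    static final int TOTAL = 7;

    // Encoding 2: a sequence of pairs. It keeps the kind of each fruit,
    // but still records nothing about variety, weight, or price.
    static final List<FruitCount> CONTENTS = Arrays.asList(
        new FruitCount(3, "orange"),
        new FruitCount(2, "banana"),
        new FruitCount(2, "apple"));

    public static void main(String[] args) {
        System.out.println("Encoding 1: " + TOTAL);
        for (FruitCount fc : CONTENTS) {
            System.out.println("(" + fc.count + " " + fc.kind + ")");
        }
    }
}
```

Only the second encoding retains enough information to answer a question about bananas; neither can answer a question about the price of the apples.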
The simplest kind of information can be represented by one numeric quantity, one sequence of letters, or some symbol. The corresponding data is similarly encoded as one number or one String (a sequence of characters). But as the earlier example shows, in most cases the information about some phenomenon must be represented as several related data items. A class of data represents information about some phenomenon. It consists of a collection of data items, each representing one aspect of the phenomenon. So, for example, the daily weather report for Mt. Washington specifies the temperature (-5°F), the wind direction and velocity (NW, 67mph), the cloud cover and visibility (fog, 0), and the precipitation for the day (1.0in). It is clear that each piece of data has a meaning of its own, but also that the whole collection of data items represents one complex data item - a class of data.
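One possible way to write the weather report as a class of data in Java is sketched below. The class name, the field names, and the choice of a type for each field are assumptions made for illustration. The temperature value -5 is an assumed reading of the figure in the text, and the unit of visibility is left unspecified because the text does not state it.

```java
// Hypothetical class of data for the daily weather report.
class WeatherReport {
    final int temperatureF;        // degrees Fahrenheit
    final String windDirection;    // compass direction, such as "NW"
    final int windSpeedMph;        // miles per hour
    final String cloudCover;       // such as "fog"
    final int visibility;          // unit not stated in the text
    final double precipitationIn;  // inches for the day

    WeatherReport(int temperatureF, String windDirection, int windSpeedMph,
                  String cloudCover, int visibility, double precipitationIn) {
        this.temperatureF = temperatureF;
        this.windDirection = windDirection;
        this.windSpeedMph = windSpeedMph;
        this.cloudCover = cloudCover;
        this.visibility = visibility;
        this.precipitationIn = precipitationIn;
    }
}

class WeatherDemo {
    public static void main(String[] args) {
        // The temperature -5 is an assumed value (see the note above).
        WeatherReport mtWashington =
            new WeatherReport(-5, "NW", 67, "fog", 0, 1.0);
        System.out.println("Wind: " + mtWashington.windDirection
            + " at " + mtWashington.windSpeedMph + " mph");
    }
}
```

Each field has a meaning of its own, while one `WeatherReport` object stands for the whole report.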
The programmer needs to identify the relevant classes of data and describe the relationships between them. For example, we may have a class of Person and a related class of Employee. Every employee is a person, but the Employee data contains more information than the Person data. A company may keep a List of Employees, and an Organizational Chart of Employees - with the CEO at the top. The figure xyz represents the relationships between these classes of data.
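The relationships among these classes might be expressed in Java roughly as follows. This is a sketch under assumptions: the fields `name` and `title`, the shape of `OrgChart`, and the sample employees are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Person {
    final String name;

    Person(String name) {
        this.name = name;
    }
}

// Every Employee is a Person, but carries more information.
class Employee extends Person {
    final String title;  // hypothetical extra data item

    Employee(String name, String title) {
        super(name);
        this.title = title;
    }
}

// Hypothetical organizational chart: an employee together with the
// charts of those who report directly to that employee.
class OrgChart {
    final Employee head;
    final List<OrgChart> reports;

    OrgChart(Employee head, List<OrgChart> reports) {
        this.head = head;
        this.reports = reports;
    }
}

class CompanyDemo {
    public static void main(String[] args) {
        // Made-up sample data.
        Employee ceo = new Employee("Ada", "CEO");
        Employee engineer = new Employee("Ben", "Engineer");

        // A list of employees.
        List<Employee> staff = Arrays.asList(ceo, engineer);

        // An organizational chart with the CEO at the top.
        OrgChart chart = new OrgChart(ceo,
            Arrays.asList(new OrgChart(engineer, new ArrayList<OrgChart>())));

        // Allowed because every Employee is a Person.
        Person p = engineer;
        System.out.println(staff.size() + " employees; "
            + p.name + " reports to " + chart.head.name);
    }
}
```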
Computer programs perform computations. A computation consumes one kind of data and produces new data as its result, which in turn represents a new kind of information. For example, if we compute the total count of all the fruit in the fruit basket, the result is a number that represents the quantity of fruit. The program that determines whether there is a banana in the fruit basket consumes the data representing the fruit basket and produces a symbol 'true' or 'false' that represents the answer.
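Both fruit basket computations can be sketched in Java, using the sequence-of-pairs encoding (3 orange) (2 banana) (2 apple) described earlier. The names `FruitCount`, `FruitBasket`, `totalCount`, and `contains` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical class: one (count kind) pair, such as (2 banana).
class FruitCount {
    final int count;
    final String kind;

    FruitCount(int count, String kind) {
        this.count = count;
        this.kind = kind;
    }
}

class FruitBasket {
    // Consumes the basket data; produces a number that represents
    // the quantity of fruit in the basket.
    static int totalCount(List<FruitCount> basket) {
        int total = 0;
        for (FruitCount fc : basket) {
            total += fc.count;
        }
        return total;
    }

    // Consumes the basket data and a kind of fruit; produces true or
    // false, representing whether the basket holds fruit of that kind.
    static boolean contains(List<FruitCount> basket, String kind) {
        for (FruitCount fc : basket) {
            if (fc.kind.equals(kind) && fc.count > 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<FruitCount> basket = Arrays.asList(
            new FruitCount(3, "orange"),
            new FruitCount(2, "banana"),
            new FruitCount(2, "apple"));

        System.out.println(totalCount(basket));          // prints 7
        System.out.println(contains(basket, "banana"));  // prints true
    }
}
```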
The program design consists of a series of steps described in a design recipe. Following the design recipe helps the programmer to understand the nature of the problem, understand the structure of the data and the corresponding structure of the program, and to identify errors in the design or implementation of the program. The smallest programs represent simple computations, such as computing the speed, given the distance and time, or computing the average of two numbers. We start with the design recipes for such simple computations - implemented as functions. The design recipe consists of five steps:
1. Data Analysis and Data Definition
2. Purpose and Header
3. Examples
4. Body
5. Testing
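As an illustration, here is how the five steps might look for the speed computation mentioned above, written as a Java function. This is a sketch, not the definitive form of the recipe: the names `SpeedRecipe` and `speed` and the choice of miles and hours as units are assumptions.

```java
class SpeedRecipe {
    // Step 1. Data analysis and data definition:
    //   the distance (in miles) and the time (in hours) are each a
    //   single numeric quantity, so each is represented as a double.
    //   The result, a speed in miles per hour, is also a double.

    // Step 2. Purpose and header:
    //   compute the speed in miles per hour, given the distance
    //   traveled and the time it took.
    static double speed(double distanceMiles, double timeHours) {
        // Step 4. Body (written after the examples below).
        return distanceMiles / timeHours;
    }

    // Step 3. Examples (worked out by hand before writing the body):
    //   speed(120.0, 2.0) should be 60.0
    //   speed(45.0, 1.5)  should be 30.0

    // Step 5. Testing: turn each example into a check.
    public static void main(String[] args) {
        System.out.println(speed(120.0, 2.0) == 60.0);  // prints true
        System.out.println(speed(45.0, 1.5) == 30.0);   // prints true
    }
}
```

Writing the examples before the body gives the programmer something concrete to test against once the body is complete.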