In 1974 I started a company, Kurzweil Computer Products, Inc. ("KCPI"), to pursue my interest in pattern recognition, part of the broader field of "artificial intelligence." In pattern recognition, we teach computers to recognize patterns such as printed shapes, human faces, speech sounds, terrain maps (for cruise missiles), and other real-world phenomena. The other fields of artificial intelligence are devoted to capturing human reasoning faculties (e.g., chess-playing computers, programs that make financial investment decisions). It turns out that 90% of the human brain is devoted to interpreting and understanding patterns, and solving these problems is critical to capturing intelligence in a machine.
We attacked what was then regarded as a classical (and unsolved) problem in pattern recognition: teaching a computer to identify printed characters regardless of their type font, size, print quality, and other characteristics. Computer systems existed that could recognize letters printed in a special type font (e.g., Courier or OCR-A), but no system could recognize printed letters regardless of typeface. Solving this problem required us to teach the computer to abstract the essential qualities of the concept behind each letter. There are hundreds of different shapes we all call "A", but it is not immediately clear what essential invariant properties distinguish all A's from all other letters.
We came up with an effective approach to this problem. The question then became: what is this technology good for? It was a solution in search of a problem.
We did some market research and quickly came upon the problem of access to ordinary print for blind and visually impaired persons. Braille was (and continues to be) a vitally important medium, providing full literacy to blind persons as a system for both reading and writing. Recorded materials (i.e., "talking books") also provide access to the world of literature. But both methods suffered from the same limitation: the range of available material was narrow. Of the 50,000 new books published each year, only 3 percent were translated into Braille, and only about 5 percent were available as talking books. The availability of timely material such as inter-office memos was even more limited.
It quickly became clear that a print-to-speech reading machine could overcome this barrier. Along with Braille, it would provide another important tool enabling blind persons to compete fully with their sighted peers.
There were several other key technological hurdles we needed to overcome to create the world's first print-to-speech reading machine. Back in 1974, there were no CCD (charge-coupled device) flatbed scanners and no text-to-speech synthesizers, so we needed to create these technologies as well. The three technologies we created ultimately evolved into what are today large industries: (i) omni-font (i.e., "any" font) OCR (optical character recognition), (ii) CCD flatbed scanners, and (iii) text-to-speech synthesizers.
A vital issue in creating the world's first print-to-speech reading machine was understanding how to organize these resources in a machine and how to connect the user to these capabilities intuitively. Another important issue, of course, was funding. As it turned out, the National Federation of the Blind ("NFB") played a critical role in helping our small organization in both of these areas.
We presented our ideas and plans to many people and organizations back in 1975. Most people told us that our ideas were interesting and ambitious and to keep in touch. But two people were particularly responsive and promised to help us to achieve our goals. One was Jim Gashel, who was then (and still is) the head of the Washington office of the National Federation of the Blind. Jim was one of the most energetic people I've ever met, and he wanted to work closely with us as part of the project. The other was Jim's boss, Dr. Kenneth Jernigan, President of the NFB.
As an aside, when I was growing up in Queens, New York, in the late 1950s and early 1960s, I belonged to a religious youth organization called LRY (Liberal Religious Youth), organized by the Unitarian Church. We were early participants in the civil rights movement and took part in some of the early civil rights marches and demonstrations. I considered myself fortunate to have had the opportunity to participate in this important phase of American history, and I was always inspired by Dr. Martin Luther King's great oratory and leadership.
In my work with the National Federation of the Blind in the 1970s, I came to feel the same way about Dr. Jernigan. At many of the NFB conventions I attended, Dr. Jernigan reminded me of Dr. King, particularly during his inspiring keynote addresses. Under his leadership, the NFB has been at the forefront of another great effort to provide equal opportunity for all Americans.
But I digress. In 1975 I had the opportunity to review my plans for the world's first print-to-speech reading machine with Dr. Jernigan and Jim Gashel. They agreed to work with me to help find funding for this effort if I agreed to involve the NFB, and in particular its blind engineers and scientists, in the design of the reading machine, its user interface, and its controls, and in evaluating and refining all aspects of its operation and functions. I wasn't really expecting that request, but I was in no position to argue, so I said, "Sure, why not."
As it turned out, this collaboration between Kurzweil Computer Products, Inc. and the National Federation of the Blind worked extremely well and in effect killed two birds with one stone. We were successful in raising about $350,000 in funding from a number of foundations. I had the opportunity to get to know Jim Gashel quite well in this process as the two of us worked jointly on these proposals in his Washington office, often late into the night.
The joint KCPI-NFB program also played the key role in creating an effective reading machine from its constituent technologies. It is clear to me that the Kurzweil Reading Machine would not have been an effective tool had it not been for the key design insights contributed by the NFB scientists and engineers. In fact, the design came out quite differently than we had originally expected and, as it turned out, was very well accepted by blind consumers. Because the intended users were intimately involved in every stage of the design process, the machine anticipated users' needs in ways that we, as well-intentioned sighted engineers, could never have anticipated.
Many of the key ideas created in this KCPI-NFB collaboration still form the basis of the user interface of all print-to-speech reading machines for the blind today. I'll provide one instructive example. We were going to put small Braille labels on all of the user controls so that a new user would know which control was which. One of the NFB engineers said that it would be very annoying to feel these Braille labels hundreds of times a day, every day. So I asked him how a new user could identify the controls without Braille labels. He suggested adding another prominent button to the panel, which he called the "nominator" key. To identify a control, a user would simply press the nominator key and then the other key, which would announce its name and describe its function. After using the nominator key to explore the keyboard for a few days, a user would know where all the keys were and would not need to feel annoying Braille labels hundreds of times every day.
Well, that made sense when we heard it, but since we were not the intended users of the invention, it is an insight we never would have reached on our own.
I have carried this lesson into the projects I have pursued since. At my music company, we required all of the engineers to be musicians, because there was no other way to be sensitive to the nuances of sound and the subtle interactions of feel and response in a musical instrument. In creating voice-activated medical reporting systems, I worked very closely with physicians.
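The nominator key described above is, in effect, a one-shot "describe" mode on the control panel. The following is a minimal hypothetical sketch of that behavior in Python; the class name, key names, descriptions, and the convention that each action returns the phrase to be spoken are all invented for illustration and are not taken from the actual Kurzweil Reading Machine.

```python
from typing import Callable, Dict, List, Tuple


class Keypad:
    """Hypothetical control panel with a "nominator" key. Pressing it makes
    the NEXT key press announce that key's name and function instead of
    performing the function; after that, keys behave normally again."""

    def __init__(self, controls: Dict[str, Tuple[str, Callable[[], str]]],
                 speak: Callable[[str], None]):
        self.controls = controls      # key -> (spoken description, action)
        self.speak = speak            # stand-in for the speech synthesizer
        self.describe_next = False

    def press(self, key: str) -> None:
        if key == "nominator":
            self.describe_next = True
            return
        description, action = self.controls[key]
        if self.describe_next:
            self.describe_next = False    # describe mode lasts for one key only
            self.speak(f"{key}: {description}")
        else:
            self.speak(action())          # normal operation


spoken: List[str] = []
panel = Keypad(
    {
        "next_line": ("read the next line of text", lambda: "reading next line"),
        "spell": ("spell the current word", lambda: "spelling current word"),
    },
    speak=spoken.append,
)
panel.press("nominator")   # enter describe mode
panel.press("spell")       # announces the key instead of spelling
panel.press("spell")       # now performs the spelling function
print(spoken)
# ['spell: spell the current word', 'spelling current word']
```

Keeping the mode one-shot means a user can probe any key without risk of triggering it, and never has to remember to switch the panel back.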
This collaboration was sufficiently successful that we were able to attract millions of dollars of funding from what is now the Department of Education and other Government agencies.
We announced the Kurzweil Reading Machine at a press conference on January 13, 1976. It seemed to strike a chord and was featured on the national evening news of all three networks. Walter Cronkite used the machine for his signature sign-off by having it read "And that's the way it was, January 13, 1976." Incidentally, sitting at the controls that day was Jim Gashel.
At a subsequent live television demonstration of the reading machine, we were a little nervous because we had only one working prototype at the time. And as one might expect, the machine stopped working just a couple of hours before we were to go on the air. Finally, in frustration, our Chief Engineer lifted the scanner and banged it on the table. This time-honored approach to fixing delicate electronic equipment seemed to work, and the reading machine started reading again. The presentation then went quite smoothly.
In 1978, we introduced a version of the reading system for commercial applications, such as word processing and entering data into databases, called the Kurzweil Data Entry Machine ("KDEM"). The KDEM was very successful, and this attracted the interest of Xerox Corporation, which saw the technology as a bridge back from the world of paper to the world of electronics. Most of Xerox's products could create paper documents from either electronic documents or other paper documents. Our KDEM technology allowed you to go in the other direction, from a paper document back to an electronic document. Xerox invested in the company in 1978, and in 1980 I sold the company to them.
I remained Chief Executive Officer of KCPI as a Xerox subsidiary until 1982. At that time I started two new companies. One was Kurzweil Music Systems, Inc., which created the first computer-based musical instrument that could recreate the sounds of the grand piano and other orchestral instruments. I sold that company to Young Chang, a large Korean musical instrument manufacturer, in 1990. The other was Kurzweil Applied Intelligence, Inc. ("KAI"), which created the first commercially marketed large-vocabulary speech recognition technology. A primary application of that technology is to enable individuals with hand impairments to use computers, communicate, and control their environment. A future goal is to create the opposite of a reading machine, a device that converts speech into print, so that a deaf person can understand what people are saying. KAI continues as a public company, and I have continued to serve as its Chief Technology Officer.
I also continued as a consultant to Kurzweil Computer Products, Inc. (which changed its name to Xerox Imaging Systems around 1990) from 1982 until 1995. This gave me the opportunity to continue to learn and gain insight into reading machine design and the many technical and user interface issues that arise. It also afforded me the opportunity to continue my relationship with many people in this field, and in particular with Dr. Jernigan and the NFB.
Now, in 1996, I have started another company, Kurzweil Educational Systems, Inc. ("KESI"), to create a new generation of reading technology. I am Chairman of KESI. Mike Sokol, who headed sales for the Adaptive Products Division of Xerox Imaging Systems for ten years, is President. Two individuals familiar to many of you from their years of service in this industry are also involved: David Bradburn heads marketing and Forrest Dobbs heads sales.
There were several reasons I started this new company. One was that my twenty-three-year involvement in this field has perhaps been the most gratifying part of my professional career.
Another was that over these twenty-three years I have gained insight into the many subtle issues of designing an effective reading machine, insight I wanted an opportunity to put to use.
Third, I felt that the enabling technologies of personal computers and scanners had evolved to the point where great advances were again possible in this field.
Finally, and perhaps most importantly, many people asked me to come back to this field to use my experience to again make a direct contribution.
So that is what I hope to achieve with KESI. I have had the opportunity to gather together some of the best minds in this field in both technology and marketing, and we have introduced our first product called OMNI 1000 which represents a new generation of print-to-speech reading machines for the blind. The design of this product was guided by several key principles:
Provide the highest possible level of OCR (optical character recognition) accuracy. The quality of a reading machine can never be better than its OCR. Drawing on insights from developing the first omni-font OCR twenty years ago, the KESI technology team and I were able to provide highly accurate OCR technology that combines image enhancement software, basic OCR, and lexical post-processing. In addition to character accuracy, the software's ability to understand complex page formats is also very important.
Provide high-quality speech synthesis that is natural-sounding and easy to understand. OMNI 1000 uses a new speech synthesizer called FlexTalk, developed by AT&T at Bell Labs (now part of Lucent Technologies). In addition to natural-sounding synthetic speech, the system analyzes and parses the structure of each sentence to produce natural phrasing, cadence, and prosodic contour.
Provide an intuitive user interface that is easy to learn and use. Here again, we have worked, and continue to work, with our users, while also benefiting from our own experience over the last twenty years. Heading our user interface design for OMNI 1000 is Steve Baum, who was Chief Scientist at KCPI/Xerox Imaging Systems for ten years.
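The text names lexical post-processing as one stage of the OCR pipeline but does not say how KESI implements it. One common form of lexical post-processing is dictionary correction: a recognized word that is not in the lexicon is replaced by the nearest lexicon word, measured by edit (Levenshtein) distance, if that word is close enough. The Python below is a minimal sketch of that generic technique only, not KESI's method; the function names, the tiny lexicon, and the `max_distance` threshold are hypothetical. Production systems would typically also weight candidates by character-confusion likelihood and word frequency.

```python
from typing import Iterable, List, Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def lexical_postprocess(tokens: Iterable[str], lexicon: Set[str],
                        max_distance: int = 2) -> List[str]:
    """Replace each OCR token missing from the lexicon with its closest
    lexicon word, but only if that word is within max_distance edits.
    Ties go to the alphabetically first word so results are repeatable."""
    corrected = []
    for token in tokens:
        word = token.lower()
        if word in lexicon:
            corrected.append(word)
            continue
        best = min(sorted(lexicon), key=lambda w: edit_distance(word, w))
        if edit_distance(word, best) <= max_distance:
            corrected.append(best)
        else:
            corrected.append(word)  # too far from any known word: leave as-is
    return corrected


lexicon = {"the", "reading", "machine", "for", "blind"}
ocr_output = ["Tne", "readinq", "rnachine", "for", "the", "b1ind"]
print(lexical_postprocess(ocr_output, lexicon))
# ['the', 'reading', 'machine', 'for', 'the', 'blind']
```

The distance threshold is the key design choice: too small and typical OCR confusions such as "rn" for "m" go uncorrected; too large and legitimate words outside the lexicon get overwritten.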
Provide a rich array of features such as immediately available dictionary definitions, voice commands, voice prompts, document management, multiple reading "personalities," a voice calculator and many others.
Take advantage of the outstanding price-performance of commodity computing components. Today's personal computers and scanners provide tremendous capability at very low prices. As soon as you start designing specialized hardware for a disabled population, you lose the price-performance benefits of commodity components. Take, for example, the issue of a book-edge scanner. Commodity scanners do not generally provide a book edge, and building a special scanner with one is very expensive and locks the designer into an older generation of components. We solved this problem by using a large platen scanner (8.5 by 14 inches) and special software. The user can now simply place both open pages of a book on the scanner, and the software will (i) automatically recognize the orientation of the book, (ii) compensate for the curvature near the spine, (iii) eliminate the dark, ragged image the scanner picks up between the two open pages, and (iv) accurately read both pages. This turns out to be even easier to use than a special book-edge scanner, because the user can scan two pages at once.
We are also introducing two other versions of OMNI. OMNI 2000 is intended for people with low vision who would otherwise use CCTV (closed-circuit television) enlargement systems. OMNI 2000 enlarges print just like a CCTV system, but it also does things a CCTV cannot, including reading the print aloud, highlighting on the image the words currently being read, automatically moving the image so that the user does not need to move the book on an X-Y mover, and providing on-line dictionary definitions, voice commands, and many other features. This shows the power of computers, because the OMNI 2000 and contemporary CCTV systems are in the same general price range.
OMNI 3000 is intended for individuals with dyslexia and/or learning disabilities, i.e., people who have difficulty reading for reasons other than visual impairment. Like OMNI 2000, OMNI 3000 enlarges print on a screen. It preserves the look and feel of the page, with all of its formatting, graphics, and color images displayed. It reads from the enlarged image of the actual page and highlights what it is reading. Building on this foundation, OMNI 3000 provides instructional software to help dyslexic students learn to read and overcome their reading disabilities.
Let me return to the issue of taking advantage of the price-performance of commodity computing components, as I believe it deserves additional discussion. We are witnessing today a true revolution that is having a profound impact on all facets of society. The information age is an extraordinary and, in my view, permanent shift toward knowledge, intellectual property, and software as the foundation of wealth and power in what I like to call the second industrial revolution.
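Step (iii) above, removing the dark band between the two open pages, can be illustrated in its simplest form: the spine shadow shows up as the darkest columns near the middle of a two-page scan. The Python below is a hypothetical sketch of that one idea, not KESI's software. It assumes a grayscale image stored as a list of pixel rows (0 = black, 255 = white), and the function names, the `search_fraction` window, and the `dark_threshold` value are all invented for illustration. Orientation detection, curvature compensation, and ragged, non-vertical shadows are not handled.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of grayscale pixels: 0 = black, 255 = white


def column_mean(image: Image, x: int) -> float:
    """Average brightness of column x (lower means darker)."""
    return sum(row[x] for row in image) / len(image)


def find_gutter(image: Image, search_fraction: float = 0.5) -> int:
    """Return the darkest column within the central `search_fraction` of
    the width, taken here to be the shadow along the book spine. Dark
    borders at the far left and right edges fall outside the window."""
    width = len(image[0])
    half_window = int(width * search_fraction / 2)
    centre = width // 2
    lo, hi = max(0, centre - half_window), min(width, centre + half_window + 1)
    # min() keeps the first (leftmost) column when several tie.
    return min(range(lo, hi), key=lambda x: column_mean(image, x))


def split_pages(image: Image, dark_threshold: float = 64) -> Tuple[Image, Image]:
    """Grow a band outward from the gutter while neighbouring columns stay
    darker than `dark_threshold`, then return the (left, right) pages with
    that band removed."""
    width = len(image[0])
    left = right = find_gutter(image)
    while left > 0 and column_mean(image, left - 1) < dark_threshold:
        left -= 1
    while right < width - 1 and column_mean(image, right + 1) < dark_threshold:
        right += 1
    left_page = [row[:left] for row in image]
    right_page = [row[right + 1:] for row in image]
    return left_page, right_page


row = [0, 255, 255, 255, 0, 0, 255, 255, 255, 0]  # dark outer edges, spine at 4-5
scan = [list(row) for _ in range(3)]
left_page, right_page = split_pages(scan)
print(find_gutter(scan), left_page[0], right_page[0])
# 4 [0, 255, 255, 255] [255, 255, 255, 0]
```

Restricting the search to the middle of the platen is what keeps the dark outer edges of the book from being mistaken for the spine.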
The phenomenon fueling the information age is "Moore's Law," which states that computing speeds and densities double every eighteen months. In other words, every eighteen months we can buy a computer that is twice as fast and has twice as much memory for the same cost.
Moore's law is actually a corollary of a broader law, which I like to call Kurzweil's law, on the exponentially quickening pace of technology that goes back to the dawn of human history. Not much happened in, say, the tenth century, technologically speaking. In the eighteenth century, quite a bit happened. Now we have major paradigm shifts in a few years' time. But that's another article.
Returning to Moore's law: remarkably, it has held true throughout this century, reaching back to the mechanical card-based computing technology of the 1890 census and continuing through the relay-based computers of the 1940s, the vacuum tube-based computers of the 1950s, the transistor-based machines of the 1960s, and all of the generations of integrated circuits that we've seen over the past three decades.
If you plot the computing power per unit cost of every calculator and computer of the past 100 years on a logarithmic chart, the points form an essentially straight line. Computer memory, for example, is about 16,000 times more powerful today for the same unit cost than it was about 20 years ago, and 150 million times more powerful for the same unit cost than it was in 1948, the year I was born. If the automobile industry had made as much progress in the past forty-eight years, a car today would cost about a hundredth of a cent and would go faster than the speed of light.
Moore's law will continue unabated for many decades to come. We have not even begun to explore the third dimension in chip design. Chips today are flat, whereas our brain is organized in three dimensions. We live in a three-dimensional world; why not use the third dimension?
Improvements in semiconductor materials, including the development of superconducting circuits that do not generate heat, will enable chips, or I should say cubes, with thousands of layers of circuitry. Combined with far smaller component geometries, these will improve computing power by a factor of many millions. There are more than enough new computing technologies being developed to assure a continuation of Moore's law for a very long time.
Reading machines for the blind have certainly benefited from Moore's law, and I recently examined by how much.
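The eighteen-month doubling rule can be written as a simple exponential. In this sketch, C(t) is an illustrative symbol (not from the original) for computing capability per unit cost, t years after some reference point; taking the base-2 logarithm shows why such data fall on a straight line on a logarithmic chart.

```latex
C(t) = C(0)\cdot 2^{t/1.5}
\qquad\Longrightarrow\qquad
C(3) = 4\,C(0), \qquad C(15) = 2^{10}\,C(0) = 1024\,C(0)

\log_2 C(t) = \log_2 C(0) + \frac{t}{1.5}
```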
Let's compare the first reading machine, the Kurzweil Reading Machine, which I introduced in 1976, to the OMNI 1000, which is the new reading machine that Kurzweil Educational Systems, Inc. has just introduced.
The 1976 Kurzweil Reading Machine had 64 thousand bytes of memory. The 1996 OMNI 1000 has 16 million bytes of memory. So that's a ratio of 256 to 1.
The 1976 KRM used a cassette tape for mass storage. The 1996 OMNI 1000 has a billion-byte hard drive and a half-billion-byte CD-ROM drive.
The 1976 KRM had a processor speed of a quarter of a million instructions per second, or a quarter MIPS. The 1996 OMNI 1000 uses a 100 MHz Pentium, which provides about 100 MIPS. So that's a ratio of 400 to 1.
If we compare the overall performance of the computer and scanner, the optical character recognition, voice quality, and other features and characteristics, I think it is fair to say that the 1996 product provides about 256 times the performance of the 1976 product.
Okay, now the price of the first Kurzweil Reading Machine was around $67,000. The price of the OMNI 1000 is around $4,000. So that's a ratio of 16.75 to 1. When we take inflation into consideration, that's actually a ratio of about 42 to 1 in constant dollars.
So we have a product that has 256 times the memory, 400 times the computing speed, and 256 times the overall performance at one forty-second of the price. Multiplying the overall performance gain by the price reduction (256 × 42) gives an overall improvement in price-performance of 10,752 to 1.
But before we congratulate ourselves, let's see what Moore's Law would have predicted. The twenty years since 1976 contain 13 complete eighteen-month periods, so there have been 13 turns of Moore's screw. That is, Moore's Law predicts that we should have doubled the price-performance of computer-based devices 13 times since 1976. Well, 2 to the 13th power is 8,192, so we should have improved price-performance by a factor of 8,192. In actuality, the analysis I just went through shows that we have improved it by a factor of 10,752. So we've done a little better than Moore's Law. What strikes me is how accurate Moore's Law proves to be when you make comparisons of this kind.
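The arithmetic in the comparison above can be checked in a few lines of Python. This is a sketch: the function names are hypothetical, the memory figures are read as binary units (64 KB = 65,536 bytes and 16 MB = 16,777,216 bytes, an assumption that makes the ratio exactly 256), and the inflation-adjusted price ratio of 42 and the overall performance ratio of 256 are taken from the text rather than recomputed.

```python
def price_performance_gain(performance_ratio: float, price_ratio: float) -> float:
    """Combined gain: the performance multiplier times the factor by
    which the price fell (both expressed as new-vs-old ratios)."""
    return performance_ratio * price_ratio


def moores_law_factor(years: int, doubling_months: int = 18) -> int:
    """Predicted gain counting only complete doubling periods, as the
    text does when it counts 13 doublings between 1976 and 1996."""
    doublings = (years * 12) // doubling_months
    return 2 ** doublings


# Figures quoted in the text (1976 KRM vs. 1996 OMNI 1000).
memory_ratio = (16 * 2**20) // (64 * 2**10)   # 16 MB vs. 64 KB, binary units assumed
speed_ratio = 100 / 0.25                       # 100 MIPS vs. 0.25 MIPS
nominal_price_ratio = 67_000 / 4_000           # $67,000 vs. $4,000, not inflation-adjusted
constant_dollar_price_ratio = 42               # the text's inflation-adjusted figure
overall_performance_ratio = 256                # the text's overall estimate

observed = price_performance_gain(overall_performance_ratio, constant_dollar_price_ratio)
predicted = moores_law_factor(1996 - 1976)
print(f"memory {memory_ratio}x, speed {speed_ratio:.0f}x, price {nominal_price_ratio}x")
# memory 256x, speed 400x, price 16.75x
print(f"observed {observed:,} vs. predicted {predicted:,}")
# observed 10,752 vs. predicted 8,192
```

Counting only whole eighteen-month periods follows the text; using the fractional count of about 13.3 doublings would raise the prediction somewhat.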
And, of course, Moore's law will continue to improve all aspects of reading machine price and performance in the years ahead. Just recently, two-dimensional scanning chips have emerged that can scan a full page of text at 300-spot-per-inch resolution without any moving parts. These two-dimensional scanning arrays, which have over 5 million pixels, are still prototypes and are therefore expensive. But within a few years, these chips will permit the development of pocket-sized scanners, the size of a small camera, that can snap a full page instantly. Thus, around the end of this decade, a full print-to-speech reading machine will fit in your pocket. You'll hold it over the page to be scanned and snap a picture of the page. All of the electronics and computation will be inside this small camera-sized device. You'll then listen to the text being read from a small speaker or earphone.
You will also be able to snap a picture and read a poster on a wall, a street sign, a soup can, someone's ID badge, an appliance LCD display, and many other examples of real-world text. This reading machine will cost less than a thousand dollars and will ultimately come down to hundreds of dollars. Algorithmic improvements will also provide the ability to describe non-textual material such as graphs, diagrams, and page layouts. These devices will also provide on-line access to knowledge bases and libraries through the information superhighway. By the end of the first decade of the next century, the intelligence of these devices will be sufficient to provide reasonable descriptions of pictures and real-world scenes. These devices will also be capable of translating from one language to another.
It is my sincere hope and personal goal that KESI will provide the technological leadership to create these future generations of reading machines. But for now, we are proud of the OMNI 1000 that we just introduced, proud of the excellent team that we've put together, and thrilled to be working again with the National Federation of the Blind and its many devoted and talented members.