Neural Networks

Neural Networks: An Overview


NOTE: This page is still under construction!
C.L. - Wednesday 2/28/96

Chad Loder
Honors Computer Science Seminar
College of Computer Science
Northeastern University




Conventions and Terminology

This paper is my project for COM 1722, Honors Computer Science Seminar. I'm a freshman at Northeastern University, attempting a double major in Computer Science and Mathematics with a minor in Behavioral Neuroscience. I think this project is a fascinating synthesis of knowledge from these fields.

In this paper, I'm going to disregard current terminology and use the term neural network to refer to a computational model of a biological neural network, instead of the more correct artificial neural network (ANN). The latter term is supposed to avoid confusion as to whether I'm talking about computers or real brains, but I think the context will make my usage clear.
I've drawn all the non-photographic pictures in various ways and converted them to GIF format. Every picture is hyperlinked to either an explanation of how I drew it or to a reference of where I got the picture from. Just click on the picture to get more information on it. This page looks best when viewed with Netscape, but it also looks fine on other browsers like Mosaic, and text browsers like Lynx. If you're using Netscape, to make things even prettier, try setting your font to 12-point Palatino and widening your browser window a little.
One final note. This page is obviously not purely original work. A lot of the thinking is mine, but most of the factual data has come from the various books that I've read over the last year or so. To quote my sources for every date and name I mention would make this paper a complete mess, so I've decided not to use a quoting system except for direct quotes. The sections on artificial intelligence and the human brain are 100% mine; I'm not claiming to have intellectual ownership of the ideas, but the opinions and the explanations of the concepts are my own and I take responsibility for them.


Introduction

Origins of Neural Networks

In the early 1940s, Warren McCulloch and Walter Pitts published a seminal paper titled "A Logical Calculus of the Ideas Immanent in Nervous Activity". In it, they proposed a mathematical model of a neuron, which could perform computations. This artificial neuron, or neurode (some call them neurones), was a simple device which could receive input from other such devices.

The neurode's output was either a 1 or a 0, reflecting the all-or-none theory of biological neurons. When the total input reached a certain critical level, the neurode would send its output to other neurodes with which it was connected. This method is called threshold logic.
In basic propositional logic, something can be either true or false. Since a neurode's state is either a 1 or a 0, it can be represented by a proposition. If you organize simple neurodes into a network, they can combine to form more complex propositions. This theory was so influential that this type of neurode is called the McCulloch-Pitts neuron. Some modern neural networks use neurodes which are essentially extensions of the McCulloch-Pitts neuron.
"Because of the "all-or-none" character of nervous activity, neural events and the relations among them can be treated by means of propositional logic. It is found that the behavior of every net can be described in these terms, with the addition of more complicated logical means for nets containing circles; and that for any logical expression satisfying certain conditions, one can find a net behaving in the fashion it describes. It is shown that many particular choices among possible neurophysiological assumptions are equivalent, in the sense that for every net behaving under one assumption, there exists another net which behaves under the other and gives the same results, although perhaps not in the same time. Various applications of the calculus are discussed."

- Warren McCulloch and Walter Pitts
"A Logical Calculus of the Ideas Immanent in Nervous Activity" Preface

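The threshold logic described above can be sketched in a few lines of Python. This is only an illustration, not McCulloch and Pitts's own formulation: the function names, the use of numeric weights (including a negative weight for inhibition), and the particular threshold values are all assumptions made for the example.

```python
def neurode(inputs, weights, threshold):
    """Return 1 if the weighted sum of binary inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def AND(a, b):
    # Both inputs must be active to reach a threshold of 2.
    return neurode([a, b], [1, 1], threshold=2)

def OR(a, b):
    # A single active input is enough to reach a threshold of 1.
    return neurode([a, b], [1, 1], threshold=1)

def NOT(a):
    # An inhibitory (negative) connection: fires only when the input is silent.
    return neurode([a], [-1], threshold=0)

def implies(a, b):
    # A small network of neurodes: "a implies b" is (NOT a) OR b.
    return OR(NOT(a), b)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b), implies(a, b))
```

Each neurode stands for one simple proposition, and wiring them together (as in `implies`) gives a more complex one.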

The concept of neural networks has been around since the early 1940s, but was mostly dormant until the mid 1980s. One of the first neural networks developed was the perceptron. Created by a psychologist named Frank Rosenblatt in 1958, the perceptron was a very simple system which used interconnected neurodes to analyze data, usually visual patterns. Rosenblatt published a series of papers which generated a great deal of interest in the perceptron. Many people researched and further developed the perceptron model, even implementing it in hardware. The perceptron was widely and unrealistically praised by researchers. Rosenblatt and other scientists claimed that eventually, with enough complexity and speed, the perceptron would be able to solve almost any problem.
This was far from the truth. In 1969, Marvin Minsky and Seymour Papert published an influential book titled "Perceptrons". In it, they proved several theorems which showed that the perceptron could never solve a class of simple problems, and hinted at several other serious, fundamental flaws in the model. After "Perceptrons", scientists working on neural network type devices found it almost impossible to receive funding.
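To make the limitation concrete, here is a minimal sketch of a single threshold unit trained with the classic perceptron learning rule. The choice of exclusive-or (XOR) as the "simple problem" is my own illustration; the text above does not name a specific problem. XOR is a well-known case that no single threshold unit can represent, because no straight line separates its 1 outputs from its 0 outputs. The function name, starting weights, and epoch limit are assumptions.

```python
def train_perceptron(samples, epochs=100):
    """Train one threshold unit; return (weights, bias, converged)."""
    w = [0, 0]
    b = 0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), target in samples:
            output = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = target - output
            if error != 0:
                # Nudge the weights toward the correct answer.
                w[0] += error * x1
                w[1] += error * x2
                b += error
                mistakes += 1
        if mistakes == 0:          # a full pass with no errors: learned
            return w, b, True
    return w, b, False

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(train_perceptron(AND_DATA)[2])   # True
print(train_perceptron(XOR_DATA)[2])   # False
```

The unit learns AND in a handful of passes, but no amount of extra training lets it learn XOR.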

Marvin Minsky and Seymour Papert, the authors of "Perceptrons" (1968)



"Once upon a time two daughter sciences were born to the new science of cybernetics. One sister was natural, with features inherited from the study of the brain, from the way nature does things. The other was artificial, related from the beginning to the use of computers. Each of the sister sciences tried to build models of intelligence, but from very different materials. The natural sister built models (called neural networks) out of mathematically purified neurones. The artificial sister built her models out of computer programs.
In their first bloom of youth the two were equally successful and equally pursued by suitors from other fields of knowledge. They got on very well together. Their relationship changed in the early sixties when a new monarch appeared, one with the largest coffers ever seen in the kingdom of the sciences: Lord DARPA, the Defense Department's Advanced Research Projects Agency. The artificial sister grew jealous and was determined to keep for herself the access to Lord DARPA's research funds. The natural sister would have to be slain.
The bloody work was done by two staunch followers of the artificial sister, Marvin Minsky and Seymour Papert, cast in the role of the huntsmen sent to slay Snow White and bring back her heart as proof of the deed. Their weapon was not the dagger but the mightier pen, from which came a book - Perceptrons - purporting to prove that neural nets could never fill their promise of building models of mind: only computer programs could do this. Victory seemed assured for the artificial sister. And indeed, for the next decade all the rewards of the kingdom came to her progeny, of which the family of expert systems did best in fame and fortune.
But Snow White was not dead. What Minsky and Papert had shown the world as proof was not the heart of the princess; it was the heart of a pig."

-Seymour Papert, 1988



Throughout the 1970s, very few people remained interested in neural networks. There has been a renaissance in the field within the last decade, owing mainly to the work of cognitive psychologists, neurobiologists, and computer scientists who have demonstrated the usefulness and beauty of computational modelling of the brain.
Today's neural networks are much more sophisticated than earlier computational models involving neuron-like devices. Modern neural networks typically involve multiple layers of neurodes, and can modify their own behavior through techniques like backpropagation and other complex learning algorithms. I will explain these later.

Artificial Intelligence

The term "artificial intelligence" has always escaped definition, but I am going to try to define it for purposes of this paper. Artificial intelligence is the science of designing machines which can emulate aspects of human intelligence. I'm going to rely on your intuition to interpret what I mean by "intelligence" and "machine". I don't want to get too deeply into philosophy.

There are two important distinctions within the domain of A.I. that you should be aware of. One is the classical difference between strong A.I. and weak A.I. Proponents of strong A.I. think that we will eventually be able to produce real intelligence in a machine. That is, the machine will actually be a thinking entity, a conscious being. Those who support the weak A.I. paradigm think that the creation of truly intelligent machines is an unrealistic goal, and believe that the goal of A.I. is to produce very good emulations of intelligent behavior. In other words, they believe that emulation of consciousness is enough, regardless of the internal mechanisms by which the machine exhibits the behavior.
The other distinction you should be aware of is the distinction between symbolic artificial intelligence and natural intelligence. Symbolic A.I. essentially uses logical, cognitive abstractions to produce intelligent behavior. Heuristics, or rules of thumb, are used to make logical, determinate decisions. Programming languages like LISP and Prolog were developed mainly for symbolic A.I. applications. As you might suspect, symbolic A.I. approaches have been successful in areas that would require purely symbolic or logical skills in a human.
Natural intelligence uses mathematical models of biological neural networks to produce intelligent behavior. The networks usually consist of organized connections of neurodes which can adapt their connections according to computational rules. The network responds to stimuli and can "learn" the correct responses for given stimuli without being specifically programmed for them.
If we can determine how we direct our minds to a task, we can usually program a computer to emulate our behavior fairly well. Unfortunately, most of the time we don't know how our brain does things; our abilities appear to result from vastly complex processes in different parts of the brain. Examples include pattern recognition, learning processes, and sensory and motor tasks.
This is the kind of problem that people are trying to solve with neural networks, because once you have designed a neural network, you can tell it what you want it to do without specifying how you want it done. This involves "training" the network; more on this later. Neural networks do not make use of cognitive abstractions as such. Any complicated thinking in a neural network would have to be a result of the simple interactions between neurodes.
The differences between symbolic A.I. and natural intelligence can be summarized by the differences between analysis and synthesis. The word 'analysis' comes from the Greek word 'analuein', "to undo". Symbolic A.I. is mostly an analytical approach; scientists start with the whole of intelligence and try to break it into constituent parts. On the other hand, the word 'synthesis' comes from the Greek word 'suntithenai', "to put together". The neural network approach is a synthetic one; scientists start with very simple behavior and try to assemble intelligence by combining these behaviors.
The debate about which type of A.I. is "better" is, in my opinion, not even appropriate. Symbolic A.I. has been very successful, not only in producing working products, but in elucidating how people think about things. There are symbolic applications for which neural networks would be inappropriate, just as there are applications with which symbolic A.I. has never been successful. Whether neural networks will ever reach a sufficient level of sophistication to conquer problems in the domain of abstract cognition remains to be seen. I think future successful systems will involve many different types of cooperating neural networks, coupled with strong systems of symbolic processing. I think Marvin Minsky summed it up pretty well:
"AI research must now move from its traditional focus on particular schemes. There is no one best way to represent knowledge, or to solve problems, and limitations of present-day machine intelligence stem largely from seeking "unified theories," or trying to repair the deficiencies of theoretically neat, but conceptually impoverished ideological positions. Our purely numerical connectionist networks are inherently deficient in abilities to reason well; our purely symbolic logical systems are inherently deficient in abilities to represent the all-important "heuristic connections" between things---the uncertain, approximate, and analogical linkages that we need for making new hypotheses. The versatility that we need can be found only in larger-scale architectures that can exploit and manage the advantages of several types of representations at the same time. Then, each can be used to overcome the deficiencies of the others. To do this, each formally neat type of knowledge representation or inference must be complemented with some "scruffier" kind of machinery that can embody the heuristic connections between the knowledge itself and what we hope to do with it."

- Marvin Minsky
"Logical vs. Analogical or Symbolic vs. Connectionist or Neat vs. Scruffy"
in Artificial Intelligence at MIT: Expanding Frontiers, Patrick H. Winston (Ed.), Vol 1, MIT Press, 1990. Reprinted in AI Magazine, 1991

Not to belabor a point, but I've made a summary of the differences between symbolic A.I. and natural intelligence. It's easy to see these two approaches as opposites, but if you read the above link to Minsky's paper, you might change your mind. Minsky and Papert have been blamed for the "dark ages" of neural network research in the 1970s and perhaps that was their goal at the time, but many people would argue that the division was "a natural outgrowth of specialization at that state of knowledge" (see Levine, 1989). In some ways, Minsky and Papert in their skepticism contributed to the development and maturation process of neural network research.


Differences between Symbolic A.I. and Natural Intelligence

Symbolic A.I.     Natural Intelligence
Logical           Analogical
Top-Down          Bottom-Up
Analytic          Synthetic
Symbolic          Connectionist
Neat              Scruffy

The Human Brain

Since neural networks are basically modelled after the brain, it's important to have a basic understanding of how the real brain works. The brain is far more complex than any neural network. People spend their entire lives trying to discover how the brain works, and we are still mystified by it. I will just briefly touch on the main points of neural organization and communication. For more information, see the bibliography at the end of this paper.

Organization

The brain is organized hierarchically. Organization at the molecular and cellular levels gives rise to organization at the structural level (different structures of the brain like the cerebellum, amygdala, neocortex, etc.). The structural organization relates to the functional organization, because different structures do different things.

In fetal development the brain begins as a hollow tube and develops in stages. The earliest brain structures to develop are involved in the most basic functions, like regulating body temperature and heartbeat. Next come the structures that control movement and balance. Next are the structures that are involved in basic emotions. The last thing to develop is the cerebral cortex, the folded gray matter on the top of the brain. This thin layer of tissue is responsible for most of our higher thinking (like language, solving math problems, and reason in general).
We humans have more than our share of cortical surface area. That's why the cortex is folded so much: it can't fit in our head. It's like scrunching up a blanket to put it in the closet. This high ratio of cortex to other parts of the brain is what makes us more intelligent than other mammals with larger brains. Neural network designers don't usually concern themselves with trying to mimic the brain's hierarchy. "If we always had to imitate nature, airplanes would have feathers and cars would have legs." [*]

I'll briefly explain how the brain is organized at a cellular level. There are two fundamental types of cells in the brain: neurons and glial cells. Glial cells basically provide structural support and housekeeping for other cells; I won't describe them in any more detail.

Neurons are the most important units in the nervous system. There are approximately 100 billion neurons in the brain, each of which is amazingly complex in itself. From a simplistic viewpoint, a neuron is a basic processing unit. A neuron receives input from other neurons, processes and integrates it, then bases its output (or lack thereof) on this integration.
Neurons are made up of several distinct parts. The soma (cell body) surrounds the nucleus of the cell, where the genetic information is stored. Floating around inside the soma (in a fluid called the cytoplasm) are molecular structures which perform various tasks.
Connected with the soma are dendrites, branch-like extensions which receive input from other neurons. On the other end of the cell is the axon, or the output cable of the cell. The axon branches out in a dense network; each branch stretches out to other neurons. At the end of these branches are small bumps, called terminal buttons, which contain the chemicals necessary for communication with other cells. The terminal buttons link to the dendrites of other neurons. The two cells do not actually touch; there is a very small gap between the terminal button and the dendrite across which chemical communication takes place. This junction of terminal button of one cell to the dendrite of another is called a synapse (synapse can also be used as a verb, meaning "to form a synapse with").



Schematic View of a Neuron (© 1996, Chad Loder)

Communication

Many synaptic inputs combine to drive a single neuron. Inputs can be either excitatory (contributing to the probability of the neuron firing) or inhibitory (detracting from the probability of the neuron firing). Mathematically, you can think of a neuron as an adder; excitatory messages are positive and inhibitory messages are negative. If the sum of all the inputs (within a given period of time) exceeds a certain positive value (the threshold), then the neuron will fire.
The firing is an all-or-none process - either the neuron totally fires or it totally doesn't; the strength of firing is constant. A warning: I think this point about all-or-none firing can be misleading. It would be natural to misinterpret this concept and claim that the brain is essentially a digital computer. In reality, the brain is very much an analog system. There are many different ways to modulate neuronal operation, including chemical balance, electromagnetism, and fluid dynamics.
Even though the firing of the neuron is an all-or-none process, neurons can have a firing intensity which is actually a function of the frequency of firing. Neurons can fire at varying rates depending on how often they are stimulated, but each "spike" is still of the same strength. Since the sum of all inputs is added with respect to time, a neuron firing at 30 pulses/second would have more of an effect on other neurons than one firing at only 15 pulses/second.
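The "neuron as an adder" picture, including the effect of firing rate, can be sketched as follows. This is a deliberately crude model under stated assumptions: every pulse has the same strength, each synapse has a hypothetical numeric weight (positive for excitatory, negative for inhibitory), and the threshold value and time window are made up for the example.

```python
def summed_input(synapses, window_seconds):
    """Add up the pulses arriving during one time window.

    Each synapse is (pulses_per_second, weight). The weight is positive
    for an excitatory input and negative for an inhibitory one.
    """
    return sum(rate * window_seconds * weight for rate, weight in synapses)

def fires(synapses, threshold, window_seconds=1.0):
    """All-or-none decision: fire only if the summed input exceeds the threshold."""
    return summed_input(synapses, window_seconds) > threshold

THRESHOLD = 20.0  # hypothetical value, arbitrary units

slow = [(15, 1.0), (10, -0.5)]  # excitatory input at 15 pulses/s, inhibitory at 10
fast = [(30, 1.0), (10, -0.5)]  # same inhibition, excitatory input at 30 pulses/s

print(summed_input(slow, 1.0), fires(slow, THRESHOLD))  # 10.0 False
print(summed_input(fast, 1.0), fires(fast, THRESHOLD))  # 25.0 True
```

Each individual spike is identical in both cases; only the rate differs, and that alone decides whether the receiving neuron crosses its threshold.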

Neural Networks

Vectors, Error, and Problem Spaces

Let's play a guessing game. If I think of a number, say 7, and you guess 3, I could tell you that you were wrong, but it wouldn't really help you. If I told you that you were wrong and said your guess was too low, it would be more helpful because you would know in which direction to adjust your guess (the higher direction).
We could represent this game on a one dimensional line. A dimension is a variable which can be increased or decreased (to put it simply). A problem in two variables (length and height, for example) can be represented in two dimensions; a problem in three variables (e.g., taste, temperature, and cost) can be represented in three dimensions. The dimensional representation of a problem is called the problem space. So, back to our guessing game. The problem has one variable (the number in question) which can be increased or decreased, so this game's problem space is one dimensional.
Let's play another guessing game. I think of a combination of a number and a time of day, and you try to guess which combination I have in mind. I choose the number 8 and the time 12 o'clock noon. Your first guess is the number 3 and 6 o'clock P.M. What's my response? The most helpful response would include information about both variables: "Too low and too late." Now you can guess a higher number and an earlier time. If we repeat this process, with you guessing and me giving feedback, your guesses should converge upon the optimal solution. Your error at any point in the game is the difference between your current guess and the right answer. As we progress, your error should be approaching zero. This is called minimizing your error.
A vector is simply a straight line through the problem space. Classically, a vector is defined as a line that has both length and direction. This is easy to see. If you ask me for directions to the beach and I reply "The beach is five miles away", I haven't really helped you. I've only given you the length of the vector through the problem space; to find the beach you need to know the direction of the vector as well. If I say "The beach is five miles due west", you can easily find it. A vector through the problem space contains information about every variable in the problem space. Your error vector is the line through the problem space connecting you with the optimal solution. The longer the error vector is in a given dimension, the further away that dimension's variable is from the correct value.
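Here is the error vector for the two-variable guessing game above, as a small sketch. Representing the time of day as an hour on a 24-hour clock, and measuring length with the ordinary Euclidean formula across two different units, are simplifying assumptions for the illustration.

```python
import math

def error_vector(guess, target):
    """Component-wise difference pointing from the guess to the target."""
    return tuple(t - g for g, t in zip(guess, target))

def length(vector):
    """Euclidean length of the vector."""
    return math.sqrt(sum(c * c for c in vector))

def feedback(err):
    """Turn the direction of each component into a spoken hint."""
    number_part = "too low" if err[0] > 0 else "too high" if err[0] < 0 else "right"
    time_part = "too late" if err[1] < 0 else "too early" if err[1] > 0 else "right"
    return number_part, time_part

target = (8, 12)  # the number 8, 12 o'clock noon
guess = (3, 18)   # the number 3, 6 o'clock P.M.

err = error_vector(guess, target)
print(err)            # (5, -6)
print(feedback(err))  # ('too low', 'too late')
print(length(err))
```

The sign of each component is the "direction" half of the feedback, and the length says how far the guess is from the answer overall.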
It's easy to think about 1, 2, and 3-dimensional problem spaces. Most complicated problems have more than 3 variables, so their problem spaces are in more than 3 dimensions, which is almost impossible for us to visualize. Generally, something describing multiple dimensions is prefixed by hyper-. A hypercube is a cube in more than 3 dimensions, a hyperparaboloid is a parabola in more than two dimensions, etc. Now you can use impressive terms like "I am trying to minimize the length of my error vector in the hyperdimensional problem space." The concepts are easy; the terms are not.
In fact, our guessing game example is analogous to the way many neural networks learn the right output for a given input. The initial input is the phrase "I'm thinking of an output pattern that goes with this pattern here." There is a correct output pattern (in our game, the number to guess). The neural network goes through a cycle of guessing and feedback (from a system called a trainer), and if the network is programmed correctly, it should converge upon the optimal output pattern at some point.
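The guess-and-feedback cycle can be sketched with a single linear neurode whose weights are adjusted after every guess. The update rule used here (often called the delta rule) is one common choice that I am assuming for illustration; the text above does not commit to a particular learning algorithm, and the input pattern, target, learning rate, and tolerance are all hypothetical.

```python
def train(inputs, target, rate=0.1, tolerance=1e-6, max_cycles=10_000):
    """Adjust weights until the neurode's output for `inputs` matches `target`."""
    weights = [0.0] * len(inputs)
    for cycle in range(max_cycles):
        guess = sum(w * x for w, x in zip(weights, inputs))  # the network's guess
        error = target - guess                               # the trainer's feedback
        if abs(error) < tolerance:
            return weights, cycle
        # Move each weight in the direction that shrinks the error.
        weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return weights, max_cycles

pattern = [1.0, 0.5, -1.0]
weights, cycles = train(pattern, target=7.0)
output = sum(w * x for w, x in zip(weights, pattern))
print(cycles, output)
```

With these values each cycle shrinks the error by a constant factor, so the guesses close in on the target much as in the guessing game.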

Parallel and Serial, Convergence and Divergence

- From here down, things are sparse.

Things to mention: receptive fields, parallel processing in the brain, convergence, divergence, feedback, lateral inhibition

[Schematic of converge/diverge system]

Limiting Functions and Normalization

Things to mention: Limiting functions (e.g., sigmoid, hard limiter, ramps, trig, etc.), normalizing (why: to avoid imbalance, as in epilepsy and microphone feedback), damping the feedback loops (horizontal cells; explain with sines and cosines, which are derivatives of each other and so damp each other; put in my example of steering a car), logarithmic sensory input in the nervous system (perceived twice as loud means it's ten times as intense, etc.)
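As a placeholder for the fuller discussion, here is a rough sketch of three of the limiting functions named in the notes above, plus the loudness rule of thumb. The exact shapes are assumptions: the hard limiter switches at zero, the ramp is clipped to the range 0 to 1, the sigmoid is the standard logistic curve, and `perceived_loudness` is just one way of writing "ten times as intense sounds twice as loud".

```python
import math

def hard_limiter(x):
    """Step function: output jumps from 0 to 1 at x = 0."""
    return 1.0 if x >= 0 else 0.0

def ramp(x):
    """Linear between 0 and 1, clipped outside that range."""
    return min(1.0, max(0.0, x))

def sigmoid(x):
    """Smooth S-shaped curve that squashes any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def perceived_loudness(intensity_ratio):
    """Each tenfold increase in intensity doubles the perceived loudness."""
    return 2 ** math.log10(intensity_ratio)

for x in (-5.0, -0.5, 0.0, 0.5, 5.0):
    print(x, hard_limiter(x), ramp(x), round(sigmoid(x), 3))

print(perceived_loudness(10.0), perceived_loudness(100.0))
```

All three limiting functions keep a neurode's output bounded no matter how large its summed input becomes, which is the point of using them.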

A Simple Neural Network

Let's just jump right in and look at a simple example of a neural network. The following picture is typical (albeit simplified) of the schematic diagrams you will see in many papers and books. Circles represent the neurodes, with synaptic connections represented by joining lines.


Taxonomical Hierarchy of Neural Network Types

Things to mention:

Associative Memories

Autoassociative memories can pair noisy or garbled input with a stored version of the "pure" data. Examples include possible uses in optical character recognition: A distorted or fuzzy character is presented to the network, which pairs the character with a previously learned pure version.
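A toy autoassociative memory can be sketched as below. The design is an assumption on my part: it follows the well-known Hopfield-style recipe (weights built from the outer product of each stored pattern with itself, neurodes taking the values +1 and -1), which is only one of several ways to build such a memory. The two eight-element patterns are made-up stand-ins for "pure" characters.

```python
def store(patterns):
    """Build a weight matrix by summing each pattern's outer product with itself."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no neurode connects to itself
                    weights[i][j] += p[i] * p[j]
    return weights

def recall(weights, state, max_steps=10):
    """Update every neurode repeatedly until the pattern stops changing."""
    state = list(state)
    for _ in range(max_steps):
        new_state = []
        for i, row in enumerate(weights):
            total = sum(w * s for w, s in zip(row, state))
            # On a tie the neurode keeps its current value.
            new_state.append(1 if total > 0 else -1 if total < 0 else state[i])
        if new_state == state:
            break
        state = new_state
    return state

pure_a = [1, 1, 1, 1, -1, -1, -1, -1]
pure_b = [1, -1, 1, -1, 1, -1, 1, -1]
memory = store([pure_a, pure_b])

garbled_a = [-1, 1, 1, 1, -1, -1, -1, -1]  # pure_a with its first element flipped
print(recall(memory, garbled_a) == pure_a)  # True
```

The garbled input is pulled back to the stored pattern it most resembles, which is the behavior described above for a fuzzy character.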


Heteroassociative memories can pair a given input with a different output pattern. An example of a heteroassociative memory would be a network which is presented with a character and returns the ASCII value of the character.



Annotated Bibliography


In this case, "annotated" means that I am going to tell you what I thought of the books that I've read. Please excuse a little bit of opinionization; I just want to warn you away from the dead-ends that I went down.


Williams, Ronald J.
Ronald J. Williams is a professor here at the College of Computer Science; in fact, he's right upstairs from me. He's published many papers on neural networks, in particular his work on backpropagation algorithms. You can see some of his recent papers in his ftp directory at NU CCS.
Williams, Ronald J. "Adaptive State Representation and Estimation Using Recurrent Connectionist Networks."
Anderson, James A. and Edward Rosenfeld (1988). Neurocomputing: Foundations of Research
Cambridge, MA: MIT Press
ISBN: 0-262-01097-6
This is an awesome book. It includes many of the classic papers and book excerpts in the field of neurocomputing along with great introductions by Anderson and Rosenfeld which really put the papers in context. Some of the papers I cite here can be found in this book. Everyone I've met in this field has a copy of this book.
Caudill, Maureen and Charles Butler (1990). Naturally Intelligent Systems
Cambridge, MA: MIT Press
ISBN: 0-262-03156-6
This is an excellent introduction to neural networks. Out of the 30 or so books that I've looked at, I've found this one the clearest and most readable. It's also fairly thorough for an introduction.

McCulloch, W.S. and W.H. Pitts (1943). "A Logical Calculus of the Ideas Immanent in Nervous Activity" Bulletin of Mathematical Biophysics, vol. 5, pp. 115-133.
This is one of the most influential papers in natural intelligence and cybernetics ever written. The paper mostly emphasizes logic instead of physiology (both authors were competent physiologists). It is easy enough for any intelligent person to understand. It can also be found on pp. 18-27 of Neurocomputing.
Anderson, James A. (1995). An Introduction to Neural Networks
Cambridge, MA: MIT Press
ISBN: 0-262-01144-1
This is a large (650 pages) textbook style introduction. It's very thorough and recent; James Anderson is one of the heavy-hitters in the area of neural networks. It's pretty clear, providing you've taken a couple of calculus courses, basic discrete math, and basic linear algebra. If you're not afraid of a little math, you might want to check this book out.
Minsky, Marvin. "Logical vs. Analogical or Symbolic vs. Connectionist or Neat vs. Scruffy"
in Artificial Intelligence at MIT: Expanding Frontiers, Patrick H. Winston (Ed.), Vol 1, MIT Press, 1990. Reprinted in AI Magazine, 1991
This is a recent short paper by one of the founding fathers of A.I. Minsky is a genius. In this paper he offers insight into where the field of A.I. is heading, and offers a solution to the ongoing war between symbolic A.I. and connectionism.
You can also download a text version of this paper from me.

Kartalopoulos, Stamatios V. (1996) Understanding Neural Networks and Fuzzy Logic
New York: IEEE Press
ISBN: 0-7803-1128-0
This book was helpful. The good thing about it is that it's recent and it makes some connections between neural nets and fuzzy logic. It's pretty understandable, but I think his philosophy of mathematics and mind is a little bit pedestrian. He makes a couple of assumptions that are just wrong. Maybe this is what happens when you try to apply engineering to philosophy?

Robinson, David A. "Implications of neural networks for how we think about brain function." Behavioral and Brain Sciences, vol. 15, no. 4, pp. 644-655 (Dec. 1992)
This is a good article. It explains the relations between neural networks and real brains. It's readable and clear; you might want to know a very little bit of neurobiology. He makes a few really interesting points.

Hanson, Stephen J. and David J. Burr. "What connectionist models learn: Learning and representation in connectionist networks." Behavioral and Brain Sciences, vol. 13, no. 3, pp. 471-518 (Sept. 1990)
An interesting article. Gives a good brief explanation of connectionism and the relationships between A.I. and cognitive psychology. Also gives a good general overview of the different classifications of connectionist models.

Other Links

If you have any comments, criticisms, or suggestions about this page, I'd love to hear them. As a professor of mine said, "There's no such thing as good writing, only good editing." So send me E-Mail and I will try to make you happy.


Here are some other important links about my college and myself:

College of Computer Science

Sometimes people ask me why I decided to go to Northeastern. My reasons are simple: the College of Computer Science here is great, Boston is great, and I want to be able to get a job when I graduate. Northeastern has the strongest Co-Op program in New England. As part of your education, you actually work in your field. Two years of professional experience at good companies really makes you a competitive job-seeker when you've graduated. Equally important, the Co-Op process necessarily keeps the professors in touch with what's going on in the world of computer science. I've also found that the professors as a whole are really good about knowing the difference between computer science and computer technology, a pet peeve of mine which I think other schools fall prey to. But I digress.
To give you a personal example, this summer and fall I will be working at Lotus.

Northeastern University

Check out the University's home page for information on admissions, colleges within the university, etc.

NEU Neuroscience Programs

This is a list of people from all departments of NEU that are doing research in many different areas of neuroscience.

Chad's Home Page

Here you can learn all about me and what I'm doing here at Northeastern. If you want to offer me a job (not this summer or fall :-), you can check out my resume.




Copyright 1996, Chad Loder. All rights reserved. This page is copyrighted material. Permission is hereby granted to reproduce this page for private use only, provided this copyright notice remains intact.