The Origins of Neural Networks
What Neural Networks Do
How Neural Networks Work
Supervised Neural Networks
Unsupervised Neural Networks
The Origins of Neural Networks
The impetus for creating artificial neural networks comes from the observation that the world's most marvelous computer is still the human brain. For over 30 years, scientists have continued to develop constructs that allow the emulation of some of the basic functionality of human thought processes. These artificial neural networks are learning systems that discover how to solve problems by being exposed to a situation and learning from mistakes -- just as biological neural networks do. This presents a fundamentally different approach to solving complex problems. Rather than developing a solution and writing an algorithm, the designer simply supplies the artificial neural network with data and it learns how to solve the problem from experience.

The biological neural network (the brain) provided the impetus for creating
powerful artificial neural networks using computers.
The origins of computing with neural networks, or neurocomputing, are grounded in cognitive psychology
(pre-dating the computer era). The 1943 paper of McCulloch and Pitts showed that even simple types of neural networks
could, in principle, compute any arithmetic or logical function. In 1957, the first successful neurocomputer
(the Mark I Perceptron) was developed by Rosenblatt and Wightman. Rosenblatt is considered by many to be the father
of neurocomputing. His primary interest was in pattern recognition and his first demonstration of the Perceptron was
as an optical character recognizer (OCR). Following Rosenblatt, Widrow invented a different kind of processing element
called the ADALINE which was equipped with a powerful new learning law that is still in widespread use today.
By the mid-1960s, neurocomputing's first era of successes was drawing to a close. The final blow came when Minsky
and Papert published a book entitled Perceptrons, which was meant to discredit neural network research and
divert neural network research funding to the field of artificial intelligence. Perceptrons proved
mathematically that a single-layer perceptron could not implement the EXCLUSIVE OR (XOR) logical function, nor many other such
predicate functions that are fundamental to the field of logic. During the ensuing period, from 1967 to 1982, little
explicit neurocomputing research was carried out in the United States.
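The XOR limitation can be illustrated with a short experiment. The sketch below is not taken from Perceptrons; it assumes a two-input unit with a hard threshold trained by the classic perceptron rule, and the function name `train_perceptron` and its parameters are hypothetical. The unit settles on a solution for AND, which is linearly separable, but never completes an error-free pass over XOR.

```python
def train_perceptron(samples, epochs=100, lr=1.0):
    """Perceptron rule for a single 2-input threshold unit.

    Returns (weights, bias, converged), where converged is True once a full
    pass over the samples produces no classification errors.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            output = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            delta = target - output
            if delta != 0:
                # Nudge the weights toward the correct answer.
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, _, and_converged = train_perceptron(AND)
_, _, xor_converged = train_perceptron(XOR)
print("AND learned:", and_converged)
print("XOR learned:", xor_converged)
```

No single straight line separates the XOR classes, so raising `epochs` does not change the XOR result.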
The 1980s brought a revitalization in neurocomputing research, partially spawned by the lower cost of computing
resources. The Defense Advanced Research Projects Agency (DARPA) began funding neural research in 1983. A powerful
learning algorithm -- backpropagation of errors -- had been described by Werbos in 1974 and was independently
rediscovered and popularized by Rumelhart, Hinton, and Williams in the mid-1980s.
This widely applicable algorithm still accounts for more than 75% of the neural network applications fielded today.
In 1988, DARPA started the Neural Network Program -- managed by Dr. Barbara Yoon -- by awarding some $33 million
in contracts the first year. Martin Marietta Electronics Systems (now Lockheed-Martin) was awarded a portion of this
funding, and it was here that Dr. Ed DeRouin, now CTO of Peer Science, began his research work in neural networks.
Below is a brief recap of the history of neural networks:
• Neural networks were conceived in the 1940's and invented in the late 50's
• Research was active and progressed rapidly during the early 60's
• The book Perceptrons curtailed neural research in the late 60's
• Research continued through the 70's, but was low profile
• Backpropagation was reinvented in the mid 80's
• DARPA launched and funded the Neural Network Program in 1988
• A number of neural based products hit the marketplace in the early 90's
What Neural Networks Do
From a formal perspective, artificial neural networks are specialized computer software constructions that produce non-linear
models for complex problems. They are commonly used as a tool for recognizing an object (or event) given an associated
pattern. For example, neural networks can be used to recognize a certain illness given the pattern from a series of
medical tests, or they may be used to identify credit card fraud from a peculiar pattern of spending. Neural networks
are distinctive in that they learn from examples. That is, they learn by being "shown" examples of what needs to
be recognized. Thus, in order for a neural network system to be developed, it is necessary to have a set of historical
data concerning the subject to be learned by the network.
Neural networks have been applied to many different types of problems, but they have been found to excel in
problems involving pattern recognition, optimization, and time series analysis or prediction. There are two major
advantages that neural networks have over other approaches to pattern recognition and prediction. First, as previously
stated, neural networks are learning systems. They learn how to solve a problem by adapting to their mistakes and
minimizing their error on subsequent trials. The impact this has on the time required for a development cycle is
tremendous, because rather than spending time developing a solution encoded as algorithms, the implementor lets the
learning system learn to solve the problem. The result is a rapid prototyping capability and the ability to attack
large, complex problems. Also, the built-in adaptation is crucial to success for dynamically changing problems such
as fraud and financial prediction. That is, the system can continue to learn as the fraud schemes or financial markets change.
The second major advantage of neural networks is that they are inherently non-linear. When linear mathematics is
used to model non-linear objects or events, there is a built-in error factor that adversely affects the performance.
Most real-world problems are non-linear, but practitioners typically assume linearity because it generally simplifies
the mathematical computations. Because a non-linear transfer function (usually a sigmoid) is a fundamental part of
the neural computation, non-linear mathematics is used to model non-linear problems resulting in a much closer
fitting model and higher performance.
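The built-in error of a linear model can be made concrete with a small calculation. The sketch below assumes a toy non-linear target, y = x^2, sampled at five points, and fits the best straight line by ordinary least squares; the data and the helper name `fit_line` are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]  # a non-linear relationship

slope, intercept = fit_line(xs, ys)
linear_mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"best line: y = {slope:.2f} * x + {intercept:.2f}, mean squared error = {linear_mse:.2f}")
```

Even the best possible line here is flat (slope 0, intercept 2) and leaves a mean squared error of 2.8, because no straight line can follow the curve.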
How Neural Networks Work
In a biological nervous system, a neuron is a brain cell -- what makes up the gray matter between your ears.
A biological neural network is a complex group of interconnected brain cells. An artificial neural
network mimics biological behavior by manipulating connections between simulated neurons. This artificial "brain"
can therefore mimic its biological counterpart - including the ability to be trained, or "learn". Training an
artificial neural network involves nothing more than making repeated small adjustments to the strength of the
connections between neurons. What neural networks learn is pattern recognition, which can be a powerful tool for
tackling complex, real-world problems.

Artificial neurons function very similarly to their biological counterparts, from which they are
modeled.
An artificial neural network is an information processing system that is non-algorithmic, nonlinear, and intensely
parallel. It is not a computer in the sense we think of them today, nor is it programmed like a computer. Instead, a
neural network consists of a number of very simple and highly interconnected software functions or processors called
processing elements (PEs), illustrated in the figure above. These PEs are connected by a number of weighted links over
which information in the form of signals passes, much as signals are passed over the links in our brains
known as synapses.
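A single processing element can be sketched in a few lines. The version below assumes the common formulation of a weighted sum followed by a sigmoid transfer function; the signal and weight values, and the name `processing_element`, are made up for illustration.

```python
import math


def sigmoid(net):
    """Squash any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def processing_element(inputs, weights, bias=0.0):
    """Weight each incoming signal, sum the results, and squash the total."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(net)


signals = [1.0, 0.5, -1.0]   # incoming signals (illustrative values)
weights = [0.8, 0.1, 0.3]    # connection strengths (illustrative values)

response = processing_element(signals, weights)
print(f"PE response: {response:.3f}")
```

A larger weight lets more of its signal through to the sum. With all weights at zero the element ignores its inputs and responds with 0.5.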
Learning in an artificial neural network is accomplished by exposing the network to large amounts of historical
data. To make a decision, a neural network first takes in information from the outside world and filters
the signals through the weight values; that is, the larger the weight, the more of the signal is passed along. During
training, the network iterates and adjusts the weights, just as humans learn from their mistakes, so that it makes
the smallest number of errors during a pass through the training examples. The training (learning)
algorithm for neural networks is very computationally intensive. Only in the last 10 years has the cost of computing
power dropped far enough that neural networks have become a viable approach to complex, real-world problems.
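The idea of repeated small weight adjustments can be sketched with the simplest case: a single linear unit trained by the delta (least mean squares) rule. This is a deliberately reduced stand-in for a full network; the training data, learning rate, and the name `train_linear_unit` are assumptions made for the example.

```python
def train_linear_unit(samples, passes=100, lr=0.1):
    """Delta-rule training of a single linear unit.

    After each example, every weight is nudged in proportion to the error
    and to the input that travelled over that connection.
    """
    weights = [0.0, 0.0]
    history = []  # summed squared error for each pass over the data
    for _ in range(passes):
        total_sq_error = 0.0
        for inputs, target in samples:
            output = sum(w * x for w, x in zip(weights, inputs))
            error = target - output
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            total_sq_error += error ** 2
        history.append(total_sq_error)
    return weights, history


# Illustrative data generated from target = 2 * x1 - x2.
samples = [
    ((1.0, 0.0), 2.0),
    ((0.0, 1.0), -1.0),
    ((1.0, 1.0), 1.0),
    ((2.0, 1.0), 3.0),
]

weights, history = train_linear_unit(samples)
print("learned weights:", [round(w, 3) for w in weights])
print(f"error on first pass: {history[0]:.3f}, on last pass: {history[-1]:.6f}")
```

The error recorded for each pass shrinks as the weights approach 2 and -1, the values that generated the data.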
A single PE can receive information from the outside world such as insurance claims information, or internally
from other PEs. Then, the processing element integrates the information with other inputs and passes its own response
on to the next layer of PEs. The output of a PE can go to another PE or to the outside world, such as a fraud
detection alarm to an investigator. This continues until the signals arrive at the output layer where they vote.
The output PE with the largest response value wins.
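The "vote" at the output layer amounts to picking the largest response. A minimal sketch, assuming two hypothetical output PEs labeled `fraud` and `not_fraud` with made-up response values:

```python
def winning_class(responses):
    """Return the label of the output PE with the largest response."""
    return max(responses, key=responses.get)


# Illustrative responses from two output PEs.
output_layer = {"fraud": 0.81, "not_fraud": 0.19}

winner = winning_class(output_layer)
print("winning output:", winner)
```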
This technology is fundamentally different from traditional ways computers have been programmed for decades.
Designing a successful neural network requires more than computer science or engineering skills; it also calls for an
intuitive understanding of problem solving and an eye for artful design. Neural networks are not a magic bullet,
and they require skilled professionals to use properly, but they have been proven to be exceptionally useful tools for
solving complex problems.
Supervised Neural Networks
A supervised neural network uses target output values to train the network weights. The "supervised"
label can be applied to many types of models, not just neural networks.
Probabilistic neural networks (PNNs)
The probabilistic neural network is a connectionist implementation of a statistical classification technique called Parzen Windows.
It is an especially useful model when only a very small number of training samples is available.
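The Parzen window idea behind a PNN can be sketched as follows. This is a minimal illustration, assuming Gaussian kernels, equal class priors, and a hand-picked smoothing width `sigma`; the function name `pnn_classify` and the four training samples are hypothetical.

```python
import math


def pnn_classify(sample, training_set, sigma=0.5):
    """Parzen-window classification with Gaussian kernels.

    Each training sample contributes a kernel response centered on itself.
    The class with the largest average response wins.
    """
    totals = {}
    counts = {}
    for features, label in training_set:
        sq_dist = sum((a - b) ** 2 for a, b in zip(sample, features))
        kernel = math.exp(-sq_dist / (2.0 * sigma ** 2))
        totals[label] = totals.get(label, 0.0) + kernel
        counts[label] = counts.get(label, 0) + 1
    densities = {label: totals[label] / counts[label] for label in totals}
    return max(densities, key=densities.get), densities


# Only two training samples per class (illustrative values).
training_set = [
    ((0.0, 0.0), "A"), ((0.2, 0.1), "A"),
    ((1.0, 1.0), "B"), ((0.9, 1.2), "B"),
]

label, densities = pnn_classify((0.1, 0.2), training_set)
print("predicted class:", label)
```

There is no iterative training step: the stored samples are the model, which is one reason the approach remains usable with very little data.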
Backpropagation neural networks (BPNs)
By far the most popular neural network architecture for pattern classification is the multi-layer perceptron
employing back-propagation of errors learning, often called backpropagation networks (BPNs). Neural network
classifiers "learn" the mapping of features to classes by iterating through a training set, a collection of input
pattern samples matched with their corresponding output classification. In backpropagation neural networks, this
learning is accomplished using gradient descent techniques to minimize mean square error measured against the
training set.
A backpropagation neural network consists of a layer of input units, a layer of output units, and one or more
layers of hidden units. The number of the input variables, or features, determines the number of input units, and the number of
categories (classes) into which the input vectors are to be mapped sets the number of output units. The neural network designer
sets the number of hidden units and the number of hidden layers. The figure below illustrates the general architecture
of a backpropagation neural network.

General Backpropagation Neural Network Architecture
During learning, each presentation of a training vector to the BPN consists of two phases: a forward pass through
the network to produce a set of output values, and a reverse pass to adjust the weight strengths. As each training
vector is presented to the network in the forward pass, it is applied to the X-layer, known as the input layer. This
layer distributes the input values, x_i, to the hidden layer, shown as the Z-layer in the figure. Hidden
layer processing elements perform a weighted sum on their inputs. This sum is then passed through a non-linear
activation, or squashing, function to produce the processing element’s output. This output is distributed through the
weighted connections to the next layer above until all layers have been processed. This completes the forward pass
through the network.
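The forward pass can be sketched as a loop over layers. The example below assumes sigmoid squashing functions and a small network with two inputs, two hidden units, and one output; all weight and bias values are made up, and the names `layer_forward` and `forward_pass` are hypothetical.

```python
import math


def sigmoid(net):
    """Non-linear squashing function."""
    return 1.0 / (1.0 + math.exp(-net))


def layer_forward(inputs, weights, biases):
    """Each PE in the layer takes a weighted sum of its inputs, then squashes it."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward_pass(x, layers):
    """Feed x through every layer in turn and return the activations of all layers."""
    activations = [list(x)]
    for weights, biases in layers:
        activations.append(layer_forward(activations[-1], weights, biases))
    return activations


# One row of weights per processing element (illustrative values).
hidden_layer = ([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1])
output_layer = ([[1.2, -0.7]], [0.05])

x_layer, z_layer, y_layer = forward_pass([1.0, 0.0], [hidden_layer, output_layer])
print("hidden (Z-layer) outputs:", [round(z, 3) for z in z_layer])
print("network output:", round(y_layer[0], 3))
```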
The output error of the network is calculated by subtracting the network's computed outputs from the desired
outputs. The backward pass through the network uses this error to calculate adjustments for the network's weights.
The back-propagation of errors method is a gradient descent optimization technique. The individual weight
adjustments calculated during the backward pass of the network are in the direction opposite to the gradient of the
error function of the network. By moving the weight vectors down this gradient for each input vector, the network
reduces its mean squared error on the given training set.
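Both phases can be combined into a compact training loop. The sketch below is one possible reading of the description, not a reference implementation: it assumes a single hidden layer of three sigmoid units, one sigmoid output, per-sample weight updates, a learning rate of 0.5, and XOR as the training set. The names `forward`, `mean_squared_error`, and `train` are hypothetical.

```python
import math
import random


def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))


def forward(x, w_hidden, w_out):
    """Forward pass. The last weight in each row is a bias weight."""
    z = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    y = sigmoid(sum(w * v for w, v in zip(w_out, z + [1.0])))
    return z, y


def mean_squared_error(data, w_hidden, w_out):
    return sum((t - forward(x, w_hidden, w_out)[1]) ** 2 for x, t in data) / len(data)


def train(data, n_hidden=3, epochs=5000, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    mse_before = mean_squared_error(data, w_hidden, w_out)
    for _ in range(epochs):
        for x, t in data:
            z, y = forward(x, w_hidden, w_out)
            # Output error scaled by the sigmoid derivative y * (1 - y).
            delta_out = (t - y) * y * (1.0 - y)
            # Propagate the output delta back through the output weights.
            delta_hidden = [delta_out * w_out[j] * z[j] * (1.0 - z[j])
                            for j in range(n_hidden)]
            # Adjust every weight opposite to the gradient of the squared error.
            for j, v in enumerate(z + [1.0]):
                w_out[j] += lr * delta_out * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_hidden[j][i] += lr * delta_hidden[j] * v
    return w_hidden, w_out, mse_before, mean_squared_error(data, w_hidden, w_out)


XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

w_hidden, w_out, mse_before, mse_after = train(XOR)
print(f"MSE before training: {mse_before:.4f}, after: {mse_after:.4f}")
```

The error should fall over the course of training. How close it gets to zero depends on the starting weights, the learning rate, and the number of hidden units, since gradient descent can settle in a local minimum.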
Once the neural network has been trained, it can rapidly categorize new input patterns.
For example, if a neural network is trained with past fraudulent and non-fraudulent credit card transactions, it can be
used to identify fraud in new transactions as they occur. The neural network learns to recognize fraudulent patterns
without explicit programming.
Unsupervised Neural Networks
Unsupervised neural networks adjust weights without the use of target output values. This type of
neural network is usually used to identify "clusters" in the data, groups of data samples that are similar to one
another and different from other groups of samples.
Kohonen Self-Organizing Maps (SOMs)
A Kohonen Self-Organizing Feature Map clusters data by mapping samples with a high number of feature dimensions
onto a low-dimension map in a way that preserves the topology of the samples. That is, samples that are close together
in high-dimensional space are close together on the low-dimensional map. Samples that are far apart in the
high-dimensional space are far apart on the low-dimensional map.
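A textbook-style SOM can be sketched in a few lines. The example below assumes a one-dimensional map of five units, features scaled to the range 0 to 1, a Gaussian neighborhood, linearly decaying learning rate and radius, and units that start evenly spaced along the first feature axis so the run is reproducible. The names `best_matching_unit` and `train_som`, the parameter values, and the sample data are all illustrative.

```python
import math


def best_matching_unit(weights, sample):
    """Index of the map unit whose weight vector is closest to the sample."""
    return min(range(len(weights)),
               key=lambda i: sum((w - s) ** 2 for w, s in zip(weights[i], sample)))


def train_som(samples, n_units=5, epochs=200, lr0=0.5, radius0=2.0):
    dim = len(samples[0])
    # Assumed initialization: units spread along the first feature axis.
    weights = [[i / (n_units - 1)] + [0.5] * (dim - 1) for i in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay
        radius = max(radius0 * decay, 0.5)
        for sample in samples:
            bmu = best_matching_unit(weights, sample)
            for i in range(n_units):
                # Units near the winner on the map are pulled along with it.
                influence = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    weights[i][d] += lr * influence * (sample[d] - weights[i][d])
    return weights


# Two well-separated groups of samples (illustrative values).
samples = [
    (0.10, 0.10), (0.15, 0.05), (0.05, 0.20),
    (0.90, 0.90), (0.85, 0.95), (0.95, 0.80),
]

weights = train_som(samples)
bmu_a = best_matching_unit(weights, (0.1, 0.1))
bmu_b = best_matching_unit(weights, (0.9, 0.9))
print("map position of first group:", bmu_a)
print("map position of second group:", bmu_b)
```

No target values are used anywhere: the two groups end up at different positions on the map purely because their samples are far apart in feature space.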

Peer Science has modified the processing of the SOM to make it hundreds of times faster and much more stable than
the "textbook" version that is widely available. These improvements are extremely important since the SOM can be very
slow to train and produce disparate results for the same data set.