This web site and all of its contents Copyright © 2002 Peer Science LLC. All rights reserved

Neural Networks Basics


The Origins of Neural Networks

What Neural Networks Do

How Neural Networks Work

Supervised Neural Networks

Unsupervised Neural Networks

The Origins of Neural Networks

The impetus for creating artificial neural networks comes from the observation that the world's most marvelous computer is still the human brain. For over 30 years, scientists have continued to develop constructs that allow the emulation of some of the basic functionality of human thought processes. These artificial neural networks are learning systems that discover how to solve problems by being exposed to a situation and learning from mistakes -- just as biological neural networks do. This presents a fundamentally different approach to solving complex problems. Rather than developing a solution and writing an algorithm, the designer simply supplies the artificial neural network with data and it learns how to solve the problem from experience.

The biological neural network (the brain) provided the impetus for the pursuit of creating powerful artificial neural networks using computers

The origins of computing with neural networks, or neurocomputing, are grounded in cognitive psychology (pre-dating the computer era). The 1943 paper of McCulloch and Pitts showed that even simple types of neural networks could, in principle, compute any arithmetic or logical function. In 1957, the first successful neurocomputer (the Mark I Perceptron) was developed by Rosenblatt and Wightman. Rosenblatt is considered by many to be the father of neurocomputing. His primary interest was in pattern recognition and his first demonstration of the Perceptron was as an optical character recognizer (OCR). Following Rosenblatt, Widrow invented a different kind of processing element called the ADALINE which was equipped with a powerful new learning law that is still in widespread use today.

By the mid 1960's, neurocomputing's first era of successes was drawing to a close. The final blow came in 1969, when Minsky and Papert published a book entitled Perceptrons, which many in the field saw as an attempt to discredit neural network research and divert its funding to the field of artificial intelligence. Perceptrons proved mathematically that a single-layer perceptron could not implement the EXCLUSIVE OR (XOR) logical function, nor many other predicate functions that are fundamental to the field of logic. During the period that followed, until about 1982, little explicit neurocomputing research was carried out in the United States.

The 1980's brought a revitalization in neurocomputing research, spurred in part by the falling cost of computing resources. The Defense Advanced Research Projects Agency (DARPA) began funding neural research in 1983. A powerful learning algorithm -- backpropagation of errors -- had first been described by Werbos in 1974, and was independently rediscovered and popularized by Rumelhart, Hinton, and Williams in the mid 1980's. This widely applicable algorithm is still estimated to account for more than 75% of the neural network applications fielded today.

In 1988, DARPA started the Neural Network Program -- managed by Dr. Barbara Yoon -- by awarding some $33 million in contracts the first year. Martin Marietta Electronics Systems (now Lockheed-Martin) was awarded a portion of this funding, and it was here that Dr. Ed DeRouin, now CTO of Peer Science, began his research work in neural networks.

Below is a brief recap of the history of neural networks:

• Neural networks were conceived in the 40's and invented in the late 50's

• Research was active and progressed rapidly during the early 60's

• The book Perceptrons (1969) curtailed neural research at the end of the 60's

• Research continued through the 70's, but was low profile

• Backpropagation was reinvented in the 80's

• DARPA launched and funded the Neural Network Program in 1988

• A number of neural based products hit the marketplace in the early 90's

What Neural Networks Do

From a formal perspective, artificial neural networks are specialized computer software constructions that produce non-linear models for complex problems. They are commonly used as a tool for recognizing an object (or event) given an associated pattern. For example, neural networks can be used to recognize a certain illness given the pattern from a series of medical tests, or they may be used to identify credit card fraud from a peculiar pattern of spending. Neural networks are unusual in that they learn from example. That is, they learn by being "shown" examples of what needs to be recognized. Thus, in order for a neural network system to be developed, it is necessary to have a set of historical data concerning the subject to be learned by the network.

Neural networks have been applied to many different types of problems, but they have been found to excel in problems involving pattern recognition, optimization, and time series analysis or prediction. There are two major advantages that neural networks have over other approaches to pattern recognition and prediction. First, as previously stated, neural networks are learning systems. They learn how to solve a problem by adapting to their mistakes and minimizing their error on subsequent trials. The impact this has on the time required for a development cycle is tremendous, because rather than spending time developing a solution encoded as algorithms, the implementor lets the learning system learn to solve the problem. The result is a rapid prototyping capability and the ability to attack large, complex problems. Also, the built-in adaptation is crucial to success for dynamically changing problems such as fraud and financial prediction. That is, the system can continue to learn as the fraud schemes or financial markets change.

The second major advantage of neural networks is that they are inherently non-linear. When linear mathematics is used to model non-linear objects or events, there is a built-in error factor that adversely affects performance. Most real-world problems are non-linear, but practitioners typically assume linearity because it generally simplifies the mathematical computations. Because a non-linear transfer function (usually a sigmoid) is a fundamental part of the neural computation, non-linear mathematics is used to model non-linear problems, resulting in a much closer fitting model and higher performance.

How Neural Networks Work

In a biological nervous system, a neuron is a brain cell -- what makes up the gray matter between your ears. A biological neural network is a complex group of interconnected brain cells. An artificial neural network mimics biological behavior by manipulating connections between simulated neurons. This artificial "brain" can therefore mimic its biological counterpart -- including the ability to be trained, or to "learn". Training an artificial neural network involves nothing more than making repeated small adjustments to the strength of the connections between neurons. What neural networks learn is pattern recognition, which can be a powerful tool for tackling complex, real-world problems.

 

Artificial neurons function very similarly to their biological counterparts, on which they are modeled.

 

An artificial neural network is an information processing system that is non-algorithmic, nonlinear, and intensely parallel. It is not a computer in the sense we think of them today, nor is it programmed like a computer. Instead, a neural network consists of a number of very simple and highly interconnected software functions or processors called processing elements (PEs), illustrated in the figure above. These PEs are connected by a number of weighted links over which information in the form of signals passes, much as information signals are passed over the links in our brains known as synapses.
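As an illustration, a single processing element can be sketched in a few lines of Python. This is only a minimal sketch, not any particular product's implementation: the function names, the example signals and weights, the bias term, and the choice of a logistic sigmoid as the transfer function are all assumptions made for the example.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def processing_element(inputs, weights, bias=0.0):
    """One PE: a weighted sum of the incoming signals, then a non-linear transfer."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(net)

# Hypothetical signals arriving over three weighted links.
signals = [1.0, 0.5, -1.0]
weights = [0.8, 0.1, 0.3]  # a larger weight passes more of its signal along
output = processing_element(signals, weights)
print(round(output, 3))
```

The PE's response always lies between 0 and 1, so it can be passed on as an input signal to other PEs.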

Learning in an artificial neural network is accomplished by exposing the network to large amounts of historical data. To make a decision, a neural network first takes information from the outside world and filters the signals with the weight values; that is, the larger the weight, the more of the signal is passed along. During training, the neural network iterates and adjusts the weights, just as humans learn from their mistakes, so that the network makes the smallest number of errors during a pass through the training examples in the sample data. The training (learning) algorithm for neural networks is very computationally intensive. Only in the last 10 years has the cost of computing power dropped far enough that neural networks have become a viable approach to complex, real-world problems.
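The idea of learning from mistakes can be sketched with the classic perceptron learning rule applied to a single threshold PE. This is a simplified illustration under stated assumptions, not the training algorithm of any particular product: the logical AND data set, the learning rate of 0.25, the pass limit, and the function names are all chosen for the example.

```python
def predict(inputs, weights, bias):
    """Threshold PE: responds 1 when the weighted sum of its inputs is positive."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if net > 0 else 0

def train(examples, rate=0.25, max_passes=100):
    """Nudge the weights after every mistake until a full pass has no errors."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    errors = 0
    for _ in range(max_passes):
        errors = 0
        for inputs, target in examples:
            error = target - predict(inputs, weights, bias)  # -1, 0 or +1
            if error != 0:
                errors += 1
                # Small adjustment in the direction that reduces this mistake.
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                bias += rate * error
        if errors == 0:
            break
    return weights, bias, errors

# Toy training set (an assumption for the example): the logical AND function.
and_examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias, errors_in_last_pass = train(and_examples)
predictions = [predict(x, weights, bias) for x, _ in and_examples]
print(predictions, errors_in_last_pass)
```

Nothing in the code says what AND means; the weights that implement it are found purely from the examples.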

A single PE can receive information from the outside world such as insurance claims information, or internally from other PEs. Then, the processing element integrates the information with other inputs and passes its own response on to the next layer of PEs. The output of a PE can go to another PE or to the outside world, such as a fraud detection alarm to an investigator. This continues until the signals arrive at the output layer where they vote. The output PE with the largest response value wins.
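The "vote" at the output layer amounts to picking the output PE with the largest response. A minimal sketch, assuming one output PE per class; the class names and response values below are made up for illustration:

```python
def winning_class(responses, class_names):
    """Return the class whose output PE has the largest response, with that response."""
    best = max(range(len(responses)), key=lambda i: responses[i])
    return class_names[best], responses[best]

# Hypothetical responses of two output PEs for one insurance claim.
label, response = winning_class([0.12, 0.81], ["legitimate", "fraud"])
print(label, response)
```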

This technology is fundamentally different from the traditional ways computers have been programmed for decades. Designing a successful neural network requires more than computer science or engineering skills; it also calls for an intuitive understanding of problem solving and an eye for artful design. Neural networks are not a magic bullet, and they require skilled professionals to use properly, but they have proven to be exceptionally useful tools for solving complex problems.

 

Supervised Neural Networks

A supervised neural network uses target output values to train the network weights. Strictly speaking, the "supervised" label can be applied to many types of models, not just neural networks.

Probabilistic neural networks (PNNs)

The probabilistic neural network is a connectionist implementation of a statistical classification technique called Parzen Windows. It is an especially useful model when there are a very small number of training samples available.
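The Parzen-window idea behind the PNN can be sketched as follows: each training sample contributes a Gaussian "bump", the bumps are averaged per class, and the class with the highest average wins. This is a minimal sketch under stated assumptions (equal class priors, one shared smoothing width `sigma`, and a toy data set invented for the example); it is not a description of any particular product's PNN.

```python
import math

def pnn_classify(sample, training_set, sigma=0.5):
    """Classify `sample` from (features, label) pairs using Gaussian Parzen windows."""
    sums, counts = {}, {}
    for features, label in training_set:
        dist_sq = sum((a - b) ** 2 for a, b in zip(sample, features))
        kernel = math.exp(-dist_sq / (2.0 * sigma ** 2))
        sums[label] = sums.get(label, 0.0) + kernel
        counts[label] = counts.get(label, 0) + 1
    # Average kernel response per class; the shared normalizing constant is omitted.
    scores = {label: sums[label] / counts[label] for label in sums}
    return max(scores, key=scores.get), scores

# Toy data: only two training samples per class.
training = [([0.0, 0.0], "A"), ([0.2, 0.1], "A"),
            ([1.0, 1.0], "B"), ([0.9, 1.2], "B")]
label_near_a, _ = pnn_classify([0.1, 0.1], training)
label_near_b, _ = pnn_classify([1.0, 1.1], training)
print(label_near_a, label_near_b)
```

There is no iterative training step: the training samples themselves act as the network's stored patterns, which is one reason the approach can work with very few samples.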

Backpropagation neural networks (BPNs)

By far the most popular neural network architecture for pattern classification is the multi-layer perceptron employing back-propagation of errors learning, often called backpropagation networks (BPNs). Neural network classifiers "learn" the mapping of features to classes by iterating through a training set, a collection of input pattern samples matched with their corresponding output classification. In backpropagation neural networks, this learning is accomplished using gradient descent techniques to minimize mean square error measured against the training set.

A backpropagation neural network consists of a layer of input units, a layer of output units, and one or more layers of hidden units. The number of the input variables, or features, determines the number of input units, and the number of categories (classes) into which the input vectors are to be mapped sets the number of output units. The neural network designer sets the number of hidden units and the number of hidden layers. The figure below illustrates the general architecture of a backpropagation neural network.

General Backpropagation Neural Network Architecture

During learning, each presentation of a training vector to the BPN consists of two phases: a forward pass through the network to produce a set of output values, and a reverse pass to adjust the weight strengths. As each training vector is presented to the network in the forward pass, it is applied to the X-layer, known as the input layer. This layer distributes the input values, x_i, to the hidden layer, shown as the Z-layer in the figure. Hidden layer processing elements perform a weighted sum on their inputs. This sum is then passed through a non-linear activation, or squashing, function to produce the processing element's output. This output is distributed through the weighted connections to the next layer above until all layers have been processed. This completes the forward pass through the network.
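The forward pass can be sketched in plain Python as follows. The layer sizes, the random starting weights, the sigmoid squashing function, and the extra bias weight on each PE are assumptions made for the example; the number of input units follows the number of features and the number of output units follows the number of classes.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_inputs, n_units, rng):
    """One row of weights per PE; the extra last weight in each row is a bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_units)]

def forward_pass(x, layers):
    """Propagate the input vector through each layer in turn."""
    activations = list(x)
    for layer in layers:
        extended = activations + [1.0]  # constant input that feeds the bias weight
        activations = [sigmoid(sum(w * a for w, a in zip(row, extended)))
                       for row in layer]
    return activations

rng = random.Random(0)
n_features, n_hidden, n_classes = 3, 4, 2  # designer picks n_hidden
layers = [make_layer(n_features, n_hidden, rng),   # X-layer -> Z-layer
          make_layer(n_hidden, n_classes, rng)]    # Z-layer -> output layer
outputs = forward_pass([0.2, 0.7, 0.1], layers)
print(outputs)
```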

The output error of the network is calculated by subtracting the network's computed outputs from the desired outputs. The backward pass through the network uses this error to calculate adjustments for the network's weights.

The back-propagation of errors method is a gradient descent optimization technique. The individual weight adjustments calculated during the backward pass of the network are in the direction opposite to the gradient of the error function with respect to the weights. By moving the weight vectors down this gradient for each input vector, the mean squared error of the network outputs over the training set is driven toward a minimum, although this may be a local rather than a global minimum.
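The two phases can be put together in a short sketch of a network with one hidden layer, trained one pattern at a time. Everything specific here is an assumption for illustration: the XOR training set, four hidden units, a learning rate of 0.5, the number of passes, and the function names. It is a textbook-style sketch, not a description of any production implementation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_output):
    """Forward pass; the last weight in every row is a bias."""
    z = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    y = [sigmoid(sum(w * v for w, v in zip(row, z + [1.0]))) for row in w_output]
    return z, y

def train_pattern(x, target, w_hidden, w_output, rate):
    """One forward pass followed by one backward pass for a single pattern."""
    z, y = forward(x, w_hidden, w_output)
    # Output error (desired minus computed), scaled by the sigmoid slope y(1 - y).
    delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, y)]
    # Hidden error: output deltas propagated back through the output weights.
    delta_hid = [h * (1.0 - h) * sum(d * w_output[k][j] for k, d in enumerate(delta_out))
                 for j, h in enumerate(z)]
    # Step each weight opposite to the gradient of the squared error.
    for k, d in enumerate(delta_out):
        for j, v in enumerate(z + [1.0]):
            w_output[k][j] += rate * d * v
    for j, d in enumerate(delta_hid):
        for i, v in enumerate(x + [1.0]):
            w_hidden[j][i] += rate * d * v

def mean_squared_error(data, w_hidden, w_output):
    total, count = 0.0, 0
    for x, target in data:
        _, y = forward(x, w_hidden, w_output)
        total += sum((t - o) ** 2 for t, o in zip(target, y))
        count += len(target)
    return total / count

random.seed(0)
n_in, n_hid, n_out = 2, 4, 1
w_hidden = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]
w_output = [[random.uniform(-1, 1) for _ in range(n_hid + 1)] for _ in range(n_out)]
xor_data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
            ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]

initial_mse = mean_squared_error(xor_data, w_hidden, w_output)
for _ in range(10000):
    for x, target in xor_data:
        train_pattern(x, target, w_hidden, w_output, rate=0.5)
final_mse = mean_squared_error(xor_data, w_hidden, w_output)
print(initial_mse, final_mse)
```

XOR is used because it is the function a single-layer perceptron cannot represent; the hidden layer is what makes it learnable. How far the error falls depends on the starting weights, since gradient descent can settle in a local minimum.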

Once the neural network has been trained, new input patterns to the neural network can be rapidly processed to categorize them. For example, if a neural network is trained with past fraudulent and non-fraudulent credit card transactions, it can be used to identify fraud in new transactions as they occur. The neural network learns to recognize fraudulent patterns without explicit programming.

Unsupervised Neural Networks

Unsupervised neural networks adjust weights without the use of target output values. This type of neural network is usually used to identify "clusters" in the data, groups of data samples that are similar to one another and different from other groups of samples.

Kohonen Self-Organizing Maps (SOMs)

A Kohonen Self-Organizing Feature Map clusters data by mapping samples with a high number of feature dimensions onto a low-dimensional map in a way that preserves the topology of the samples. That is, samples that are close together in high-dimensional space are close together on the low-dimensional map. Samples that are far apart in the high-dimensional space are far apart on the low-dimensional map.
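A "textbook" style SOM can be sketched as below, here with a one-dimensional map. This is a generic illustration only and does not reflect Peer Science's modified version: the map size, the linear decay of the learning rate and neighborhood radius, the Gaussian neighborhood, and the toy samples are all assumptions made for the example. Note that no target values are used anywhere.

```python
import math
import random

def best_matching_unit(sample, weights):
    """Index of the map unit whose weight vector is closest to the sample."""
    return min(range(len(weights)),
               key=lambda j: sum((s - w) ** 2 for s, w in zip(sample, weights[j])))

def train_som(samples, n_units, epochs=50, rate0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    radius0 = n_units / 2.0
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # shrinks from 1 toward 0
        rate = rate0 * frac
        radius = max(radius0 * frac, 0.5)
        for sample in samples:
            bmu = best_matching_unit(sample, weights)
            for j in range(n_units):
                # Units near the winner on the map move more than distant ones.
                influence = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    weights[j][d] += rate * influence * (sample[d] - weights[j][d])
    return weights

# Toy data: two well-separated groups in three feature dimensions.
cluster_a = [[0.1, 0.0, 0.1], [0.0, 0.1, 0.0], [0.1, 0.1, 0.0]]
cluster_b = [[0.9, 1.0, 0.9], [1.0, 0.9, 1.0], [0.9, 0.9, 1.0]]
weights = train_som(cluster_a + cluster_b, n_units=5)
unit_a = best_matching_unit(cluster_a[0], weights)
unit_b = best_matching_unit(cluster_b[0], weights)
print(unit_a, unit_b)
```

After training, samples from the two groups land on different units of the map, which is how the map exposes clusters in the data.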

Peer Science has modified the processing of the SOM to make it hundreds of times faster and much more stable than the "textbook" version that is widely available. These improvements are extremely important, since the textbook SOM can be very slow to train and can produce disparate results for the same data set.
