What are neural networks? How do they work? When should you use them?

A beginner's guide to neural networks

by Nigel Cummings

The term neural networks often brings to mind an image of machines which mimic the operation of the human brain, but this is a popular misconception. Only the most simplistic analogy is applicable in this context: whilst, like the brain, neural networks can be thought of as consisting of inter-connected neurons linked together by synapses, artificial neural networks (ANNs) do not exist as electronic brains in boxes. Indeed, for the most part they are written as simulations in software, rather than existing as dedicated hardware systems. These networks are more often than not sub-routines written in the common computer language 'C'; indeed, there are several 'C' based software development tools available that can automatically generate ANNs.

(Left: A perceptron-type network)

Neural networking is not a new concept. Simulations using formal logic date back to McCulloch and Pitts (1943) who developed models of neural networks based on their understanding of neurology. Neuroscience was influential in the development of neural networks, but psychologists and engineers also contributed to the progress of neural network systems - Rosenblatt (1958) created a great deal of interest when he designed and developed the Perceptron. The Perceptron had three layers with the middle layer known as the association layer. This system could learn to connect or associate a given input to a random output unit.

Another system was the ADALINE (ADAptive LInear Element) developed in 1960 by Widrow and Hoff (Stanford University). The ADALINE was an analogue electronic device made from simple components. Its learning method, however, was different to that of the Perceptron; it employed the Least-Mean-Squares (LMS) learning rule. It was not until the 1980s, however, that algorithms became sophisticated enough for general applications, and computers became powerful enough to process them.

Moving swiftly up to date, in the 1990s ANNs are increasingly applied to complex real-world problems - they can be used as pattern recognition engines, with the ability to generalise in making decisions about imprecise input data. They can also provide solutions to a variety of classification problems such as speech, character or signal recognition. ANNs are capable of providing prediction functions and system modelling where the physical processes are not fully understood or are extremely complex.

How do they do what they do?

In a living mammalian brain, when enough input synapses send a signal into a neuron, the neuron fires, thus causing signals to be sent down to output synapses. These in turn cause other neurons to fire, and so the process goes on. ANNs are processing devices (algorithms or actual hardware) that are loosely modelled after the neuronal structure of the mammalian cerebral cortex but on much smaller scales. However, this analogy is not worth pursuing too far: brain mechanisms are subtle, and brains are vastly more complex, with almost limitless interconnections - typically 10^11 neurons, each of which is connected to a large number of other neurons (typically 1000+). These neurons are arranged in an uneven layer-like structure. The early layers receive input from the sense organs such as eyes and ears, while the final layers produce motor output (limb movement, etcetera). The middle layers form the associative cortex. (Does this sound similar to Rosenblatt's Perceptron model?)

The simplest computer-focused definition of ANNs is provided by the inventor of one of the first neurocomputers, Dr. Robert Hecht-Nielsen. He defined a neural network as: "...a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs". That statement provides a fine description of a neural network, provided one understands how computers function. Most of us, however, are more interested in the results a computer can produce than in the mysteries of electronic computation.

It may therefore be helpful, in order to gain a better understanding of artificial neural networks (ANNs), to be aware of how a conventional 'serial' computer and its software process information. Serial computers possess Central Processor Units (CPUs) which are able to address an array of memory locations, where data and computational instructions are stored. Computations are made by the computer's processor reading an instruction, together with any data the instruction requests, from memory addresses - the instruction is then executed and the results are saved in a specified memory location as required. In serial systems computational steps can be seen as sequential, logical and deterministic, and the state of a given variable can be tracked from one operation to another.

By comparison, however, ANNs are not sequential or necessarily deterministic. There are no complex central processors; instead there are large numbers of simple ones which generally do nothing more than take the weighted sum of their inputs from other processors. ANNs do not execute programmed instructions; they respond in parallel (whether simulated or actual) to patterns of input presented to them. ANNs do not utilise separate memory addresses for storing data; instead, the data is contained in the overall activation 'state' of the network and 'knowledge' is represented by the network itself, which is, in literal terms, more than the sum of its individual elements.
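As a minimal sketch of one such simple processor, the Python fragment below takes the weighted sum of its inputs and 'fires' when that sum reaches a threshold. The function name, the weights and the threshold are assumptions chosen purely for illustration (here they make the unit behave like a logical AND); they do not come from any particular ANN package.

```python
def unit_output(inputs, weights, threshold):
    """Return 1 (fire) if the weighted sum of the inputs reaches the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0


# Hypothetical values: both inputs must be active before the unit fires.
and_weights = [1.0, 1.0]
and_threshold = 1.5

for pattern in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pattern, unit_output(pattern, and_weights, and_threshold))
```

Nothing in the unit itself says 'AND'; that behaviour lives entirely in the numbers attached to its input lines.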

At their simplest level ANNs map data. Imagine for a moment how a brain relays its data. When a pattern of signals is sent down its input lines, the neurons fire in complex ways, resulting in a pattern of signals on the output lines. The characteristics of this mapping depend partly on a network's structure, artificial or otherwise - the number of neurons and their organisation - and partly on the strengths of the weights on each of the lines. The larger the weight, the greater the effect that line has in determining whether the neuron it connects to should be fired. Weights are assigned by a learning process which trains the network to behave in a required way. One of the beauties of this approach when applied to computing lies in the fact that little, if any, programming is involved.
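To illustrate how the mapping depends on both structure and weights, here is a small sketch of a network with two inputs, two hidden units and one output unit. All the names and all the weight values are hand-picked assumptions for the purpose of the example; in practice the weights would be found by training rather than written in by hand.

```python
def step(value):
    """Fire (1) when the summed input is non-negative, otherwise stay silent (0)."""
    return 1 if value >= 0 else 0


def layer(inputs, weight_rows, biases):
    """Compute one layer: each unit takes a weighted sum of the inputs plus a bias."""
    return [
        step(sum(x * w for x, w in zip(inputs, row)) + bias)
        for row, bias in zip(weight_rows, biases)
    ]


def network(pattern, hidden_weights, hidden_biases, output_weights, output_biases):
    """Map an input pattern to a single output through one hidden layer."""
    hidden = layer(pattern, hidden_weights, hidden_biases)
    return layer(hidden, output_weights, output_biases)[0]


# Hypothetical weights: the two hidden units act like OR and AND respectively.
HIDDEN = dict(hidden_weights=[[1, 1], [1, 1]], hidden_biases=[-0.5, -1.5])
# Same structure, two different sets of output weights, two different mappings.
XOR_NET = dict(HIDDEN, output_weights=[[1, -2]], output_biases=[-0.5])
OR_NET = dict(HIDDEN, output_weights=[[1, 0]], output_biases=[-0.5])

patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_outputs = [network(p, **XOR_NET) for p in patterns]
or_outputs = [network(p, **OR_NET) for p in patterns]
print(xor_outputs)  # [0, 1, 1, 0]
print(or_outputs)   # [0, 1, 1, 1]
```

The structure is identical in both cases; only the weights on the lines into the output unit differ, and that alone changes the mapping from 'exclusive or' to plain 'or'.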

Untrained networks, like untrained mammalian brains, have to be taught. The most common way to achieve this is to present them with patterns of typical input data and adjust the weights according to how much the resulting output patterns differ from the expected outcome. Such adjustment is performed repeatedly, and for a very large number of input patterns, until the network appears to operate satisfactorily. It should also be pointed out that ANNs possess the capacity to generalise and to deal with noisy or uncertain data - very desirable attributes in many applications.
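The adjust-and-repeat idea can be sketched for a single unit as follows. This is a simplified, perceptron-style error-correction rule offered as an illustration only, not the procedure of any specific package; the function names, the learning rate of 0.5 and the four training examples (a logical AND) are all assumptions.

```python
def predict(inputs, weights, bias):
    """Output 1 if the weighted sum plus bias is non-negative, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total >= 0 else 0


def train(examples, rate=0.5, max_epochs=50):
    """Repeatedly present the examples, nudging the weights whenever the output is wrong."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            error = target - predict(inputs, weights, bias)  # -1, 0 or +1
            if error != 0:
                mistakes += 1
                # Move each weight in the direction that reduces the error.
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                bias += rate * error
        if mistakes == 0:  # every pattern is now mapped correctly
            break
    return weights, bias


# Hypothetical training set: (input pattern, expected output) for a logical AND.
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(AND_EXAMPLES)

for inputs, target in AND_EXAMPLES:
    print(inputs, "expected", target, "got", predict(inputs, weights, bias))
```

No rule for AND is ever written down: the unit starts with all weights at zero and arrives at a working set purely from the difference between its outputs and the expected ones.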

Little programming is needed and development times tend to be short; moreover, a neural network can in principle compute any computable function. Existing as simple programs, neural networks can in theory be implemented on parallel lines to the nth degree, until sufficient speed and/or power of computation for any particular task is attained - in this respect they operate similarly to the mammalian brain, where densely interconnected, parallel structures process information. (ANNs are also referred to as neuromorphic systems, connectionist architectures, and parallel distributed processing.)

When should they be used?

ANNs should be considered for any problem that can be expressed as a mapping where tolerance for errors is important, where many example mappings are available, but where hard and fast rules cannot easily be applied. There are many different types of ANNs but some are more popular than others. The most widely used ANN is known as the Back Propagation ANN - this is ideal for prediction and classification tasks. Another is the Kohonen or Self Organising Map which is excellent when finding relationships amongst complex sets of data.

Other popular ANNs include the multilayer Perceptron (the Perceptron was mentioned at the beginning of this exposition), which is usually trained with the back propagation of error algorithm, as well as learning vector quantization and radial basis function networks. Some ANNs are classified as feedforward while others are recurrent, depending on how data is processed through the network.

(Right: The brain compared to an artificial model)

ANNs can also be classified by their method of learning - some ANNs require supervised training while others are referred to as unsupervised or self-organising. Supervised training can best be compared to a student being guided by an instructor. With self-organised training, unsupervised algorithms cluster the data into similar groups based on the measured attributes or features that serve as inputs to the algorithms. This can best be compared to those determined students who derive their lessons entirely on their own.
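Unsupervised clustering can be sketched with a very small competitive ('winner takes all') scheme, in which the unit whose weight lies closest to each input moves a little towards it. This is a deliberately simplified illustration and not a full Kohonen Self Organising Map (there is no neighbourhood function); the function names, the sample values, the starting weights and the learning rate are all assumptions.

```python
def nearest(value, prototypes):
    """Return the index of the unit whose weight is closest to the input value."""
    return min(range(len(prototypes)), key=lambda i: abs(value - prototypes[i]))


def train_competitive(samples, prototypes, rate=0.5, epochs=10):
    """Let the units compete for each sample; only the winner's weight is adjusted."""
    prototypes = list(prototypes)  # work on a copy
    for _ in range(epochs):
        for value in samples:
            winner = nearest(value, prototypes)
            prototypes[winner] += rate * (value - prototypes[winner])
    return prototypes


# Hypothetical one-dimensional measurements forming two loose groups; no labels are given.
samples = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
prototypes = train_competitive(samples, [0.0, 6.0])
groups = [nearest(value, prototypes) for value in samples]

print(prototypes)
print(groups)  # [0, 0, 0, 1, 1, 1]
```

No expected outputs are supplied at any point; the two units simply settle on the two groups present in the data, which is the sense in which such networks teach themselves.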

ANNs may be applied to control problems, where the input variables are measurements used to drive an output actuator, and the network learns the control function. The advantage of ANNs lies in their resilience against distortions in the input data and their capability of learning. They are often good at solving problems that are too complex for conventional technologies (problems that may not have an algorithmic solution, or problems for which an algorithmic solution is too complex to be found). ANNs are often well suited to problems that people are good at solving, but for which traditional methods are not.

No limits(?)

ANN applications have been described as almost limitless, but their functions fall into a few simple categories: classification, modelling and forecasting.

Classification: ANNs can be utilised for market profiling, client profiling, medical diagnosis, risk evaluation, signature verification, voice recognition (anyone tried IBM's VoiceType or DragonDictate on their PC yet?), and image recognition.

Modelling: we must not forget the growing number of interactive visual modelling packages which utilise ANNs to depict process control, systems control, and dynamic systems simulations. ANNs can provide resources which make it easy to represent the way things really work, while additional detail such as timing or routing information can be added if required, and ANN-generated models can account for fluctuation in work speed and availability of resources.

Forecasting: ANNs can also be utilised for the prediction of future sales and production requirements, the estimation of market performance and energy needs, weather forecasting, predicting climate effects on agricultural yields and even for predicting the most likely winner of horse races!

Who needs ANNs?

OR practitioners, scientists and students working with or analysing data of any kind - in business, education, finance, science and industry - where problems are likely to be complex, and anyone working with situations where problem evaluation is laborious, unclear (I hate the fuzzy word), or unresolvable using conventional methods, will find that ANNs can provide significant tools in support of Operational Research's traditionally pragmatic approach and clear understanding of complex problems.

Glossary of common terms

Adaptive Resonance Theory:-
A two-layer neural net architecture in which information reverberates back and forth between the layers.
Back Propagation Network:-
A multilayer feedforward neural net architecture which uses the supervised mode of learning. This is the most widely used type of neural network.
Hidden Layer:-
A layer of processing elements between a neural network's input layer and its output layer.
Input Layer:-
A layer of processing elements that receives the input to a neural net.
Kohonen Net:-
A neural net architecture whose processing elements compete with each other for the right to respond to an input pattern.
Neural Network:-
A system modeled after the neurons (nerve cells) in a biological nervous system.
Output Layer:-
The layer of processing elements which produce a neural network's output.
Inheritance:-
The process where the properties of a superior object class are pushed upon a subordinate object class.
Object:-
A software structure which represents an identifiable item that has a well-defined role in a problem domain.
Object Oriented:-
An adjective applied to any system or language that supports the use of objects.
Optical Character Recognition and Character Recognition:-
The process of applying pattern-matching methods to character shapes that have been read into a computer to determine the character that the shapes represent.

Suggested reading

  • Swingler K. (1996) Applying neural networks: A practical guide. Academic Press, NY. ISBN 0-12-679170-8.
  • Zupan J., Gasteiger J. (1993) Neural networks for chemists: An introduction. VCH Verlagsgesellschaft mbH, Weinheim, FRG. ISBN 3-527-28603-9.
  • Burns J.A., Whitesides G.M. (1993) Feed forward neural networks in chemistry: Mathematical systems for classification and pattern recognition. Chem Rev 93: 2583-2601.
  • Haykin S. (1994) Neural networks: A comprehensive foundation. Macmillan, NY, p. 2.
  • Kohonen T. (1988) Self-organisation and associative memory. Springer-Verlag, Berlin.
  • Rumelhart D.E., McClelland J.L. (1986) Parallel distributed processing: Explorations in the microstructure of cognition. MIT Press, Cambridge. Vols I and II.

With extracts and adaptations from: Artificial Neural Networks - Facts, Patterns and Principles, Nigel Cummings (1992)

First published to members of the Operational Research Society in Inside O.R. November 1997