Data Mining by Mehmed Kantardzic (inspirational novels TXT) 📗
- Author: Mehmed Kantardzic
Book online «Data Mining by Mehmed Kantardzic (inspirational novels TXT) 📗». Author Mehmed Kantardzic
The network is feedforward if the processing propagates from the input side to the output side unanimously, without any loops or feedbacks. In a layered representation of the feedforward neural network, there are no links between nodes in the same layer; outputs of nodes in a specific layer are always connected as inputs to nodes in succeeding layers. This representation is preferred because of its modularity, that is, nodes in the same layer have the same functionality or generate the same level of abstraction about input vectors. If there is a feedback link that forms a circular path in a network (usually with a delay element as a synchronization component), then the network is recurrent. Examples of ANNs belonging to both classes are given in Figure 7.3.
Figure 7.3. Typical architectures of artificial neural networks. (a) Feedforward network; (b) recurrent network.
Although many neural-network models have been proposed in both classes, the multilayer feedforward network with a backpropagation-learning mechanism is the most widely used model in terms of practical applications. Probably over 90% of commercial and industrial applications are based on this model. Why multilayered networks? A simple example will show the basic differences in application requirements between single-layer and multilayer networks.
The simplest and well-known classification problem, very often used as an illustration in the neural-network literature, is the exclusive-OR (XOR) problem. The task is to classify a binary input vector X to class 0 if the vector has an even number of 1’s or otherwise assign it to class 1. The XOR problem is not linearly separable; this can easily be observed from the plot in Figure 7.4 for a two-dimensional (2-D) input vector X = {x1, x2}. There is no possibility of obtaining a single linear separation of points that belong to different classes. In other words, we cannot use a single-layer network to construct a straight line (in general, it is a linear hyperplane in an n-dimensional space) to partition the 2-D input space into two regions, each containing data points of only the same class. It is possible to solve the problem with a two-layer network, as illustrated in Figure 7.5, in which one possible solution for the connection weights and thresholds is indicated. This network generates a nonlinear separation of points in a 2-D space.
Figure 7.4. XOR problem.
Figure 7.5. XOR solution: the two-layer ANN with the hard-limit activation function.
The basic conclusion from this example is that single-layered ANNs are a convenient modeling tool only for relatively simple problems that are based on linear models. For most real-world problems, where models are highly nonlinear, multilayered networks are better and maybe the only solution.
7.3 LEARNING PROCESS
A major task for an ANN is to learn a model of the world (environment) in which it is embedded and to maintain the model sufficiently consistent with the real world so as to achieve the specified goals of the concerned application. The learning process is based on data samples from the real world, and here lies a fundamental difference between the design of an ANN and a classical information-processing system. In the latter case, we usually proceed by first formulating a mathematical model of environmental observations, validating the model with real data, and then building (programming) the system on the basis of the model. In contrast, the design of an ANN is based directly on real-life data, with the data set being permitted to “speak for itself.” Thus, an ANN not only provides the implicit model formed through the learning process, but also performs the information-processing function of interest.
The property that is of primary significance for an ANN is the ability of the network to learn from its environment based on real-life examples, and to improve its performance through that learning process. An ANN learns about its environment through an interactive process of adjustments applied to its connection weights. Ideally, the network becomes more knowledgeable about its environment after each iteration in the learning process. It is very difficult to agree on a precise definition of the term learning. In the context of ANNs one possible definition of inductive learning is as follows:
Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameters change.
A prescribed set of well-defined rules for the solution of a learning problem is called a learning algorithm. Basically, learning algorithms differ from each other in the way in which the adjustment of the weights is formulated. Another factor to be considered in the learning process is the manner in which ANN architecture (nodes and connections) is built.
To illustrate one of the learning rules, consider the simple case of a neuron k, shown in Figure 7.1, constituting the only computational node of the network. Neuron k is driven by input vector X(n), where n denotes discrete time, or, more precisely, the time step of the iterative process involved in adjusting the input weights wki. Every data sample for ANN training (learning) consists of the input vector X(n) and the corresponding output d(n). InputsOutputSamplekxk1, xk2, … , xkmdk
Processing the input vector X(n), a neuron k produces the output that is denoted by yk(n):
It represents the only output of this simple network, and it is compared with a desired response or target output dk(n) given in the sample. An error ek(n) produced at the output is by definition
The error signal produced actuates a control mechanism of the learning algorithm, the purpose of which is to apply a sequence of corrective adjustments to the input weights of a neuron. The corrective adjustments are designed to make the output signal yk(n) come closer to the desired response dk(n) in a step-by-step manner. This objective is achieved by minimizing a cost function E(n), which is the instantaneous value of error energy, defined for this
Comments (0)