Data Mining by Mehmed Kantardzic (inspirational novels TXT) 📗
- Author: Mehmed Kantardzic
Book online «Data Mining by Mehmed Kantardzic (inspirational novels TXT) 📗». Author Mehmed Kantardzic
Before going into details of the applications of GAs in the following sections, let us understand their basic principles and components. GAs encode each point in a parameter or solution space into a binary-bit string called a chromosome. These points in an n-dimensional space do not represent samples in the terms that we defined them at the beginning of this book. While samples in other data-mining methodologies are data sets given in advance for training and testing, sets of n-dimensional points in GAs are a part of a GA, and they are produced iteratively in the optimization process. Each point or binary string represents a potential solution to the problem that is to be solved. In GAs, the decision variables of an optimization problem are coded by a structure of one or more strings, which are analogous to chromosomes in natural genetic systems. The coding strings are composed of features that are analogous to genes. Features are located in different positions in the string, where each feature has its own position (locus) and a definite allele value, which complies with the proposed coding method. The string structures in the chromosomes go through different operations similar to the natural-evolution process to produce better alternative solutions. The quality of new chromosomes is estimated based on the “fitness” value, which can be considered as the objective function for the optimization problem. The basic relations between concepts in natural evolution and GAs are given in Table 13.1. Instead of single a point, GAs usually keep a set of points as a population, which is then evolved repeatedly toward a better overall fitness value. In each generation, the GA constructs a new population using genetic operators such as crossover and mutation. Members with higher fitness values are more likely to survive and participate in mating or crossover operations.
TABLE 13.1. Basic Concepts in Genetic AlgorithmsConcept in Natural EvolutionConcept in Genetic AlgorithmsChromosomeStringGeneFeatures in the stringLocusPosition in the stringAllelePosition value (usually 0 or 1)GenotypeString structurePhenotypeSet of characteristics (features)
As a general-purpose optimization tool, GAs are moving out of academia and finding significant applications in many other venues. Typical situations where GAs are particularly useful are in difficult optimization cases for which analytical methods do not work well. GAs have been quite successfully applied to optimization problems like wire routing, scheduling, adaptive control, game playing, transportation problems, TSPs, database query optimization, and machine learning. In the last decades, the significance of optimization has grown even further because many important large-scale, combinatorial-optimization problems and highly constrained engineering problems can only be solved approximately. GAs aim at such complex problems. They belong to the class of probabilistic algorithms, yet they are very different from random algorithms as they combine elements of directed and stochastic search. Another important property of genetic-based search methods is that they maintain a population of potential solutions while all other methods process a single point of the search space. Because of these characteristics, GAs are more robust than existing directed-search methods.
GAs are popular because they do not depend on functional derivatives, and they have the following characteristics:
1. GAs are parallel-search procedures that can be implemented on parallel-processing machines to massively speed up their operations.
2. GAs are applicable to both continuous- and discrete-optimization problems.
3. GAs are stochastic and less likely to get trapped in local minima, which inevitably are present in any practical optimization application.
4. GAs’ flexibility facilitates both structure and parameter identification in complex models.
The GA theory provides some explanations of why, for a given problem formulation, we may obtain convergence to the sought optimal point. Unfortunately, practical applications do not always follow the theory, the main reasons being:
1. the coding of the problem often moves the GA to operate in a different space than that of the problem itself;
2. there are practical limits on the hypothetically unlimited number of iterations (generations in the GA); and
3. there is a limit on the hypothetically unlimited population size.
One of the implications of these observations is the inability of GAs, under certain conditions, to find the optimal solution or even an approximation to the optimal solution; such failures are usually caused by premature convergence to a local optimum. Do not forget that this problem is common not only for the other optimization algorithms but also for the other data-mining techniques.
13.2 OPTIMIZATION USING GAs
Let us note first that without any loss of generality we can assume that all optimization problems can be analyzed as maximization problems only. If the optimization problem is to minimize a function f(x), this is equivalent to maximizing a function g(x) = −f(x). Moreover, we may assume that the objective function f(x) takes positive values in its domain. Otherwise, we can translate the function for some positive constant C so that it will be always positive, that is,
If each variable xi, with real values, is coded as a binary string of length m, then the relation between the initial value and the coded information is
where the variable xi can take the values from a domain Di = [a, b], and m is the smallest integer such that the binary code has the required precision. For example, the value for variable x given on the domain [10, 20] is a binary-coded string with the length equal to 3 and the code 100. While the range of codes is between 000 and 111, the question is: What is the real value of the coded variable x? For this example, m = 3 and the corresponding precision is
and that is the difference between two successive xi values that could
Comments (0)