Data Mining by Mehmed Kantardzic (inspirational novels TXT) 📗
- Author: Mehmed Kantardzic
Book online «Data Mining by Mehmed Kantardzic (inspirational novels TXT) 📗». Author Mehmed Kantardzic
That was the final processing step in the first iteration of the GA. The results, in the form of the new population CR1′ to CR4′, are used in the next iteration, which starts again with an evaluation process.
13.3.6 Evaluation (Second Iteration)
The process of evaluation is repeated in the new population. These results are given in Table 13.3.
TABLE 13.3. Evaluation of the Second Generation of Chromosomes
The process of optimization with additional iterations of the GA can be continued in accordance with Figure 13.3. We will stop here with a presentation of computational steps for our example, and give some additional analyses of results useful for a deeper understanding of a GA.
Although the search techniques used in the GA are based on many random parameters, they are capable of achieving a better solution by exploiting the best alternatives in each population. A comparison of sums, and average and max values from Tables 13.2 and 13.3
shows that the new, second population is approaching closer to the maximum of the function f(x). The best result obtained from the evaluation of chromosomes in the first two iterations is the chromosome CR3′ = 11011 and it corresponds to the feature’s value x = 27 (theoretically it is known that the maximum of f(x) is for x = 31, where f(x) reaches the value 961). This increase will not be obtained in each GA iteration, but on average, the final population is much closer to a solution after a large number of iterations. The number of iterations is one possible stopping criterion for the GA algorithm. The other possibilities for stopping the GA are when the difference between the sums in two successive iterations is less than the given threshold, when a suitable fitness value is achieved, and when the computation time is limited.
13.4 SCHEMATA
The theoretical foundations of GAs rely on a binary-string representation of solutions, and on the notation of a schema—a template allowing exploration of similarities among chromosomes. To introduce the concept of a schema, we have to first formalize some related terms. The search space Ω is the complete set of possible chromosomes or strings. In a fixed-length string l, where each bit (gene) can take on a value in the alphabet A of size k, the resulting size of the search space is kl. For example, in binary-coded strings where the length of the string is 8, the size of the search space is 28 = 256. A string in the population S is denoted by a vector x ∈ Ω. So, in the previously described example, x would be an element of {0, 1}8. A schema is a similarity template that defines a subset of strings with fixed values in certain positions.
A schema is built by introducing a don’t care symbol (*) into the alphabet of genes. Each position in the scheme can take on the values of the alphabet (fixed positions) or a “don’t care” symbol. In the binary case, for example, the schemata of the length l are defined as H ∈ {0, 1, *}l. A schema represents all the strings that match it on all positions other than “*”. In other words, a schema defines a subset of the search space, or a hyperplane partition of this search space. For example, let us consider the strings and schemata of the length ten. The schema
matches two strings
and the schema
matches four strings
Of course, the schema
represents one string only, and the schema
represents all strings of length 10. In general, the total number of possible schemata is (k + 1)l, where k is the number of symbols in the alphabet and l is the length of the string. In the binary example of coding strings, with a length of 10, it is (2+1)10 = 310 = 59049 different strings. It is clear that every binary schema matches exactly 2r strings, where r is the number of don’t care symbols in a schema template. On the other hand, each string of length m is matched by 2m different schemata.
We can graphically illustrate the representation of different schemata for five-bit codes used to optimize the function f(x) = x 2 on interval [0, 31]. Every schema represents a subspace in the 2-D space of the problem. For example, the schema 1**** reduces the search space of the solutions on the subspace given in Figure 13.5a, and the schema 1*0** has a corresponding search space in Figure 13.5b.
Figure 13.5. f(x) = x2: Search spaces for different schemata. (a) Schema 1****; (b) Schema 1*0**.
Different schemata have different characteristics. There are three important schema properties: order (O), length (L), and fitness (F). The order of the schema S denoted by O(S) is the number of 0 and 1 positions, that is, fixed positions presented in the schema. A computation of the parameter is very simple: It is the length of the template minus the number of don’t care symbols. For example, the following three schemata, each of length 10
have the following orders:
The schema S3 is the most specific one and the schema S2 is the most general one. The notation of the order of a schema is useful in calculating survival probabilities of the schema for mutations.
The length of the schema S, denoted by L(S), is the distance between the first and the last fixed-string positions. It defines the compactness of information contained in a schema. For example, the values of this parameter for the given schemata S1 to S3 are
Note that the schema with a single fixed position has a length of 0. The length L of a schema is a useful parameter in calculating the survival probabilities of the schema for crossover.
Another property
Comments (0)