Data Mining by Mehmed Kantardzic
- Author: Mehmed Kantardzic
2. The local approach is based on learner selection, where one or more learners responsible for generating the output are selected based on their closeness to the sample. A selection function is applied so that, for each pattern, just one classifier, or a subset of classifiers, is responsible for the final decision.
3. Multistage combination uses a serial approach where the next learner is trained with, or tested on, only those instances where the previous learners were inaccurate.
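Voting is the simplest way of combining classifiers on a global level. The result for class i is represented as a linear combination of the outputs dij of the n learners, with the weights wj usually taken to be nonnegative and to sum to 1:
$$y_i = \sum_{j=1}^{n} w_j d_{ij}, \qquad w_j \ge 0, \quad \sum_{j=1}^{n} w_j = 1$$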
The result of the combination could be different depending on wj. Alternatives for combinations are simple sum (equal weights), weighted sum, median, minimum, maximum, and product of dij. Voting schemes can be seen as approximations under a Bayesian framework where weights wj approximate prior model probabilities.
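The combination rules listed above can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: the function names `combine` and `predict`, the output values in `d`, and the weights are all assumptions made for the example.

```python
import math
import statistics

def combine(outputs, rule="sum", weights=None):
    """Combine the outputs d_ij of n learners for a single class i.

    outputs: list of d_ij values, one per learner j.
    weights: list of w_j, used only by the weighted sum.
    """
    n = len(outputs)
    if rule == "sum":        # simple sum: equal weights w_j = 1/n
        return sum(outputs) / n
    if rule == "weighted":   # weighted sum with caller-supplied w_j
        return sum(w * d for w, d in zip(weights, outputs))
    if rule == "median":
        return statistics.median(outputs)
    if rule == "min":
        return min(outputs)
    if rule == "max":
        return max(outputs)
    if rule == "product":
        return math.prod(outputs)
    raise ValueError(f"unknown rule: {rule}")

def predict(d, rule, weights=None):
    """Return the class whose combined output is largest."""
    scores = {label: combine(values, rule, weights) for label, values in d.items()}
    return max(scores, key=scores.get)

# Hypothetical outputs of three learners for two classes (illustrative values).
d = {"c1": [0.6, 0.3, 0.7], "c2": [0.4, 0.7, 0.3]}

print(predict(d, "sum"))                         # equal weights
print(predict(d, "weighted", [0.2, 0.6, 0.2]))   # second learner trusted most
print(predict(d, "product"))
```

With these invented numbers the equal-weight sum and the weighted sum pick different classes, which illustrates how the result depends on wj.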
The rank-level fusion method is applied to classifiers that provide class “scores,” or some sort of class probabilities. In general, if Ω = {c1, … , ck} is the set of classes, each of these classifiers can provide an “ordered” (ranked) list of class labels. For example, if the scores of the output classes are 0.10, 0.75, and 0.20, the corresponding ranks for the classes will be 1, 3, and 2, respectively. The highest rank is given to the class with the highest score. Let us check an example where the number of classifiers is N = 3 and the number of classes is k = 4, Ω = {a, b, c, d}. For a given sample, the ranked outputs of the three classifiers are as follows:
In this case, final selection of the output class will be determined by accumulation of scores for each class:
The winner class is b because it has the maximum overall rank.
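The accumulation of ranks can be sketched as follows. This is a hedged illustration only: the helper names `scores_to_ranks` and `rank_fusion` are hypothetical, and the three rankings below are invented for the example rather than taken from the book's table.

```python
def scores_to_ranks(scores):
    """Convert class scores to ranks; the lowest score gets rank 1."""
    ordered = sorted(scores, key=scores.get)
    return {label: position + 1 for position, label in enumerate(ordered)}

def rank_fusion(rankings):
    """Sum the ranks each class receives and return (winner, totals).

    rankings: list of dicts mapping class label -> rank (higher is better).
    """
    totals = {}
    for ranks in rankings:
        for label, rank in ranks.items():
            totals[label] = totals.get(label, 0) + rank
    winner = max(totals, key=totals.get)
    return winner, totals

# The score-to-rank conversion used in the text: 0.10, 0.75, 0.20 -> 1, 3, 2.
print(scores_to_ranks({"c1": 0.10, "c2": 0.75, "c3": 0.20}))

# Invented rankings from N = 3 classifiers over the classes {a, b, c, d}.
rankings = [
    {"a": 4, "b": 3, "c": 2, "d": 1},
    {"a": 2, "b": 4, "c": 3, "d": 1},
    {"a": 2, "b": 3, "c": 4, "d": 1},
]
winner, totals = rank_fusion(rankings)
print(winner, totals)
```

Summing ranks in this way is a simple Borda-count-style rule; other rank-combination schemes are possible.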
Finally, the Dynamic Classifier Selection (DCS) algorithm, representing a local approach, assumes the following steps:
1. Find the k nearest training samples to the test input.
2. Look at the accuracies of the base classifiers on these samples.
3. Choose the one classifier (or the top N classifiers) that performs best on these samples.
4. Combine decisions for selected classifiers.
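The four steps above can be sketched as a short function. This is a minimal sketch under stated assumptions: classifiers are modeled as plain Python callables, closeness is Euclidean distance, the selected decisions are combined by majority vote, and the names `dcs_predict`, `always_zero`, and `always_one` as well as the toy data are hypothetical.

```python
import math
from collections import Counter

def dcs_predict(x, train_X, train_y, classifiers, k=3, top_n=1):
    # Step 1: find the k nearest training samples to the test input
    # (Euclidean distance is an assumption of this sketch).
    order = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))
    neighbors = order[:k]

    # Step 2: measure each base classifier's accuracy on those samples.
    def local_accuracy(clf):
        hits = sum(clf(train_X[i]) == train_y[i] for i in neighbors)
        return hits / len(neighbors)

    # Step 3: keep the top_n classifiers with the best local accuracy.
    selected = sorted(classifiers, key=local_accuracy, reverse=True)[:top_n]

    # Step 4: combine the selected classifiers' decisions (majority vote here).
    votes = [clf(x) for clf in selected]
    return Counter(votes).most_common(1)[0][0]

# Toy one-dimensional data: class 0 on the left, class 1 on the right.
train_X = [(0.0,), (1.0,), (2.0,), (8.0,), (9.0,), (10.0,)]
train_y = [0, 0, 0, 1, 1, 1]

# Two deliberately crude base classifiers, each accurate in only one region.
def always_zero(x):
    return 0

def always_one(x):
    return 1

print(dcs_predict((0.5,), train_X, train_y, [always_zero, always_one]))
print(dcs_predict((9.5,), train_X, train_y, [always_zero, always_one]))
```

Each test point is handled by whichever base classifier is most accurate in its neighborhood, which is the essence of the local approach.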
8.3 BAGGING AND BOOSTING
Bagging and boosting are well-known procedures with a solid theoretical background. They belong to class (d) of ensemble methodologies, and they are essentially based on resampling of a training data set.
Bagging, a name derived from bootstrap aggregation, was the first effective method of ensemble learning and is one of the simplest methods. It was originally designed for classification and is usually applied to decision tree models, but it can be used with any type of model for classification or regression. The method uses multiple versions of a training set by using the bootstrap, that is, sampling with replacement. Each of these data sets is used to train a different model. The outputs of the models are combined by averaging (in the case of regression) or voting (in the case of classification) to create a single output.
In the bagging methodology, a training data set for a predictive model consists of samples taken with replacement from an initial set of samples according to a sampling distribution. The sampling distribution determines how likely it is that a sample will be selected. For example, when the sampling distribution is predefined as the uniform distribution, all N training samples have the same probability, 1/N, of being selected. Because sampling is done with replacement, some training samples may appear multiple times in the same training data set, while others may not appear at all. In Figure 8.6, there are five training samples {S1, S2, S3, S4, S5} with four features {F1, F2, F3, F4}. Suppose that three training data sets are formed by samples that are randomly selected with replacement from the training samples according to the uniform distribution. Each training sample has a 1/5 probability of being selected as an element of a training data set. In training data set 1, S2 and S4 appear twice, while S1 and S3 do not appear.
Figure 8.6. Bagging methodology distributes samples taken with replacement from initial set of samples.
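The resampling and aggregation steps of bagging can be sketched as follows. This is an illustrative sketch only: the function names `bootstrap_sets`, `vote`, and `average` are hypothetical, the fixed seed is used just to make runs repeatable, and the draws it produces will not necessarily match those shown in Figure 8.6.

```python
import random
from collections import Counter

def bootstrap_sets(samples, n_sets, seed=0):
    """Draw n_sets training sets, each the size of the original set,
    by uniform sampling with replacement (probability 1/N per draw)."""
    rng = random.Random(seed)
    n = len(samples)
    return [[rng.choice(samples) for _ in range(n)] for _ in range(n_sets)]

def vote(predictions):
    """Aggregate classification outputs by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

def average(predictions):
    """Aggregate regression outputs by averaging."""
    return sum(predictions) / len(predictions)

samples = ["S1", "S2", "S3", "S4", "S5"]
training_sets = bootstrap_sets(samples, n_sets=3)
for index, training_set in enumerate(training_sets, start=1):
    missing = sorted(set(samples) - set(training_set))
    print(f"training set {index}: {training_set}, not selected: {missing}")
```

Each of the resulting training sets would be used to train a separate model, and `vote` or `average` would then combine the models' outputs for a new sample.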
Bagging is only effective when using unstable nonlinear models where small changes in training data lead to significantly different classifiers and large changes in accuracy. It decreases error by decreasing the variance in the results of unstable learners.
Boosting is the most widely used ensemble method and one of the most powerful learning ideas introduced in the ensemble-learning community. Originally designed for classification, it can also be extended to regression. The algorithm first creates a “weak” classifier, that is, one whose accuracy on the training set need only be slightly better than random guessing. Samples are given initial weights, usually starting with uniform weighting. In the following iterations, the samples are reweighted to focus the system on samples that were misclassified by the most recently learned classifier. During each step of learning, (1) the weights of the samples that are not correctly learned by the weak learner are increased, and (2) the weights of the samples that are correctly learned by the weak learner are decreased. Final classification is based on a weighted vote of the weak classifiers generated in the iterations.
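One reweighting step can be sketched as follows. The text above does not fix a particular update formula, so this sketch assumes the exponential update commonly used in AdaBoost, with labels and predictions in {−1, +1}; the function name `reweight` and the example data are hypothetical.

```python
import math

def reweight(weights, y_true, y_pred):
    """One boosting reweighting step (AdaBoost-style update, assumed here).

    weights: current sample weights, summing to 1.
    y_true, y_pred: labels and weak-learner predictions in {-1, +1}.
    Returns (new_weights, alpha), where alpha is the weak learner's vote weight.
    """
    # Weighted error of the weak learner on the current distribution.
    err = sum(w for w, y, h in zip(weights, y_true, y_pred) if y != h)
    if not 0 < err < 0.5:
        raise ValueError("weak learner must be better than chance and imperfect")
    alpha = 0.5 * math.log((1 - err) / err)
    # y * h is +1 for correct samples (weight shrinks), -1 for errors (weight grows).
    raw = [w * math.exp(-alpha * y * h) for w, y, h in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return [w / total for w in raw], alpha

# Five samples with uniform initial weights; the fourth one is misclassified.
weights = [0.2] * 5
y_true = [+1, +1, -1, -1, +1]
y_pred = [+1, +1, -1, +1, +1]
new_weights, alpha = reweight(weights, y_true, y_pred)
print(new_weights, alpha)
```

After the update the misclassified sample carries more weight than each correctly classified one, so the next weak learner is pushed to focus on it.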
8.4 ADABOOST
The original boosting algorithm combined three weak learners to generate a strong, high-quality learner. AdaBoost, short for “adaptive boosting,” is the most popular boosting algorithm. AdaBoost combines “weak” learners into a highly accurate classifier to solve difficult, highly nonlinear problems. Instead of sampling, as in the bagging approach, AdaBoost reweights samples. It uses the same training set over and over again (thus the set need not be large), and it may keep adding weak learners until a target training error is reached.
Given a training data set: {(x1, y1), … , (xm, ym)} where xi ∈ X and yi ∈ {−1, +1}, when a weak classifier is trained with the data, for each input sample xi the classifier will give classification h(xi) (where h(xi) ∈ {−1, +1}). With these assumptions the main