Data Mining by Mehmed Kantardzic
We can define inductive learning as the process of estimating an unknown input–output dependency or structure of a system, using a limited number of observations or measurements of the system's inputs and outputs. In the theory of inductive learning, the data used in a learning process are organized as input–output pairs, and each such pair is called a sample. The general learning scenario involves three components, represented in Figure 4.2:
1. a generator of random input vectors X,
2. a system that returns an output Y for a given input vector X, and
3. a learning machine that estimates an unknown (input X, output Y′) mapping of the system from the observed (input X, output Y) samples.
Figure 4.2. A learning machine uses observations of the system to form an approximation of its output.
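To make the three components concrete, the sketch below (not from the book) simulates the scenario of Figure 4.2 in Python; the particular input distribution, the system function, and the use of a cubic polynomial as the learning machine are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Generator: draws input values X from a fixed distribution (unknown to the learner).
def generator(n_samples):
    return rng.uniform(-1.0, 1.0, size=n_samples)

# 2. System: returns an output Y for each input X; its internals are never shown to the learner.
def system(x):
    return np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.shape)

# 3. Learning machine: forms an approximation of the X -> Y mapping from observed samples.
x_train = generator(50)
y_train = system(x_train)
w = np.polyfit(x_train, y_train, deg=3)   # one possible class of approximating functions
estimate = np.poly1d(w)                   # the estimated mapping Y' = f(X, w)

print(estimate(0.5))                      # the learning machine's output for a new input
```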
This formulation is very general and describes many practical inductive-learning problems such as interpolation, regression, classification, clustering, and density estimation. The generator produces random vectors X, drawn independently from a fixed but unknown probability distribution. In statistical terminology, this situation is called an observational setting. It differs from the designed-experiment setting, which involves creating a deterministic sampling scheme, optimal for a specific analysis according to experimental design theory. The learning machine has no control over which input values are supplied to the system, and therefore we speak of an observational approach in inductive machine-learning systems.
The second component of the inductive-learning model is the system that produces an output value Y for every input vector X according to the conditional probability p(Y|X), which is unknown. Note that this description includes the specific case of a deterministic system where Y = f(X). Real-world systems rarely have truly random outputs; however, they often have unmeasured inputs. Statistically, the effects of these unobserved inputs on the output of the system can be characterized as random and represented with a probability distribution.
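The following small sketch (an illustration, not taken from the book) shows how a perfectly deterministic system with an unmeasured input behaves like a random system described by p(Y|X) when only X is recorded; the system form and the normal distribution of the hidden input are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# A deterministic system with two inputs: the observed X and an unmeasured input Z.
def true_system(x, z):
    return 2.0 * x + 0.5 * z

x = np.full(10_000, 1.0)              # hold the observed input fixed at X = 1
z = rng.standard_normal(10_000)       # the unmeasured input varies across observations
y = true_system(x, z)

# Seen only through X, the output is not a single value: it follows p(Y|X),
# here approximately a normal distribution centered at 2.0 with standard deviation 0.5.
print(y.mean(), y.std())
```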
An inductive-learning machine tries to form generalizations from particular, true facts, which we call the training data set. These generalizations are formalized as a set of functions that approximate a system’s behavior. This is an inherently difficult problem, and its solution requires a priori knowledge in addition to data. All inductive-learning methods use a priori knowledge in the form of the selected class of approximating functions of a learning machine. In the most general case, the learning machine is capable of implementing a set of functions f(X, w), w ∈ W, where X is an input, w is a parameter of the function, and W is a set of abstract parameters used only to index the set of functions. In this formulation, the set of functions implemented by the learning machine can be any set of functions. Ideally, the choice of a set of approximating functions reflects a priori knowledge about the system and its unknown dependencies. However, in practice, because of the complex and often informal nature of a priori knowledge, specifying such approximating functions may be, in many cases, difficult or impossible.
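As a toy illustration of how a priori knowledge enters through the chosen class f(X, w), w ∈ W, the sketch below (not from the book) restricts the learning machine to a small trigonometric family; the particular basis functions are an assumption chosen to represent prior knowledge that the system is periodic.

```python
import numpy as np

# Assumed class of approximating functions, indexed by the abstract parameter w in W = R^3:
#     f(X, w) = w0 + w1*sin(X) + w2*cos(X)
def f(x, w):
    return w[0] + w[1] * np.sin(x) + w[2] * np.cos(x)

# Every choice of w selects one concrete function from the class; the learning
# machine can only search among these, so a dependency outside the class can
# never be represented exactly, no matter how much data is available.
x = np.linspace(0.0, 2.0 * np.pi, 8)
print(f(x, np.array([0.0, 1.0, 0.0])))   # the member of the class with w = (0, 1, 0)
print(f(x, np.array([0.5, 0.0, 2.0])))   # a different member of the same class
```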
To explain the selection of approximating functions, we can use a graphical interpretation of the inductive-learning process. The task of inductive inference is this: given a collection of samples (xi, f(xi)), return a function h(x) that approximates f(x). The function h(x) is often called a hypothesis. Figure 4.3 shows a simple example of this task: the points in 2-D are given in Figure 4.3a, and it is necessary to find "the best" function through these points. The true f(x) is unknown, so there are many choices for h(x). Without more knowledge, we have no way of knowing which of the three suggested solutions (Fig. 4.3b,c,d) to prefer. Because there is almost always a large number of possible, consistent hypotheses, all learning algorithms search through the solution space based on given criteria. For example, the criterion may be a linear approximating function that has minimum distance from all given data points. This a priori knowledge restricts the search to functions of the form shown in Figure 4.3b.
Figure 4.3. Three hypotheses for a given data set.
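A brief sketch (illustrative only, with made-up data points) of how a criterion such as "the linear function with minimum distance from all given data points" selects one hypothesis h(x) out of many consistent ones:

```python
import numpy as np

# Observed samples (xi, f(xi)); the true f(x) is unknown to the learner.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.2, 0.9, 2.1, 2.8, 4.2])

# Restricting the search to straight lines and minimizing the squared distance
# to the data picks a single hypothesis, as in Figure 4.3b.
slope, intercept = np.polyfit(x, y, deg=1)
h = lambda x_new: slope * x_new + intercept
print(h(2.5))                 # prediction of the selected hypothesis at a new point

# A degree-4 polynomial passes exactly through all five points, but without more
# a priori knowledge there is no principled reason to prefer it over the line.
exact = np.poly1d(np.polyfit(x, y, deg=4))
print(exact(2.5))
```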
There is also an important distinction between the two types of approximating functions usually used in an inductive-learning process: those that are linear in their parameters and those that are nonlinear in their parameters. Note that the notion of linearity is with respect to the parameters rather than the input variables. For example, polynomial regression in the form

f(X, w) = w0 + w1x + w2x^2 + … + wmx^m

is a linear method, because the parameters wi enter the function linearly (even though the function itself is nonlinear in x). We will see later that some learning methods, such as multilayer artificial neural networks, provide an example of nonlinear parametrization, since the output of an approximating function depends nonlinearly on the parameters. A typical factor in these functions is e^(−ax), where a is a parameter and x is the input value. Selecting the approximating functions f(X, w) and estimating the values of the parameters w are typical steps of every inductive-learning method.
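To show what this distinction means in practice, the sketch below fits a function that is linear in its parameters by ordinary linear least squares and a function containing e^(−ax) by iterative nonlinear optimization; the synthetic data, the specific functions, and the use of scipy.optimize.curve_fit are choices made for this illustration, not part of the text.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)
x = np.linspace(0.0, 3.0, 40)

# Linear parametrization: f(x, w) = w0 + w1*x + w2*x^2 is nonlinear in x but linear
# in the parameters w, so the parameters have a closed-form least-squares estimate.
y_poly = 1.0 + 2.0 * x - 0.5 * x**2 + 0.1 * rng.standard_normal(x.size)
design = np.vander(x, N=3, increasing=True)          # columns: 1, x, x^2
w, *_ = np.linalg.lstsq(design, y_poly, rcond=None)
print(w)

# Nonlinear parametrization: f(x, w0, a) = w0 * e^(-a*x) depends nonlinearly on a,
# so its parameters must be estimated by an iterative optimization procedure.
def f_exp(x, w0, a):
    return w0 * np.exp(-a * x)

y_exp = 2.0 * np.exp(-1.5 * x) + 0.05 * rng.standard_normal(x.size)
params, _ = curve_fit(f_exp, x, y_exp, p0=(1.0, 1.0))
print(params)
```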
Before further formalization of a learning process, it is necessary to make a clear distinction between two concepts that are closely connected with it: statistical dependency and causality. The statistical dependency between input X and output Y is expressed through the approximating functions of the learning method. The main point is that causality cannot be inferred from data analysis alone or concluded from an inductively learned model with input–output approximating functions Y = f(X, w); it must be assumed or demonstrated by arguments outside the results of the inductive-learning analysis. For example, it is well known that people in Florida are, on average, older than in the rest of the United States. This observation may be supported by inductively learned dependencies, but it does not imply that the climate in Florida causes people to live longer. The cause is entirely different: people tend to move to Florida when they retire, and this migration, not the climate, explains the higher average age.