
Data Mining, by Mehmed Kantardzic



the domains for all attributes are [0, 1, 2], what will be the number of “artificial” samples if missing values are interpreted as “don’t care values” and they are replaced with all possible values for a given domain?

9. A 24-h, time-dependent data set X is collected as a training data set to predict values 3 h in advance. If the data set X is

(a) What will be a standard tabular representation of data set X if

(i) the window width is 6, and a prediction variable is based on the difference between the current value and the value after 3 h. What is the number of samples?

(ii) the window width is 12, and the prediction variable is based on the ratio. What is the number of samples?

(b) Plot discrete X values together with computed 6- and 12-h MA.

(c) Plot time-dependent variable X and its 4-h EMA.
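The windowing in part (a) and the smoothing in parts (b) and (c) can be sketched as follows. This is a minimal sketch, assuming X holds one value per hour (24 values). The values of X are not reproduced in this text, so the series below is placeholder data. The function names and the smoothing factor `alpha` are illustrative choices, and the EMA recurrence is the common exponential-smoothing form, which may differ in detail from the definition used in the chapter.

```python
def windowed_table(x, width, horizon, target="difference"):
    """Build rows of (window values, target) from a time series.

    Each row holds `width` consecutive values ending at time t, and a target
    computed from x[t] and x[t + horizon].
    """
    rows = []
    for end in range(width - 1, len(x) - horizon):
        window = x[end - width + 1 : end + 1]
        current, future = x[end], x[end + horizon]
        if target == "difference":
            y = future - current
        elif target == "ratio":
            y = future / current
        else:
            raise ValueError("target must be 'difference' or 'ratio'")
        rows.append((window, y))
    return rows


def moving_average(x, m):
    """Mean of the last m values, defined from index m - 1 onward."""
    return [sum(x[i - m + 1 : i + 1]) / m for i in range(m - 1, len(x))]


def exponential_moving_average(x, alpha):
    """Common EMA recurrence: ema[i] = alpha * x[i] + (1 - alpha) * ema[i - 1]."""
    ema = [x[0]]
    for value in x[1:]:
        ema.append(alpha * value + (1 - alpha) * ema[-1])
    return ema


series = list(range(1, 25))  # placeholder for the 24 hourly values of X
print(len(windowed_table(series, 6, 3)))            # 16
print(len(windowed_table(series, 12, 3, "ratio")))  # 10
```

Under these assumptions the number of rows is n - width - horizon + 1, which gives 24 - 6 - 3 + 1 = 16 for a window of 6 and 24 - 12 - 3 + 1 = 10 for a window of 12.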

10. The number of children for different patients in a database is given with a vector

Find the outliers in set C using standard statistical parameters mean and variance.

If the threshold value is changed from ±3 standard deviations to ±2 standard deviations, what additional outliers are found?
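The effect of tightening the threshold can be illustrated with a short sketch. The vector below is hypothetical data, not the vector from the exercise, and the use of the sample standard deviation (dividing by n - 1) is an assumption; the population form would shift the bounds slightly.

```python
import statistics


def sigma_outliers(values, k=3.0):
    """Return the values lying more than k standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    lower, upper = mean - k * std, mean + k * std
    return [v for v in values if v < lower or v > upper]


children = [1, 2, 0, 3, 2, 1, 0, 2, 1, 12]  # hypothetical counts
print(sigma_outliers(children, k=3))  # []
print(sigma_outliers(children, k=2))  # [12]
```

In this hypothetical vector the value 12 passes the ±3 standard deviation test but is flagged at ±2. The lower bound is negative in both cases, which is uninformative for a count that cannot fall below zero.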

11. For a given data set X of 3-D samples,

(a) find the outliers using the distance-based technique if

(i) the threshold distance is 4, and threshold fraction p for non-neighbor samples is 3, and

(ii) the threshold distance is 6, and threshold fraction p for non-neighbor samples is 2.

(b) Describe the procedure and interpret the results of outlier detection based on mean values and variances for each dimension separately.

12. Discuss the applications in which you would prefer to use EMA instead of MA.

13. If your data set contains missing values, discuss the basic analyses and corresponding decisions you will take in the preprocessing phase of the data-mining process.

14. Develop a software tool for the detection of outliers if the data for preprocessing are given in the form of a flat file with n-dimensional samples.

15. The set of seven 2-D samples is given in the following table. Check if we have outliers in the data set. Explain and discuss your answer.

Sample #   X   Y
1          1   3
2          7   1
3          2   4
4          6   3
5          4   2
6          2   2
7          7   2

16. Given the data set of 10 3-D samples: {(1,2,0), (3,1,4), (2,1,5), (0,1,6), (2,4,3), (4,4,2), (5,2,1), (7,7,7), (0,0,0), (3,3,3)}, is the sample S4 = (0,1,6) an outlier if the threshold values are d = 6 for the distance and p > 2 for the number of samples in the neighborhood? (Note: Use the distance-based outlier-detection technique.)
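The distance counts needed for this exercise can be obtained with a short sketch. It assumes Euclidean distance, and the helper name `count_far_and_near` is illustrative. The code only reports how many samples lie farther than d from S4 and how many lie within d; how the threshold p is applied to those counts depends on the reading of the criterion, so no verdict is hard-coded.

```python
import math

samples = [(1, 2, 0), (3, 1, 4), (2, 1, 5), (0, 1, 6), (2, 4, 3),
           (4, 4, 2), (5, 2, 1), (7, 7, 7), (0, 0, 0), (3, 3, 3)]


def count_far_and_near(samples, index, d):
    """Count the other samples farther than d from samples[index], and those within d."""
    target = samples[index]
    far = near = 0
    for j, other in enumerate(samples):
        if j == index:
            continue  # skip the sample itself
        if math.dist(target, other) > d:  # Euclidean distance (assumption)
            far += 1
        else:
            near += 1
    return far, near


far, near = count_far_and_near(samples, index=3, d=6)  # S4 is at index 3
print(far, near)  # 5 4
```

Five of the nine other samples lie farther than 6 from S4, and four lie within that distance.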

17. What is the difference between nominal and ordinal data? Give examples.

18. Using the method of distance-based outlier detection, find the outliers in the set

if the criterion is that at least the fraction p ≥ 3 of the samples in X lies at a distance d greater than 4.

19. What will be normalized values (using min-max normalization of data for the range [−1, 1]) for the data set X?
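Min-max normalization to an arbitrary range can be sketched as below. The data set X is not reproduced in this text, so the input list is hypothetical, and the function name `min_max_normalize` is illustrative. The mapping used is v' = (v - min) / (max - min) * (new_max - new_min) + new_min.

```python
def min_max_normalize(values, new_min=-1.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; min-max normalization is undefined")
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


data = [0, 5, 10]  # hypothetical values, not the data set from the exercise
print(min_max_normalize(data))  # [-1.0, 0.0, 1.0]
```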

20. Every attribute in 6-D samples is described with one out of three numerical values: {0, 0.5, 1}. If there exist samples for all possible combinations of attribute values

(a) What will be the number of samples in a data set, and

(b) What will be the expected distance between points in a 6-D space?
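Both parts can be checked by brute force, as in the sketch below. It assumes that "expected distance" means the average Euclidean distance over all pairs of distinct samples; other readings (for example, including a point paired with itself, or a different metric) would give a different figure.

```python
import itertools
import math

values = (0, 0.5, 1)

# Part (a): every combination of three values across six attributes.
points = list(itertools.product(values, repeat=6))
print(len(points))  # 729

# Part (b): average Euclidean distance over all pairs of distinct points
# (one possible reading of "expected distance"; an assumption).
total = 0.0
count = 0
for a, b in itertools.combinations(points, 2):
    total += math.dist(a, b)
    count += 1
mean_distance = total / count
print(round(mean_distance, 3))
```

The count follows directly from 3^6 = 729. The average distance is computed over 729 × 728 / 2 pairs and must fall between 0.5 (the smallest gap between distinct points) and √6 (the diagonal of the unit hypercube).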

21. Classify the following attributes as binary, discrete, or continuous. Also, classify them as qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have more than one interpretation, so briefly indicate your reasoning (e.g., age in years; answer: discrete, quantitative, ratio).

(a) Time in terms of AM or PM.

(b) Brightness as measured by a light meter.

(c) Brightness as measured by people’s judgment.

(d) Angles as measured in degrees between 0 and 360.

(e) Bronze, Silver, and Gold medals at the Olympics.

(f) Height above sea level.

(g) Number of patients in a hospital.

(h) ISBN numbers for books.

(i) Ability to pass light in terms of the following values: opaque, translucent, transparent.

(j) Military rank.

(k) Distance from the center of campus.

(l) Density of a substance in grams per cubic centimeter.

(m) Coat check number when you attend an event.

2.8 REFERENCES FOR FURTHER STUDY

Bischoff, J., T. Alexander, Data Warehouse: Practical Advice from the Experts, Prentice Hall, Upper Saddle River, NJ, 1997.

The objective of a data warehouse is to provide any data, anywhere, anytime in a timely manner at a reasonable cost. Different techniques used to preprocess the data in warehouses reduce the effort in initial phases of data mining.

Cios, K.J., W. Pedrycz, R. W. Swiniarski, L. A. Kurgan, Data Mining: A Knowledge Discovery Approach, Springer, New York, 2007.

This comprehensive textbook on data mining details the unique steps of the knowledge discovery process that prescribe the sequence in which data-mining projects should be performed. Data Mining offers an authoritative treatment of all development phases from problem and data understanding through data preprocessing to deployment of the results. This knowledge-discovery approach is what distinguishes this book from other texts in the area. It concentrates on data preparation, clustering and association-rule learning (required for processing unsupervised data), decision trees, rule induction algorithms, neural networks, and many other data-mining methods, focusing predominantly on those which have proven successful in data-mining projects.

Hand, D., H. Mannila, P. Smith, Principles of Data Mining, MIT Press, Cambridge, MA, 2001.

The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data-mining algorithms and their applications. The second section, data-mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The third section shows how all of the preceding analyses fit together when applied to real-world data-mining problems.

Hodge, J. V., J. Austin, A Survey of Outlier Detection Methodologies, Artificial Intelligence Review, Vol. 22, No. 2, October 2004, pp. 85–127.

The original outlier-detection methods were arbitrary, but now principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. This paper presents a survey of contemporary techniques for outlier detection. The authors identify the motivation behind each technique and compare their advantages and disadvantages.

Ben-Gal, I., Outlier Detection, in Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers,
