Title: Transformation techniques in data mining
Author: Burgess, Martin.
ISNI:       0000 0001 3509 675X
Awarding Body: University of East Anglia
Current Institution: University of East Anglia
Date of Award: 2004
Transforming data is essential in data mining as a precursor to many applications, such as rule induction and Multivariate Adaptive Regression Splines (MARS). The problems arising from the use of categorical-valued data in rule induction are reduced confidence (accuracy), support and coverage. We introduce a technique called the arcsin transformation, in which categorical values are replaced with numeric values. This technique has been applied to a number of databases and has been shown to be highly effective. MARS is a regression tool which attempts to approximate complex relationships by a series of linear regressions on different intervals of the explanatory variable ranges. As with regression methods in general, we need to know what assumptions are made and how violating them may disrupt performance. The two key assumptions of most regression models, including MARS, are additivity of effects and homoscedasticity. If either of these assumptions is not satisfied in terms of the original observations, y_i, a non-linear transformation may improve matters. We use the Box-Cox transformation, which, applied to the continuous dependent variable (with non-negative responses) in a linear regression setting, might induce the regression assumptions given previously. The assumptions stated are discussed in detail using a variety of tests. The results show that, of the seven databases examined, an improvement was made on six, where the models produced were
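For readers unfamiliar with the Box-Cox transformation named in the abstract, the following is a minimal sketch of its standard textbook form, not the thesis's own implementation. The function names `box_cox`, `profile_log_likelihood` and `best_lambda` are hypothetical, the lambda grid is an arbitrary choice, and the likelihood assumes an intercept-only normal model rather than the regression setting the abstract describes. The sketch also requires strictly positive responses; zero responses would need a shift first.

```python
import math

def box_cox(y, lam):
    """Standard Box-Cox transform of strictly positive responses y."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive responses")
    if lam == 0:
        # The lambda -> 0 limit of (y**lam - 1) / lam is log(y).
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def profile_log_likelihood(y, lam):
    """Profile log-likelihood of lambda (up to a constant).

    Assumption: intercept-only normal model for the transformed data.
    """
    n = len(y)
    t = box_cox(y, lam)
    mean = sum(t) / n
    var = sum((v - mean) ** 2 for v in t) / n  # maximum-likelihood variance
    # Second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in y)

def best_lambda(y, grid=None):
    """Pick the grid value of lambda with the highest profile log-likelihood."""
    if grid is None:
        grid = [k / 10 for k in range(-20, 21)]  # -2.0 to 2.0 in steps of 0.1
    return max(grid, key=lambda lam: profile_log_likelihood(y, lam))
```

With lambda = 1 the transform only shifts the data, and with lambda = 0 it is the natural logarithm, so the chosen lambda indicates how strong a transformation the responses need.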
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Data mining. Statistical methods. Regression analysis.