Title: Feature extraction in classification
Abstract:
Feature extraction, or dimensionality reduction, is an essential part of many machine learning applications. The need for it stems from the curse of dimensionality and the high computational cost of manipulating high-dimensional data. In this thesis we focus on feature extraction for classification, and on two of the several existing approaches: the increasingly popular information-theoretic approach, and the classical distance-based (or variance-based) approach. Current algorithms for information-theoretic feature extraction are usually iterative. In contrast, PCA and LDA are popular examples of feature extraction techniques that are solved by eigendecomposition and require no iterative procedure. We study the behaviour of an iterative algorithm that maximises Kapur's quadratic mutual information by gradient ascent, and propose a new estimate of mutual information that can be maximised by closed-form eigendecomposition. This new technique is more computationally efficient than iterative algorithms, and its behaviour is more reliable and predictable than gradient ascent. Using a general framework of eigendecomposition-based feature extraction, we show a connection between information-theoretic and distance-based feature extraction. Using the distance-based approach, we study the effects of high input dimensionality and over-fitting on feature extraction, and propose a family of eigendecomposition-based algorithms that mitigate this problem. Finally, we investigate the relationship between class discrimination and over-fitting, and show why the advantages of information-theoretic feature extraction become less relevant in high-dimensional spaces.
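The abstract contrasts iterative, gradient-based maximisation of mutual information with closed-form eigendecomposition methods such as PCA and LDA. As a minimal illustration of the latter (not the thesis's own algorithm), the sketch below computes a PCA projection by eigendecomposition of the sample covariance matrix; the function name and parameters are hypothetical.

```python
# Illustrative sketch only: closed-form feature extraction by
# eigendecomposition, with PCA as the simplest example.
import numpy as np

def extract_pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal axes.

    PCA is solved in closed form: the projection directions are the
    eigenvectors of the sample covariance matrix with the largest
    eigenvalues, so no iterative optimisation is required.
    """
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)             # eigh: cov is symmetric
    order = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
    return X_centered @ eigvecs[:, order]

# Example: reduce 10-dimensional data to 2 extracted features.
X = np.random.default_rng(0).normal(size=(100, 10))
Z = extract_pca(X, n_components=2)                     # shape (100, 2)
```

An iterative information-theoretic method would instead start from some initial projection and repeatedly update it by gradient ascent on an estimated mutual information, which is where the cost and unpredictability mentioned in the abstract arise.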