Title: Feature selection in statistical pattern recognition
Author: Choakjarernwanit, Naruetep
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 1992
This thesis addresses the problem of feature selection in pattern recognition. A detailed analysis and an experimental comparison of various search strategies for selecting a feature set of size d from D available measurements are presented. For a realistic problem, optimal search, even if performed using the branch and bound method, is computationally prohibitive. The alternative is to use suboptimal search methods. Four of these, namely sequential forward selection (SFS), sequential backward selection (SBS), sequential forward floating selection (SFFS), and sequential backward floating selection (SBFS), are relatively simple and require little computational time. It is suggested that the SFS method should be employed in the case of limited training sample size. Although the decision about including a particular measurement in the SFS method is made on the basis of statistical dependencies among features in spaces of monotonically increasing dimensionality, the approach has proved in practice to be more reliable than SBS. This is because the algorithm initially utilizes only less complex mutual relations which, on small sample sets, are determined more reliably than the statistics required by the SBS method. Because both the SFS and SBS methods suffer from the nesting effect, the SFFS and SBFS methods should be employed if a better solution is required. As the first of the two main issues of the thesis, the possibility of developing feature selection techniques which rely only on the merit of individual features and pairs of features is investigated. This issue is considered very important because the computational advantage of an algorithm exploiting at most pairwise interactions of measurements would be very useful for solving feature selection problems of very high dimensionality. For this reason, a potentially very promising search method known as the Max-Min method is investigated.
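The greedy step underlying SFS can be sketched as follows. This is a minimal illustration, not the thesis's implementation: `criterion` stands for a hypothetical user-supplied class-separability measure (higher is better), and the toy additive merit values are invented for demonstration.

```python
def sfs(D, criterion, d):
    """Sequential forward selection: greedily grow a feature subset.

    criterion(subset) is a hypothetical user-supplied separability
    measure (higher is better); D is the number of measurements.
    """
    selected, remaining = [], set(range(D))
    while len(selected) < d:
        # add the single candidate that most improves the criterion
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# toy criterion: each feature contributes an independent merit value
merit = [0.1, 0.9, 0.5, 0.3]
print(sfs(4, lambda s: sum(merit[f] for f in s), 2))  # -> [1, 2]
```

SBS runs the same loop in reverse, starting from all D features and discarding the least useful one at each step; neither variant can revisit a decision, which is precisely the nesting effect the floating methods SFFS and SBFS relax.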
By means of a detailed analysis of the heuristic reasoning behind the method, its weaknesses are identified. The first weakness is due to the use of an upper limit on the error bound as a measure of effectiveness of a candidate feature. This strategy does not guarantee that selecting the candidate feature with the highest upper bound will yield the highest actual amount of additional information. The second weakness is that the method does not distinguish between a strong unconditional dependence and a poor performance of a feature, both of which manifest themselves as near-zero additional discriminatory information. Modifications aimed at overcoming the latter, by favouring features which exhibit conditional dependence while suppressing features which exhibit strong unconditional dependence, have been proposed and tested, but with only limited success. For this reason the Max-Min method is subjected to a detailed theoretical analysis. It is found that the key assumption underlying the whole Max-Min algorithm is not justified and the algorithm itself is ill-founded, i.e. the actual increment of the criterion value (or decrease of the probability of error) can be greater than the minimum of pairwise error probability reductions assumed by the Max-Min method. A necessary condition for invalidity of the key assumption of the Max-Min algorithm is derived, and a counter-example proving the lack of justification for the algorithm is presented. The second main issue of the thesis is the development of a new feature selection method for non-normal class conditional densities. For a given dimensionality, the subset of selected features minimizes the Kullback-Leibler distance between the true and postulated class conditional densities. The algorithm is based on approximating the unknown class conditional densities by a finite mixture of densities of a special type using the maximum likelihood approach.
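The Max-Min selection rule discussed above can be sketched as follows, using only individual and pairwise criterion values. This is an illustrative sketch under assumed interfaces: `J1` and `J2` are hypothetical functions returning the criterion value of a single feature and of a feature pair, and the toy values below are invented. The key (and, as the thesis shows, unjustified) assumption is that the worst pairwise gain bounds the true gain of adding a candidate.

```python
def max_min_select(D, J1, J2, d):
    """Max-Min selection using only individual (J1) and pairwise (J2)
    criterion values; both are hypothetical user-supplied functions."""
    # start from the individually best feature
    selected = [max(range(D), key=J1)]
    remaining = set(range(D)) - set(selected)
    while len(selected) < d:
        # pairwise gain of candidate f over an already-selected feature g
        min_gain = lambda f: min(J2(f, g) - J1(g) for g in selected)
        # Max-Min rule: pick the candidate whose worst pairwise gain is largest
        best = max(remaining, key=min_gain)
        selected.append(best)
        remaining.remove(best)
    return selected

# toy criterion tables (invented values, symmetric pairwise criterion)
J1_vals = [0.5, 0.4, 0.3, 0.2]
J2_vals = {(0, 1): 0.6, (0, 2): 0.75, (0, 3): 0.55,
           (1, 2): 0.5, (1, 3): 0.45, (2, 3): 0.4}
J1 = lambda f: J1_vals[f]
J2 = lambda f, g: J2_vals[tuple(sorted((f, g)))]
print(max_min_select(4, J1, J2, 2))  # -> [0, 2]
```

Because each step evaluates only O(D) pairwise values rather than a criterion over the whole candidate subset, the method scales to very high dimensionality, which is what made it attractive despite the flaw the thesis exposes.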
After the optimization ends, the optimal feature subset of required dimensionality is obtained immediately without the necessity to employ any search procedure. Successful experiments with both simulated and real data are also carried out to validate the proposed method.
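The abstract gives no formulas for the proposed method; as a minimal illustration of the Kullback-Leibler distance it minimizes, the discrete form can be computed as below (the two-point densities are invented for demonstration):

```python
import math

def kl_distance(p, q):
    """Discrete Kullback-Leibler distance D(p || q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

true_density = [0.5, 0.5]
mixture_fit  = [0.9, 0.1]
print(kl_distance(true_density, true_density))  # 0.0: a perfect fit
print(kl_distance(true_density, mixture_fit))   # > 0: penalty for mismatch
```

The distance is zero exactly when the postulated density matches the true one, so driving it down via maximum-likelihood mixture fitting directly ranks feature subsets by how faithfully they preserve the class conditional densities.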
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available