Title: Combining rough and fuzzy sets for feature selection
Author: Jensen, Richard
Awarding Body: University of Edinburgh
Current Institution: University of Edinburgh
Date of Award: 2004
Feature selection (FS) refers to the problem of selecting those input attributes that are most predictive of a given outcome, a problem encountered in many areas such as machine learning, pattern recognition and signal processing. Unlike other dimensionality reduction methods, feature selectors preserve the original meaning of the features after reduction. This has found application in tasks that involve datasets containing huge numbers of features (on the order of tens of thousands), which would otherwise be impossible to process further. Recent examples include text processing and web content classification. FS techniques have also been applied to small and medium-sized datasets in order to locate the most informative features for later use. Many feature selection methods have been developed and are reviewed critically in this thesis, with particular emphasis on their current limitations. The leading methods in this field are presented in a consistent algorithmic framework. One of the many successful applications of rough set theory has been to this area. The rough set ideology of using only the supplied data and no other information has many benefits in FS, where most other methods require supplementary knowledge. However, the main limitation of rough set-based feature selection in the literature is the restrictive requirement that all data be discrete. In classical rough set theory, it is not possible to consider real-valued or noisy data. This thesis proposes and develops an approach based on fuzzy-rough sets, fuzzy-rough feature selection (FRFS), that addresses these problems and retains dataset semantics. Complexity analysis of the underlying algorithms is included. FRFS is applied to two domains where a feature-reducing step is important, namely web content classification and complex systems monitoring. The utility of this approach is demonstrated and compared empirically with several dimensionality reducers.
In the experimental studies, FRFS is shown to equal or improve classification accuracy when compared to the results from unreduced data. Classifiers that use a lower dimensional set of attributes which are retained by fuzzy-rough reduction outperform those that employ more attributes returned by the existing crisp rough reduction method. In addition, it is shown that FRFS is more powerful than the other FS techniques in the comparative study. Based on the new fuzzy-rough measure of feature significance, further development of the FRFS technique is presented in this thesis. This is developed from the new area of feature grouping that considers the selection of groups of attributes in the search for the best subset. A novel framework is also given for the application of ant-based search mechanisms within feature selection in general, with particular emphasis on its employment in FRFS. Both of these developments are employed and evaluated within the complex systems monitoring application.
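The crisp rough set approach that the abstract contrasts with FRFS can be illustrated with a minimal sketch. It assumes discrete data held as a list of dictionaries, computes the standard rough set dependency degree (the fraction of objects whose equivalence class under the chosen features determines the decision value), and uses a simple greedy forward search. It is not the FRFS algorithm of the thesis; the function names `dependency` and `greedy_reduct` and the toy dataset `rows` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def dependency(rows, features, decision):
    """Crisp rough set dependency degree of `decision` on `features`.

    Objects are grouped into equivalence classes by their values on
    `features`; a class counts towards the positive region only if all
    of its objects share one decision value.
    """
    if not rows:
        return 0.0
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in features)
        classes[key].append(row[decision])
    consistent = sum(len(v) for v in classes.values() if len(set(v)) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, candidates, decision):
    """Greedy forward selection (illustrative assumption, not the thesis method).

    Repeatedly adds the feature giving the highest dependency until the
    subset matches the dependency of the full candidate set.
    """
    target = dependency(rows, candidates, decision)
    selected = []
    while dependency(rows, selected, decision) < target:
        best = max(
            (f for f in candidates if f not in selected),
            key=lambda f: dependency(rows, selected + [f], decision),
        )
        selected.append(best)
    return selected


# Hypothetical toy dataset: the decision "d" is determined by "b" alone.
rows = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]

print(dependency(rows, ["a"], "d"))             # 0.0
print(dependency(rows, ["b"], "d"))             # 1.0
print(greedy_reduct(rows, ["a", "b", "c"], "d"))  # ['b']
```

Because equivalence classes are formed by exact value matching, this sketch only works on discrete data, which is the limitation the abstract identifies; a real-valued attribute would place nearly every object in its own class.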
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available