Title: Feature selection and classification of non-traditional data : examples from veterinary medicine
Author: Hoare, Zoe Susannah Jane
ISNI: 0000 0001 3579 4347
Awarding Body: University of Wales, Bangor
Current Institution: Bangor University
Date of Award: 2007
Early diagnosis of notifiable diseases in the veterinary domain is important for agriculture, the health sector and the economy. With no diagnostic test in the live animal for either BSE or scrapie, many cases may be misdiagnosed. Traditionally, data for pattern recognition are stored as recorded cases of interest, either labelled with their outcome (suitable for supervised classification) or unlabelled. Each case is described by a collection of symptoms, each recorded as present or absent. These are called "binary features". In the case of medical data, the number of cases recorded in this way may be limited for many reasons. To overcome this lack of data, expert-estimated probability tables have been proposed as a substitute. These "non-traditional" tables contain the estimated percentage frequencies of clinical symptoms in various diseases. The construction of the tables assumed that the clinical signs (features) were independent given the diseases (classes). In this study, various feature selection techniques were applied to the "non-traditional" data and compared, in order to select a reduced subset of features (symptoms). The potential, limitations and stability of Sequential Forward Selection (SFS) in particular were investigated. Decision trees and Naive Bayes classifier models were applied to the diagnosis task. The apparent success and stability of Naive Bayes in the medical domain led to an in-depth investigation of the effects of this type of data, and its inherent assumptions, on the model. Naive Bayes is known to be optimal in the case of independent features, which is the condition assumed by the estimated probability tables in the "non-traditional" data. Various proposed adaptations of the Naive Bayes model were investigated with regard to their optimality when the independence assumption is violated. Finally, the performance of Naive Bayes on traditionally stored medical data with binary features was assessed.
Naive Bayes and its adaptations performed well on the traditional data. Since the effect of assuming independence when it does not hold is minimal, using the "non-traditional" data with the Naive Bayes classifier can be a practical solution for veterinary diagnosis.
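As an illustration of the idea described in the abstract, the following is a minimal Python sketch of how a Naive Bayes classifier might be driven directly by an expert-estimated probability table rather than by recorded cases. It is not the thesis's implementation. The disease names, clinical signs, percentages, uniform priors, the clipping constant `eps` and the function names `posterior` and `diagnose` are all assumptions invented for this example.

```python
import math

# Hypothetical expert-estimated table: percentage of cases of each disease
# (class) that show each clinical sign (binary feature). All values invented.
TABLE = {
    "disease_A": {"ataxia": 80, "pruritus": 10, "weight_loss": 60},
    "disease_B": {"ataxia": 40, "pruritus": 85, "weight_loss": 70},
}


def posterior(observed, table=TABLE, priors=None, eps=0.01):
    """Return P(disease | observed signs), assuming the signs are
    independent given the disease (the Naive Bayes assumption).

    observed maps a sign name to True (present) or False (absent).
    Signs not listed in observed are simply ignored.
    """
    diseases = list(table)
    if priors is None:
        # Assumption: equal prior probability for every disease.
        priors = {d: 1.0 / len(diseases) for d in diseases}

    log_scores = {}
    for d in diseases:
        score = math.log(priors[d])
        for sign, present in observed.items():
            # Convert the percentage to a probability and clip it away
            # from 0 and 1 so that the logarithm is always defined.
            p = min(max(table[d][sign] / 100.0, eps), 1.0 - eps)
            score += math.log(p if present else 1.0 - p)
        log_scores[d] = score

    # Normalise in a numerically stable way.
    top = max(log_scores.values())
    unnorm = {d: math.exp(s - top) for d, s in log_scores.items()}
    total = sum(unnorm.values())
    return {d: v / total for d, v in unnorm.items()}


def diagnose(observed, **kwargs):
    """Return the disease with the highest posterior probability."""
    post = posterior(observed, **kwargs)
    return max(post, key=post.get)
```

With this invented table, calling `diagnose({"ataxia": True, "pruritus": False, "weight_loss": True})` favours `disease_A`, since the product of its per-sign probabilities (0.8 × 0.9 × 0.6) exceeds that of `disease_B` (0.4 × 0.15 × 0.7). Working in log space avoids underflow when many signs are combined.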
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available