Title: Intelligent analysis of small data sets for food design
Author: Corney, David Peter Alfred
ISNI: 0000 0001 3562 8517
Awarding Body: University of London
Current Institution: University College London (University of London)
Date of Award: 2002
Availability of Full Text: Full text unavailable from EThOS.
This thesis compares the performance of machine learning and statistical techniques in the analysis of food design data. The goal of the analysis is to understand what makes people like (or dislike) a product, by building models relating sensory features (such as flavour or texture) to consumer preferences. One difficulty in analysing these data sets is that they are extremely small, owing to the taste fatigue of consumer preference panels. Feature selection is essential because food sensory data sets typically have many features and few records. Several feature selection algorithms are compared, and the results highlight the need to limit the number of features used. We therefore apply model order selection to feature selection. A semi-supervised feature selection method is introduced and compared with more traditional methods. After the selection of a suitable set of features, the relationship between those features and consumers' preferences must be modelled. Two regression techniques are compared, focussing on their relative performance on very small data sets. A semi-supervised ensemble learning algorithm is introduced and analysed. Consumers have individual preferences, so rather than producing a single generic product, food designers must first discover homogeneous groups of consumers, and then target each group with a different product. Several clustering techniques are compared, and consideration of their inherent biases reveals further information regarding the structure of the data. A combination of regression and clustering is proposed, which allows evaluation of clustering results using the predictive power of the resultant models. Preference data sets contain a significant number of misleading outliers owing to the way they are collected. An algorithm that combines clustering and outlier detection is introduced, which aims to produce an outlier-free cluster model and also provides heuristic estimates of the number of outliers present.
Overall, machine learning techniques show performance similar to traditional statistical techniques, with small improvements in accuracy in some cases. Machine learning brings the benefit of typically being dependent on fewer assumptions: where these assumptions are invalid, results may be improved. Furthermore, machine learning makes use of considerable computational power, which is now cheaply available, in the search for improved solutions. In this thesis, we examine the efficacy of machine learning techniques when analysing food design data sets. In summary, the main contributions of this thesis are: a semi-supervised feature selection algorithm; a semi-supervised ensemble for regression; a clustering evaluation technique; and an outlier detection technique for clustering.
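The abstract does not give the details of the clustering evaluation technique, so the following is only a minimal sketch of the general idea (scoring a clustering of consumers by the predictive power of regression models built per cluster), not the thesis's algorithm. The function names, the single sensory feature, the toy ratings and the use of leave-one-product-out squared error are all assumptions made for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def loo_error(feature, ratings, clusters):
    """Mean squared leave-one-product-out error of per-cluster models.

    feature:  one sensory value per product (hypothetical, e.g. sweetness)
    ratings:  ratings[c][p] is consumer c's liking score for product p
    clusters: clusters[c] is the cluster label assigned to consumer c
    """
    sq_errors = []
    for label in set(clusters):
        members = [r for r, k in zip(ratings, clusters) if k == label]
        for held_out in range(len(feature)):
            train = [p for p in range(len(feature)) if p != held_out]
            # Pool the ratings of every cluster member on the training products.
            xs = [feature[p] for p in train for _ in members]
            ys = [r[p] for p in train for r in members]
            a, b = fit_line(xs, ys)
            pred = a + b * feature[held_out]
            sq_errors += [(r[held_out] - pred) ** 2 for r in members]
    return mean(sq_errors)

# Toy data (invented): four products, four consumers with opposing tastes.
feature = [1, 2, 3, 4]
ratings = [
    [2, 4, 6, 8],  # likes higher feature values
    [1, 3, 5, 7],
    [8, 6, 4, 2],  # dislikes higher feature values
    [7, 5, 3, 1],
]
two_groups = [0, 0, 1, 1]
one_group = [0, 0, 0, 0]

err_two = loo_error(feature, ratings, two_groups)
err_one = loo_error(feature, ratings, one_group)
print(err_two < err_one)  # the two-group clustering predicts held-out ratings better
```

In this sketch a clustering is preferred when its per-cluster models predict held-out ratings more accurately; with data sets as small as those described in the abstract, such an estimate would itself be noisy and should be read with caution.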
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available