Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.675908
Title: New approaches to modern statistical classification problems
Author: Cannings, Timothy Ivor
ISNI: 0000 0004 5372 132X
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2015
Availability of Full Text:
Access through EThOS:
Full text unavailable from EThOS.
Abstract:
This thesis concerns the development and mathematical analysis of statistical procedures for classification problems. In supervised classification, the practitioner is presented with the task of assigning an object to one of two or more classes, based on a number of labelled observations from each class. With modern technological advances, vast amounts of data can be collected routinely, which creates both new challenges and opportunities for statisticians. After introducing the topic and reviewing the existing literature in Chapter 1, we investigate two of the main issues to arise in recent times. In Chapter 2 we introduce a very general method for high-dimensional classification, based on careful combination of the results of applying an arbitrary base classifier on random projections of the feature vectors into a lower-dimensional space. In one special case that we study in detail, the random projections are divided into non-overlapping blocks, and within each block we select the projection yielding the smallest estimate of the test error. Our random projection ensemble classifier then aggregates the results after applying the chosen projections, with a data-driven voting threshold to determine the final assignment. We derive bounds on the test error of a generic version of the ensemble as the number of projections increases. Moreover, under a low-dimensional boundary assumption, we show that the test error can be controlled by terms that do not depend on the original data dimension. The classifier is compared empirically with several other popular classifiers via an extensive simulation study, which reveals its excellent finite-sample performance. Chapter 3 focuses on the k-nearest neighbour classifier. 
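As a rough illustration of the Chapter 2 construction described above, the following Python sketch draws random projections in non-overlapping blocks, keeps the projection with the smallest estimated test error in each block, and aggregates the resulting votes using a threshold chosen from the training data. It is a minimal sketch under stated assumptions, not the thesis's implementation: the nearest-centroid base classifier, the hold-out error estimate, the QR-based projections, the grid search for the voting threshold, and all function and parameter names (`fit_rp_ensemble`, `predict_rp_ensemble`, `n_blocks`, `block_size`, `d`) are hypothetical choices made here for illustration.

```python
import numpy as np


def nearest_centroid_fit(Z, y):
    """Base classifier (an assumption of this sketch): store the two class means."""
    return Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)


def nearest_centroid_predict(model, Z):
    """Assign each row of Z to the class with the closer mean (labels 0/1)."""
    m0, m1 = model
    d0 = ((Z - m0) ** 2).sum(axis=1)
    d1 = ((Z - m1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)


def random_projection(p, d, rng):
    """Draw a p x d matrix with orthonormal columns (QR of a Gaussian matrix)."""
    Q, _ = np.linalg.qr(rng.standard_normal((p, d)))
    return Q


def vote_fraction(chosen, X):
    """Fraction of selected projections whose base classifier votes for class 1."""
    preds = [nearest_centroid_predict(model, X @ A) for A, model in chosen]
    return np.mean(preds, axis=0)


def fit_rp_ensemble(X, y, d=2, n_blocks=50, block_size=20, seed=0):
    """Select one projection per block, then pick a voting threshold from the data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Hold-out split, used only to estimate the test error of each projection
    # (an assumption; other error estimates could be substituted).
    perm = rng.permutation(n)
    fit_idx, val_idx = perm[: n // 2], perm[n // 2:]
    chosen = []
    for _ in range(n_blocks):
        best_err, best_A = np.inf, None
        for _ in range(block_size):
            A = random_projection(p, d, rng)
            model = nearest_centroid_fit(X[fit_idx] @ A, y[fit_idx])
            pred = nearest_centroid_predict(model, X[val_idx] @ A)
            err = np.mean(pred != y[val_idx])
            if err < best_err:
                best_err, best_A = err, A
        # Refit the base classifier on all training data for the chosen projection.
        chosen.append((best_A, nearest_centroid_fit(X @ best_A, y)))
    # Voting threshold: grid value minimising the training error of the
    # thresholded vote fraction (an illustrative choice).
    votes = vote_fraction(chosen, X)
    grid = np.linspace(0.0, 1.0, 101)
    errs = [np.mean((votes >= a).astype(int) != y) for a in grid]
    alpha = grid[int(np.argmin(errs))]
    return chosen, alpha


def predict_rp_ensemble(chosen, alpha, X):
    """Assign class 1 when the vote fraction reaches the threshold alpha."""
    return (vote_fraction(chosen, X) >= alpha).astype(int)
```

Calling `fit_rp_ensemble(X, y)` on an `n x p` feature matrix with 0/1 labels returns the selected projections and the threshold, which `predict_rp_ensemble` then applies to new feature vectors. Any base classifier that works on the `d`-dimensional projected data could replace the nearest-centroid rule used here.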
We first derive a new global asymptotic expansion for its excess risk, which elucidates conditions under which the dominant contribution to the risk comes from the locus of points at which each class label is equally likely to occur, as well as situations where the dominant contribution comes from the tails of the marginal distribution of the features. The results motivate an improvement to the k-nearest neighbour classifier in semi-supervised settings. Our proposal allows k to depend on an estimate of the marginal density of the features based on the unlabelled training data, using fewer neighbours when the estimated density at the test point is small. We show that the proposed semi-supervised classifier achieves a better balance in terms of the asymptotic local bias-variance trade-off. We also demonstrate the improvement in finite-sample performance of the tail-adaptive classifier over the standard classifier via a simulation study.
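The semi-supervised idea from Chapter 3 can be sketched in Python as follows: estimate the marginal feature density from the unlabelled data, then use fewer neighbours at test points where that estimate is small. This is only an illustrative sketch. The Gaussian kernel density estimate, the power-law rule linking k to the density, the exponent `gamma`, the median reference density, and all names (`adaptive_knn_predict`, `local_k`, `k_base`, `bandwidth`) are assumptions introduced here and are not taken from the thesis.

```python
import numpy as np


def gaussian_kde(points, query, bandwidth):
    """Gaussian kernel density estimate, built from `points`, at each query row."""
    d = points.shape[1]
    sq = ((query[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    norm = (2 * np.pi * bandwidth ** 2) ** (d / 2)
    return np.exp(-sq / (2 * bandwidth ** 2)).mean(axis=1) / norm


def local_k(density, k_base, ref_density, gamma, n):
    """Hypothetical rule: k shrinks as the estimated density falls below a reference."""
    k = np.floor(k_base * (density / ref_density) ** gamma).astype(int)
    return np.clip(k, 1, n)


def adaptive_knn_predict(X_lab, y_lab, X_unlab, X_test,
                         k_base=15, bandwidth=0.5, gamma=0.5):
    """k-nearest neighbour vote with a test-point-specific number of neighbours."""
    n = len(X_lab)
    # Reference level: median estimated density over the unlabelled sample.
    ref = np.median(gaussian_kde(X_unlab, X_unlab, bandwidth))
    dens = gaussian_kde(X_unlab, X_test, bandwidth)
    ks = local_k(dens, k_base, ref, gamma, n)
    # Squared Euclidean distances from each test point to each labelled point.
    dist = ((X_test[:, None, :] - X_lab[None, :, :]) ** 2).sum(axis=2)
    order = np.argsort(dist, axis=1)
    preds = np.empty(len(X_test), dtype=int)
    for i, k in enumerate(ks):
        # Majority vote among the k nearest labelled points (labels are 0/1).
        preds[i] = int(y_lab[order[i, :k]].mean() >= 0.5)
    return preds, ks
```

The function returns both the predicted labels and the per-point neighbour counts `ks`, so one can check that points in the tails of the feature distribution receive smaller values of k than points in high-density regions.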
Supervisor: Not available
Sponsor: Engineering and Physical Sciences Research Council
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.675908
DOI: Not available