New approaches to modern statistical classification problems

Cannings, Timothy Ivor


dc.contributor	Samworth, Richard John
dc.creator	Cannings, Timothy Ivor
dc.date.accessioned	2018-11-24T23:26:28Z
dc.date.available	2015-12-04T11:47:13Z
dc.date.available	2018-11-24T23:26:28Z
dc.date.issued	2015-11-10
dc.identifier	https://www.repository.cam.ac.uk/handle/1810/252845
dc.identifier.uri	http://repository.aust.edu.ng/xmlui/handle/123456789/3835
dc.description.abstract	This thesis concerns the development and mathematical analysis of statistical procedures for classification problems. In supervised classification, the practitioner is presented with the task of assigning an object to one of two or more classes, based on a number of labelled observations from each class. With modern technological advances, vast amounts of data can be collected routinely, which creates both new challenges and opportunities for statisticians. After introducing the topic and reviewing the existing literature in Chapter 1, we investigate two of the main issues to arise in recent times. In Chapter 2 we introduce a very general method for high-dimensional classification, based on a careful combination of the results of applying an arbitrary base classifier to random projections of the feature vectors into a lower-dimensional space. In one special case that we study in detail, the random projections are divided into non-overlapping blocks, and within each block we select the projection yielding the smallest estimate of the test error. Our random projection ensemble classifier then aggregates the results after applying the chosen projections, with a data-driven voting threshold to determine the final assignment. We derive bounds on the test error of a generic version of the ensemble as the number of projections increases. Moreover, under a low-dimensional boundary assumption, we show that the test error can be controlled by terms that do not depend on the original data dimension. The classifier is compared empirically with several other popular classifiers via an extensive simulation study, which reveals its excellent finite-sample performance. Chapter 3 focuses on the k-nearest neighbour classifier. We first derive a new global asymptotic expansion for its excess risk, which elucidates conditions under which the dominant contribution to the risk comes from the locus of points at which each class label is equally likely to occur, as well as situations where the dominant contribution comes from the tails of the marginal distribution of the features. The results motivate an improvement to the k-nearest neighbour classifier in semi-supervised settings. Our proposal allows k to depend on an estimate of the marginal density of the features based on the unlabelled training data, using fewer neighbours when the estimated density at the test point is small. We show that the proposed semi-supervised classifier achieves a better balance in terms of the asymptotic local bias-variance trade-off. We also use a simulation study to demonstrate the improved finite-sample performance of this tail-adaptive classifier over the standard k-nearest neighbour classifier.
dc.language	en
dc.publisher	University of Cambridge
dc.publisher	Department of Pure Mathematics and Mathematical Statistics
dc.rights	http://creativecommons.org/licenses/by/2.0/uk/
dc.rights	Attribution 2.0 UK: England & Wales
dc.title	New approaches to modern statistical classification problems
dc.type	Thesis
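
The random projection ensemble summarised in the abstract (blocks of random projections, the best projection per block chosen by an estimated test error, then a thresholded vote) can be illustrated with the minimal sketch below. It is not the thesis's implementation; every specific here is an assumption chosen for illustration: Gaussian projection matrices, a k-nearest neighbour base classifier (the abstract allows an arbitrary one), leave-one-out error as the test error estimate, binary labels in {0, 1}, the hypothetical parameter names `p`, `B1`, `B2`, `k`, and a threshold rule that simply minimises the ensemble's leave-one-out training error.

```python
import math
import random


def random_projection(d, p, rng):
    """Assumed projection distribution: a p x d matrix with i.i.d. N(0, 1/p) entries."""
    scale = 1.0 / math.sqrt(p)
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(p)]


def project(A, x):
    """Map a d-dimensional point x to the p-dimensional point A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def knn_predict(Z, y, z, k, skip=None):
    """Stand-in base classifier: k-NN majority vote for labels in {0, 1}.
    Passing skip=i leaves training point i out (leave-one-out prediction)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(zi, z)), i)
        for i, zi in enumerate(Z)
        if i != skip
    )[:k]
    ones = sum(y[i] for _, i in nearest)
    return 1 if 2 * ones > k else 0


def loo_error(Z, y, k):
    """Leave-one-out estimate of the base classifier's test error on projected data."""
    wrong = sum(knn_predict(Z, y, Z[i], k, skip=i) != y[i] for i in range(len(Z)))
    return wrong / len(Z)


def fit_rp_ensemble(X, y, p=2, B1=5, B2=20, k=3, seed=0):
    """Hypothetical fit: B1 blocks of B2 candidate projections; keep the best per block."""
    rng = random.Random(seed)
    d = len(X[0])
    chosen = []  # one (projection, projected training data) pair per block
    for _ in range(B1):
        best = None
        for _ in range(B2):
            A = random_projection(d, p, rng)
            Z = [project(A, x) for x in X]
            err = loo_error(Z, y, k)
            if best is None or err < best[0]:
                best = (err, A, Z)
        chosen.append((best[1], best[2]))

    # Leave-one-out proportion of the chosen projections voting for class 1
    # at each training point.
    nu = [
        sum(knn_predict(Z, y, Z[i], k, skip=i) for _, Z in chosen) / B1
        for i in range(len(X))
    ]
    # Assumed data-driven threshold: the alpha in {0, 1/B1, ..., 1} that minimises
    # training errors of the rule "predict 1 iff nu >= alpha".
    grid = [j / B1 for j in range(B1 + 1)]
    alpha = min(grid, key=lambda a: sum((v >= a) != (lab == 1) for v, lab in zip(nu, y)))
    return {"chosen": chosen, "y": y, "k": k, "alpha": alpha}


def predict(model, x):
    """Assign x to class 1 iff the vote proportion reaches the threshold alpha."""
    chosen, y, k = model["chosen"], model["y"], model["k"]
    nu = sum(knn_predict(Z, y, project(A, x), k) for A, Z in chosen) / len(chosen)
    return 1 if nu >= model["alpha"] else 0


def make_data(n, d, rng):
    """Toy data: the class signal lives only in the first coordinate."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x[0] += 3.0 if label == 1 else -3.0
        X.append(x)
        y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(1)
    X_train, y_train = make_data(60, 5, rng)
    X_test, y_test = make_data(100, 5, rng)
    model = fit_rp_ensemble(X_train, y_train)
    acc = sum(predict(model, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
    print(f"alpha = {model['alpha']:.1f}, test accuracy = {acc:.2f}")
```

The toy data mimic the low-dimensional boundary idea: only one direction separates the classes, so projections that happen to capture it achieve low leave-one-out error and tend to be selected within their block. Swapping `knn_predict` for another base classifier only requires keeping its `(Z, y, z, ..., skip)` behaviour.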

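The semi-supervised proposal in the abstract (letting k depend on a density estimate built from unlabelled data, with fewer neighbours where the estimated density is small) can be sketched as follows. This is an illustrative sketch only: the m-nearest-neighbour density estimator, the hypothetical rule k(x) = ceil(scale * (n * f_hat(x))^(4/(d+4))), which borrows the n^(4/(d+4)) order commonly used for a global choice of k, and the parameters `scale` and `m` are all assumptions rather than the thesis's exact construction.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_density(x, unlabelled, m=10):
    """m-nearest-neighbour estimate of the marginal feature density at x:
    f_hat(x) = m / (N * V_d * r_m(x)^d), where V_d is the unit d-ball volume.
    Requires at least m unlabelled points."""
    d = len(x)
    r = sorted(dist(x, u) for u in unlabelled)[m - 1]
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return m / (len(unlabelled) * unit_ball * max(r, 1e-12) ** d)


def local_k(x, n, unlabelled, scale=1.0, m=10):
    """Hypothetical local rule k(x) = ceil(scale * (n * f_hat(x))^(4/(d+4))),
    clipped to [1, n]; low estimated density (the tails) gives few neighbours."""
    d = len(x)
    f_hat = knn_density(x, unlabelled, m)
    k = math.ceil(scale * (n * f_hat) ** (4 / (d + 4)))
    return min(max(k, 1), n)


def tail_adaptive_knn(x, X, y, unlabelled, scale=1.0, m=10):
    """Classify x (labels in {0, 1}) by a k-NN vote with a locally chosen k.
    Returns (label, k); ties go to class 0."""
    k = local_k(x, len(X), unlabelled, scale, m)
    nearest = sorted(range(len(X)), key=lambda i: dist(X[i], x))[:k]
    ones = sum(y[i] for i in nearest)
    return (1 if 2 * ones > k else 0), k


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 1-D problem: features ~ N(0, 1), label 1 iff the feature is positive.
    X = [[rng.gauss(0.0, 1.0)] for _ in range(100)]
    y = [1 if x[0] > 0 else 0 for x in X]
    unlabelled = [[rng.gauss(0.0, 1.0)] for _ in range(500)]
    for point in ([0.0], [2.5], [3.5]):
        label, k = tail_adaptive_knn(point, X, y, unlabelled)
        print(point, "-> class", label, "using k =", k)
```

In this toy run the test point at the centre of the feature distribution should receive many more neighbours than the one far in the tail, which is the qualitative behaviour the abstract describes; the specific values of k depend entirely on the assumed rule and constants.
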
Files in this item

File	Size	Format
Thesis_Cannings_Final.pdf	1.056 MB	application/pdf

This item appears in the following Collection(s)

Department of Pure Mathematics and Mathematical Statistics (DPMMS)
