Title: New algorithms and methodology for analysing distances
Author: Kettleborough, George
ISNI: 0000 0004 5346 9646
Awarding Body: University of East Anglia
Current Institution: University of East Anglia
Date of Award: 2014
Distances arise in a wide variety of different contexts, one of which is partitional clustering, that is, the problem of finding groups of similar objects within a set of objects. These groups seem very easy for humans to find, but are very difficult for machines to find, because two major difficulties must be overcome: the first is defining an objective criterion for the vague notion of “groups of similar objects”, and the second is the computational complexity of finding such groups given a criterion. In the first part of this thesis, we focus on the first difficulty and show that even seemingly similar optimisation criteria used for partitional clustering can produce vastly different results. In the process of showing this, we develop a new metric for comparing clustering solutions, called the assignment metric. We then prove some new NP-completeness results for problems using two related “sum-of-squares” clustering criteria.

Closely related to partitional clustering is the problem of hierarchical clustering. We extend and formalise this problem as the problem of constructing rooted edge-weighted X-trees, that is, trees with leaf set X. It is well known that an X-tree can be uniquely reconstructed from a distance on X if the distance is an ultrametric. In practice, however, the complete distance on X may not always be available. In the second part of this thesis, we look at some of the circumstances under which a tree can be uniquely reconstructed from incomplete distance information. We use a concept called a lasso and give some theoretical properties of a special type of lasso. We then develop an algorithm that can construct a tree together with a lasso from partial distance information, and show how it can be applied to various incomplete datasets.
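The abstract does not say which two sum-of-squares criteria are compared. Purely as an illustrative assumption, the Python sketch below takes two distance-based variants: the plain within-cluster sum of squared pairwise distances, and the same sum divided by the cluster size (for points on a line or in Euclidean space, the latter equals the familiar sum of squared distances to each cluster's centroid). The helper names `pairwise_ss`, `criterion_plain`, `criterion_normalised` and `best_partition` are hypothetical, and the exhaustive search is only feasible for a handful of points.

```python
from itertools import product


def pairwise_ss(cluster, d):
    """Sum of d(x, y)**2 over unordered pairs {x, y} inside one cluster."""
    return sum(
        d(cluster[i], cluster[j]) ** 2
        for i in range(len(cluster))
        for j in range(i + 1, len(cluster))
    )


def criterion_plain(clusters, d):
    # Hypothetical criterion 1: total within-cluster pairwise sum of squares.
    return sum(pairwise_ss(c, d) for c in clusters)


def criterion_normalised(clusters, d):
    # Hypothetical criterion 2: each cluster's pairwise sum divided by its size.
    # For points on a line (or in Euclidean space) this equals the usual sum
    # of squared distances to the cluster centroid.
    return sum(pairwise_ss(c, d) / len(c) for c in clusters)


def best_partition(points, k, criterion, d):
    """Exhaustively try every assignment of points to k non-empty clusters."""
    if not 1 <= k <= len(points):
        raise ValueError("need 1 <= k <= number of points")
    best, best_cost = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        clusters = [[p for p, lab in zip(points, labels) if lab == c] for c in range(k)]
        if any(not c for c in clusters):
            continue  # skip assignments that leave a cluster empty
        cost = criterion(clusters, d)
        if cost < best_cost:
            best, best_cost = clusters, cost
    # Sort for a canonical, label-independent description of the partition.
    return sorted(sorted(c) for c in best), best_cost


if __name__ == "__main__":
    points = [0, 0, 0, 1, 1, 1, 2.5]  # toy multiset on the real line
    line = lambda x, y: abs(x - y)
    print(best_partition(points, 2, criterion_plain, line))
    # ([[0, 0, 0], [1, 1, 1, 2.5]], 6.75)
    print(best_partition(points, 2, criterion_normalised, line))
    # ([[0, 0, 0, 1, 1, 1], [2.5]], 1.5)
```

On this toy data the two criteria disagree: the normalised one isolates the point `2.5`, while the plain one splits the six nearby points instead. Because the plain pairwise sum equals cluster size times the centroid sum of squares, it penalises large clusters more heavily, which is one simple way two similar-looking criteria can favour quite different partitions.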
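The abstract names the assignment metric without defining it. As an assumption, the sketch below implements a closely related, well-known assignment-based distance between partitions, the transfer distance. Clusters of one partition are matched one-to-one with clusters of the other (the smaller partition is padded with empty clusters) so as to maximise the number of objects kept together, and the distance is the number of objects left uncovered, i.e. the minimum number of objects that must be moved to turn one clustering into the other. It may differ in detail from the metric developed in the thesis. The function name `transfer_distance` is hypothetical, and the brute-force search over permutations is only practical for a few clusters; a Hungarian-algorithm solver would replace it for larger inputs.

```python
from itertools import permutations


def transfer_distance(labels_a, labels_b):
    """Minimum number of objects to move so that partition A becomes partition B.

    labels_a[i] and labels_b[i] are the cluster labels of object i; the label
    values themselves are arbitrary, only the induced partitions matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    # Distinct labels in order of first appearance (labels need not be sortable).
    clusters_a = list(dict.fromkeys(labels_a))
    clusters_b = list(dict.fromkeys(labels_b))
    index_a = {c: i for i, c in enumerate(clusters_a)}
    index_b = {c: j for j, c in enumerate(clusters_b)}
    # Square overlap matrix; padding rows/columns act as empty clusters.
    size = max(len(clusters_a), len(clusters_b))
    overlap = [[0] * size for _ in range(size)]
    for x, y in zip(labels_a, labels_b):
        overlap[index_a[x]][index_b[y]] += 1
    # Best one-to-one matching of clusters (brute force over permutations).
    kept = max(
        sum(overlap[i][perm[i]] for i in range(size))
        for perm in permutations(range(size))
    )
    return len(labels_a) - kept


if __name__ == "__main__":
    print(transfer_distance([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0]))  # 0
    print(transfer_distance([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]))  # 1
```

Relabelling clusters does not change the distance, and the two partitions need not have the same number of clusters.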
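The reconstruction claim for ultrametrics can be made concrete. A distance d on X is an ultrametric when d(x, z) <= max(d(x, y), d(y, z)) for every triple x, y, z; equivalently, the two largest of the three pairwise distances are equal. The Python sketch below checks this condition and then rebuilds the rooted tree top-down. At a set S of leaves whose largest distance is δ, the relation d(x, y) < δ is an equivalence relation (by the ultrametric inequality), so its classes become the children of a vertex at height δ/2. Leaves sit at height 0 and edge weights are height differences, so the path between two leaves has length d(x, y). This is a standard construction offered as a sketch, not the thesis's algorithm; the function names and the nested `(height, children)` output format are hypothetical, and exact (e.g. integer) distances are assumed because no floating-point tolerance is used.

```python
from itertools import combinations


def is_ultrametric(X, d):
    """Three-point condition: in every triple the two largest distances agree."""
    for x, y, z in combinations(X, 3):
        _, mid, top = sorted((d(x, y), d(x, z), d(y, z)))
        if mid != top:
            return False
    return True


def ultrametric_tree(X, d):
    """Rooted tree as nested (height, children) tuples; leaves are plain labels.

    Assumes d is an ultrametric on X; the edge from a vertex to a child has
    weight equal to the difference of their heights.
    """
    X = list(X)
    if len(X) == 1:
        return X[0]
    delta = max(d(x, y) for x, y in combinations(X, 2))
    # "Closer than delta" is an equivalence relation for an ultrametric,
    # so comparing against one representative per class is enough.
    classes = []
    for x in X:
        for cls in classes:
            if d(x, cls[0]) < delta:
                cls.append(x)
                break
        else:
            classes.append([x])
    return (delta / 2, [ultrametric_tree(c, d) for c in classes])


if __name__ == "__main__":
    D = {frozenset(("a", "b")): 2, frozenset(("c", "d")): 4,
         frozenset(("a", "c")): 6, frozenset(("a", "d")): 6,
         frozenset(("b", "c")): 6, frozenset(("b", "d")): 6}
    dist = lambda x, y: 0 if x == y else D[frozenset((x, y))]
    print(is_ultrametric("abcd", dist))    # True
    print(ultrametric_tree("abcd", dist))  # (3.0, [(1.0, ['a', 'b']), (2.0, ['c', 'd'])])
```

Vertices with more than two children arise naturally when several classes are separated by the same distance, so the sketch does not force a binary tree.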
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available