Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.701521
Title: Some problems in the theory and application of the methods of numerical taxonomy
Author: Wishart, David
Awarding Body: University of St Andrews
Current Institution: University of St Andrews
Date of Award: 1970
Abstract:
Several of the methods of numerical taxonomy are compared and shown to be variants of a tripartite grouping procedure associated with a generalised intercluster similarity function involving ten computational parameters. Clustering by the techniques of hierarchic fusion, monothetic division and iterative relocation is obtained using different arithmetic combinations of the function parameters to both compute similarities and effect changes in cluster membership. The combinatorial solution for Ward's method is found, and the centroid sorting combinatorial solution is extended for size difference, shape difference, dispersion and dot product coefficients. It is suggested that clusters are characterised more by the choice of similarity criterion than by the choice of method, and it is demonstrated that some common criteria such as distance and the error sum of squares are inclined to force spherical 'minimum-variance' classes. These are contrasted with 'natural' classes, which correspond to closed density surfaces defined for a multivariate sample space by the underlying probability density function. A method for mode-seeking is developed from this probabilistic model through various theoretical and experimental phases, and it is shown to perform slightly better than iterative relocation with the minimum-variance criteria using several Gaussian test populations. A fast algorithm is proposed for the solution of the Jardine-Sibson method for generating overlapping classes, and it is observed that this technique finds natural classes and is closely related to the probabilistic model. Some aspects of computational procedures are discussed, and in particular, it is proposed that a generalised system involving a statistical language, conversational mode package and program suite could be developed from a basic subroutine system. Paging and simulation techniques for the organisation of direct-access data files are suggested, and a comprehensive package of computer programs for cluster analysis is described.
Supervisor: Cole, Alfred John
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.701521
DOI: Not available
Keywords: QA278.W5 ; Numerical taxonomy