Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.500005
Title: Extending K-Means clustering for analysis of quantitative structure activity relationships (QSAR)
Author: Stanforth, Robert William
Awarding Body: Birkbeck (University of London)
Current Institution: Birkbeck (University of London)
Date of Award: 2008
Abstract:
A Quantitative Structure-Activity Relationship (QSAR) study is an attempt to model some biological activity over a collection of chemical compounds in terms of their structural properties. A QSAR model may be constructed through (typically linear) multivariate regression analysis of the biological activity data against a number of features or 'descriptors' of chemical structure. As with any regression model, there are a number of issues emerging in real applications, including (a) domain of applicability of the model, (b) validation of the model within its domain of applicability, and (c) possible non-linearity of the QSAR. Unfortunately, the existing methods commonly used in QSAR for overcoming these issues all suffer from problems such as computational inefficiency and poor treatment of non-linearity. In practice, this often results in the omission of proper analysis of them altogether. In this thesis we develop methods for tackling the issues listed above using K-means clustering. Specifically, we model the shape of a dataset in terms of intelligent K-means clustering results and use this to develop a non-parametric estimate for the domain of applicability of a QSAR model. Next we propose a 'hybrid' variant of K-means, incorporating a regression-wise element, which engenders a technique for non-linear QSAR modelling. Finally we demonstrate how to partition a dataset into training and testing subsets, using the K-means clustering to ensure that the partitioning respects the overall distribution. Our experiments involving real QSAR data confirm the effectiveness of the methods developed in the project.
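Illustrative sketch: the last idea in the abstract, partitioning a dataset into training and testing subsets so that the split respects the overall distribution, might look roughly like the Python below. This is not the thesis's procedure. The thesis uses intelligent K-means, and its selection rule is not given in this record; the sketch substitutes plain Lloyd's K-means with a fixed number of clusters and draws the same fraction of test compounds from every cluster. The function names, the toy two-descriptor data, and the 25% test fraction are all assumptions made for illustration.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means on a list of equal-length tuples (assumed stand-in
    for the intelligent K-means named in the abstract)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = None
    for _ in range(iters):
        # Assignment step: index of the nearest centroid (squared Euclidean).
        new_labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def cluster_stratified_split(labels, test_fraction=0.25, seed=0):
    """Draw the same fraction of test items from every cluster (hypothetical rule)."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy descriptor vectors (made up): two loose groups of eight "compounds".
compounds = [(0.1 * i, 0.2 * i) for i in range(8)] + \
            [(10 + 0.1 * i, 10 - 0.2 * i) for i in range(8)]

labels, centroids = kmeans(compounds, k=2)
train_idx, test_idx = cluster_stratified_split(labels, test_fraction=0.25)
print("train:", train_idx)
print("test: ", test_idx)
```

Because the test items are sampled within each cluster, no region of descriptor space that K-means identifies is left entirely out of either subset, provided the cluster is large enough for the chosen fraction to round to at least one item.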
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.500005
DOI: Not available