Title: Support vector machines for drug discovery
Author: Trotter, M. W. B.
Awarding Body: UCL (University College London)
Current Institution: University College London (University of London)
Date of Award: 2007
Support vector machines (SVMs) have displayed good predictive accuracy on a wide range of classification tasks and are inherently adaptable to complex problem domains. Structure-property correlation (SPC) analysis is a vital part of the contemporary drug discovery process, in which several components of the search for novel molecular compounds with therapeutic potential may be performed by computer (in silico). Inferred relationships between molecular structure and biological properties of interest are used to eliminate compounds unsuitable for further development. In order to improve process efficiency without rejecting useful compounds, predictive accuracy of such relationships must remain high despite a paucity of data from which to infer them. This thesis describes the application of SVMs to SPC analysis and investigates methods with which to enhance performance and facilitate integration of the technique into present practice. Overviews of contemporary drug discovery and the role of machine learning place the investigation into context. Computational discrimination between compounds according to their structures and properties of interest is described in detail, as is the SVM algorithm. A framework for the assessment of supervised machine learning performance on SPC data is proposed and employed to assess SVM performance alongside state-of-the-art techniques for in silico SPC analysis on data provided by GlaxoSmithKline. SVM performance is competitive and the comparison prompts adaptations of both data treatment and algorithmic application to explore the effects of data paucity, class imbalance and outlying data. Subsequent work weights the SVM kernel matrix to recognise heavily populated regions of training data and suggests the incorporation of domain-specific clustering methods to assist the standard SVM algorithm.
The notion that SVM kernel functions may incorporate existing domain-specific methods leads to kernel functions that employ existing pharmaceutical similarity measures to treat an abstract, binary representation of molecular structure that is not used widely for SPC analysis.
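As a rough illustration of the idea of building a kernel from a pharmaceutical similarity measure over a binary structural representation, the sketch below computes a kernel (Gram) matrix from binary fingerprints. The abstract does not name the similarity measures used in the thesis; the Tanimoto coefficient is assumed here only because it is a common choice for binary molecular fingerprints, and the function names and toy fingerprints are hypothetical.

```python
from typing import List, Sequence


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto (Jaccard) similarity between two equal-length binary vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    # Assumed convention: two all-zero fingerprints are treated as identical.
    return 1.0 if either == 0 else both / either


def gram_matrix(fps: List[Sequence[int]]) -> List[List[float]]:
    """Pairwise Tanimoto similarities, usable as a precomputed kernel matrix."""
    return [[tanimoto(a, b) for b in fps] for a in fps]


# Toy fingerprints (illustrative only, not real molecular data).
fingerprints = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
]

K = gram_matrix(fingerprints)
for row in K:
    print(["%.3f" % v for v in row])
```

A matrix of this kind could then be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.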
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available