Title: Interpretation and mining of statistical machine learning (Q)SAR models for toxicity prediction
Author: Webb, Samuel J.
ISNI: 0000 0004 5359 4025
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2015
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Restricted access.
Structure Activity Relationship (SAR) modelling capitalises on techniques developed within the computer science community, particularly in the fields of machine learning and data mining. These machine learning approaches are often developed to optimise model accuracy, which can come at the expense of the interpretability of the prediction. Highly predictive models should be the goal of any modeller; however, the intended users of the model and all factors relating to its usage should also be considered. One such aspect is the clarity, understanding and explanation of the prediction. In some cases, black box models which do not provide an interpretation can be disregarded regardless of their predictive accuracy. In this thesis the problem of model interpretation has been tackled in the context of models to predict the toxicity of drug-like molecules. Firstly, a novel algorithm has been developed for the interpretation of binary classification models where the endpoint meets defined criteria: activity is caused by the presence of a feature, and inactivity by the lack of an activating feature or the deactivation of all such activating features. This algorithm has been shown to provide a meaningful interpretation of the model’s cause(s) of both active and inactive predictions for two toxicological endpoints: mutagenicity and skin irritation. The algorithm shows benefits over other interpretation algorithms in its ability not only to identify the causes of activity mapped to fragments and physicochemical descriptors but also to account for combinatorial effects of the descriptors. The interpretation is presented to the user in the form of the impact of features and can be visualised as a concise summary or as a hierarchical network detailing the full elucidation of the model’s behaviour for a particular query compound. The interpretation output has been capitalised on and incorporated into a knowledge mining strategy.
The knowledge mining is able to extract the learned structure-activity relationship trends from a model such as a Random Forest, decision tree, k-Nearest Neighbour or support vector machine. These trends can be presented to the user organised around the feature responsible for an assessment such as ACTIVATING or DEACTIVATING. Supporting examples are provided along with an estimate of the model’s predictive performance for a given SAR trend. Both the interpretation and the knowledge mining have been applied to models built for the prediction of Ames mutagenicity and skin irritation. The performance of the developed models is strong and comparable to both academic and commercial predictors for these two toxicological activities.
Supervisor: Krause, Paul J.; Howlin, Brendan
Sponsor: Lhasa Limited; Technology Strategy Board
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available