Title: Development of a data mining tool for the identification of toxicophores
Author: Sherhod, Richard
ISNI: 0000 0004 2720 8223
Awarding Body: University of Sheffield
Current Institution: University of Sheffield
Date of Award: 2011
The design of new alerts, collections of structural features observed to result in toxicological activity, can be a slow process and may require significant input from toxicology and chemistry experts. Two methods have therefore been developed to help with alert identification by mining descriptions of activating structural features directly from toxicity datasets. The first method attempts to iteratively grow descriptions of activating substructures, by repeated ranking and combination of atom pairs to form increasingly detailed atom multiplets. This technique, although promising, proved too computationally intensive for application to heterogeneous datasets. Attempts were made, however, to improve performance and to account for the co-occurrence of structural features by interactively splitting heterogeneous datasets into subsets that support individual multiplets. The second method is based on emerging pattern mining (Dong and Li, 1999), a technique that is well known to computer science, but is relatively new to chemistry. The Horizon-Miner algorithm (Li et al., 2001) and border-differential operation (Dong and Li, 2005) are applied to generate the minimal and maximal borders of a set of jumping-emerging patterns of atom pairs. Using the minimal jumping-emerging patterns it is possible to cluster toxic compounds into groups defined by the presence of shared structural features that occur exclusively in the actives. A method has been developed to identify hierarchical relationships between clusters and their associated jumping-emerging patterns, which has enabled families of structural feature descriptions to be arranged into trees. The root of each tree represents the most general and most commonly occurring structural feature description in the family. By inspecting clusters further down the tree, it is possible to extend the significant structural features to further distinguish sets of toxic compounds.
The methodology has been tested on a number of datasets for Ames mutagenicity, oestrogenicity and hERG channel inhibition endpoints. These tests have shown the method to be effective at clustering the datasets around minimal jumping-emerging structural patterns and finding larger descriptions of the significant structural features. The resulting descriptions of the significant structural features have been shown to be related to some of the known alerts for all three tested endpoints.
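The idea of a minimal jumping-emerging pattern described in the abstract can be illustrated with a small sketch. This is not the Horizon-Miner algorithm or the border-differential operation used in the thesis; it swaps in a brute-force enumeration that is only practical for toy data. The function names, the `max_size` cut-off, and the atom-pair labels (strings such as `"N-O:1"`) are all hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def minimal_jeps(actives, inactives, max_size=3):
    """Brute-force search for minimal jumping-emerging patterns (JEPs).

    A JEP here is a set of features found in at least one active compound
    and in no inactive compound. It is minimal if no proper subset is
    itself a JEP. This is an illustrative stand-in, not Horizon-Miner.
    """
    items = sorted(set().union(*actives))
    found = []
    # Enumerate by increasing size so that smaller JEPs are found first.
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            # A superset of a known JEP cannot be minimal.
            if any(jep <= pattern for jep in found):
                continue
            in_actives = any(pattern <= a for a in actives)
            in_inactives = any(pattern <= i for i in inactives)
            if in_actives and not in_inactives:
                found.append(pattern)
    return found


def cluster_by_jep(actives, jeps):
    """Group active compounds (by index) under each JEP they contain."""
    return {
        jep: [idx for idx, a in enumerate(actives) if jep <= a]
        for jep in jeps
    }


# Hypothetical atom-pair feature sets, one set per compound.
actives = [
    {"N-O:1", "C-N:1", "C-C:2"},
    {"N-O:1", "C-N:1", "C-O:3"},
    {"C-Cl:1", "C-C:2"},
]
inactives = [
    {"C-C:2", "C-O:3"},
    {"C-N:1", "C-C:2"},
    {"N-O:1"},
]

jeps = minimal_jeps(actives, inactives)
clusters = cluster_by_jep(actives, jeps)
for jep, members in clusters.items():
    print(sorted(jep), "-> active compounds", members)
```

Each printed pattern occurs only in the actives, and the compounds listed beside it form one cluster in the sense used above. A compound may fall into several clusters, which is what makes it possible to look for hierarchical relationships between them.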
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:  DOI: Not available