Use this URL to cite or link to this record in EThOS:
Title: Crystalline cheminformatics : big data approaches to crystal engineering
Author: Adler, Philip David Felix
ISNI: 0000 0004 6348 6167
Awarding Body: University of Southampton
Current Institution: University of Southampton
Date of Award: 2015
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Statistical approaches to chemistry, under the umbrella of cheminformatics, are now widespread, in particular as part of quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) studies of candidate pharmaceuticals. Using such approaches on legacy data has widely been termed “taking a big data approach”, and finds ready application in cohort medicinal studies and psychological studies. Crystallography is a field ripe for these approaches, owing in no small part to its history as a field which, by necessity, adopted digital technologies relatively early on as a part of X-ray crystallographic techniques. A discussion of the historical background of crystallography, crystallographic engineering and the pertinent areas of cheminformatics, which include programming, databases, file formats, and statistics, is given as background to the presented research. Presented here are a series of applications of Big Data techniques within the field of crystallography. Firstly, a naïve attempt at descriptor selection was made using a family of sulphonamide crystal structures and glycine crystal structures. This proved unsuccessful owing to the very large number of available descriptors and the very small number of true glycine polymorphs used in the experiment. Secondly, an attempt to combine machine learning model building with feature selection was made using co-crystal structures obtained from the Cambridge Structural Database, using partition modelling. This method established sensible sets of descriptors which would act as strong predictors for the formation of co-crystals; however, validation of the models by using them to make predictions demonstrated their poor predictive power, and led to the uncovering of a number of weaknesses therein. Thirdly, a homologous series of fluorobenzeneanilides was used as a test bed for a novel, invariant topological descriptor.
The descriptor itself is based on graph-theoretical techniques, and is derived from the patterns of close contacts within the crystal structure. Fluorobenzeneanilides present an interesting case in this context, because of the historical understanding that fluorine is rarely a component in a hydrogen-bonding system. Regardless, the descriptor correlates with the melting point of the fluorobenzeneanilides, with one exception. The reasons for this exception are explored. In addition, a comparison was undertaken between categorisations of the crystal structures using more traditional “by-eye” techniques and groupings of compounds by shared values of the invariant descriptor. It is demonstrated that the novel descriptor does not simply act as a proxy for the arrangement of the molecules in the crystal lattice: intuitively similar structures have different values for the descriptor, while very different structures can have similar values. This is evidence that the general trend of exploring intermolecular contacts in isolation from other influences over lattice formation. The correlation of the descriptor with melting point in this context suggests that the properties of crystalline material are not only products of their lattice structure. Also presented as part of all of the case studies is an illustration of some weaknesses of the methodology, and a discussion of how these difficulties can be overcome, both by individual scientists and by necessary alterations to the collective approach to recording crystallographic experiments.
Supervisor: Coles, Simon
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available