Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.692628
Title: Semantic sentence similarity incorporating linguistic concepts
Author: Pearce, David Matthew
ISNI: 0000 0004 5919 3177
Awarding Body: Manchester Metropolitan University
Current Institution: Manchester Metropolitan University
Date of Award: 2015
Abstract:
A natural language allows simpler ideas to be combined to communicate much more complex ideas. This ability gives language the potential for use as a highly intuitive method of human interaction. However, this freedom of expression makes automated interpretation of language extremely challenging. Semantic sentence similarity is an approach which uses knowledge of how to compare simpler units, such as words, to obtain a measure of similarity between two sentences. This similarity can allow existing knowledge to be applied to new situations. The objective of this research is to show that a sentence similarity model can be made more accurate through the inclusion of Linguistic concepts. This presents two challenges: adapting the human-focused rules of Linguistics for sentence similarity, and evaluating the effect of individual components in isolation. This research overcame these barriers through the development of an extensible modular framework and the construction of a new mathematical model for this framework, called SARUMAN. The core contribution of the research resulted from gradually incorporating fundamental Linguistic components into SARUMAN, including: disambiguation by part of speech; treating the sentence as clauses; and advanced word interaction to handle cases where meanings merge. The most advanced of these models is called SCAWIT. In experiments on a small data set, each of these introduced concepts showed a statistically significant improvement in Pearson's correlation (0.05 or more) over the previous version. The resulting models were capable of processing several hundred sentence pairs a second on a single processor. A further significant advance to the field of sentence similarity was the introduction of opposites. This was conceptually beyond the pre-existing models and showed strong results for an extension of SCAWIT, called SANO.
Other novel contributions were added through automated word sense disambiguation from WordNet definitions and the use of a properties-of-words model. Some of these changes have potential but did not yield significant improvement with the current knowledge base.
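Illustrative note: the general idea in the abstract, building a sentence-level score from word-level comparisons and then checking it against human ratings with Pearson's correlation, can be sketched as below. This is not SARUMAN, SCAWIT or SANO, whose definitions are not given in this record. The word-similarity table, the best-match averaging scheme and all function names are assumptions made purely for illustration.

```python
from math import sqrt

# Hypothetical toy word-similarity table (values invented for illustration).
# A real system would derive these from a knowledge base such as WordNet.
WORD_SIM = {
    frozenset(["car", "automobile"]): 0.9,
    frozenset(["fast", "quick"]): 0.8,
}

def word_similarity(a, b):
    """Return a similarity in [0, 1] for two words (toy lookup)."""
    if a == b:
        return 1.0
    return WORD_SIM.get(frozenset([a, b]), 0.0)

def sentence_similarity(s1, s2):
    """Average each word's best match in the other sentence, in both directions."""
    w1, w2 = s1.lower().split(), s2.lower().split()
    if not w1 or not w2:
        return 0.0

    def directed(src, dst):
        return sum(max(word_similarity(a, b) for b in dst) for a in src) / len(src)

    return (directed(w1, w2) + directed(w2, w1)) / 2

def pearson(xs, ys):
    """Pearson's correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for constant input; 0.0 by convention here
    return cov / (sx * sy)
```

In an evaluation of this kind, the model's scores for a set of sentence pairs would be passed to pearson together with the corresponding human similarity ratings; a higher correlation indicates closer agreement with human judgement.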
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.692628
DOI: Not available