Title: Towards a purely distributional model of meaning : distance, expectation, and composition
Author: Washtell, Justin Robert
Awarding Body: University of Leeds
Current Institution: University of Leeds
Date of Award: 2011
Abstract: This thesis explores the problem of inferring meaning from un-annotated language: arguably a key problem in the pursuit of strong AI. We take pains to tackle the problem from the ground up, re-examining ingrained devices such as co-occurrence and wordspace, in search of insights into their known limitations. We pay particular attention to the pervasive problem of the poverty-of-the-stimulus, and how this can be tackled without sacrificing specificity. All the while we adhere to a purely distributional paradigm. Our work results in three main contributions to the field. Firstly, taking a cue from statistical biogeography, we explore and develop distance-based (windowless) association measures which re-interpret the notion of co-occurrence introduced by Harris (1954). While there has been some experimentation with distance-based devices in the past, we prove through both intrinsic analyses and psycholinguistic evaluations that they provide a particularly robust foundation for distributional analyses. Secondly, taking our cue from semiotics, we investigate an alternative vector space model of words-in-context which derives from the notion of reader expectation. The model combines the advantages of spaces built from high-order co-occurrence vectors (intuitive geometric interpretations, and dense generalising vectors) with those of arbitrarily sophisticated language models (sensitivity to high-arity language structure, and the ability to exploit diverse heterogeneous feature-sets). We test a simple implementation in a word sense disambiguation setting with very encouraging results, showing - importantly - that such models represent plausible accounts of meaning. Thirdly, we show how the resultant expectation vectors lead to an implicit compositional account of meaning. While the implementation arises trivially from the vectors, our experiments allude to some surprisingly sophisticated behaviour which indicates a sensitivity to both structural and lexical aspects of phrases. During the course of these investigations we make several subordinate contributions to the field. Among these are a formulation of distance-based predictive language models, and particularly robust vector-similarity measures based on fuzzy rough sets.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available