Title: Collocation extraction : a generic substitution-based approach
Author: Pearce, Darren Michael
ISNI: 0000 0004 2670 6730
Current Institution: University of Sussex
Date of Award: 2009
One of the fundamental aspects of any natural language is the set of words used within it. Competent language users know how individual words can be combined to communicate meaning. They also know a large number of specific word combinations whose grammatical behaviour, distributional behaviour or meaning is idiosyncratic. This research is concerned with computational aspects of one important type of word combination: collocation. There is no agreed formal definition of collocation, but it can be informally characterised as a sequence of words that occurs more often than would be expected by chance and whose combination tends to produce an element of added meaning. One often-cited characteristic of collocations is that they restrict substitution for their constituent words. This thesis develops a generic framework for collocation extraction that exploits this restriction. Experiments evaluating such techniques use frequency counts derived from the WWW as well as large amounts of analysed text from conventional corpora. They show that substitution-based techniques can outperform many existing approaches to collocation extraction. The thesis concludes with a discussion of the many ways in which further research can leverage the genericity of the framework and use substitution for collocation extraction.
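The two properties mentioned in the abstract can be illustrated with a small Python sketch. The first is co-occurring more often than chance would predict. The second is resisting substitution of the constituent words. This sketch is not the framework developed in the thesis. Pointwise mutual information (PMI) stands in here for the "more often than chance" criterion. The hypothetical `substitution_score` measures what share of the observed variants the original pair accounts for, where the variants are the pair itself plus its synonym-substituted versions. All counts, synonym lists and function names are invented for illustration. A real system would draw counts from a corpus or the WWW and substitutes from a resource such as a thesaurus.

```python
import math
from typing import Dict, List, Tuple

Bigram = Tuple[str, str]

# Toy counts, invented for illustration only (not data from the thesis).
TOTAL = 10_000
UNIGRAMS: Dict[str, int] = {
    "strong": 100, "powerful": 40, "tea": 50,
    "big": 200, "large": 180, "house": 150,
}
BIGRAMS: Dict[Bigram, int] = {
    ("strong", "tea"): 30, ("powerful", "tea"): 1,
    ("big", "house"): 20, ("large", "house"): 25,
}
# Hypothetical hand-written substitution lists standing in for a thesaurus.
SUBSTITUTES: Dict[str, List[str]] = {
    "strong": ["powerful"],
    "big": ["large"],
}


def pmi(pair: Bigram) -> float:
    """Pointwise mutual information: log2 of observed vs. chance co-occurrence."""
    w1, w2 = pair
    joint = BIGRAMS.get(pair, 0)
    if joint == 0:
        return float("-inf")  # never observed together
    p_xy = joint / TOTAL
    p_x = UNIGRAMS[w1] / TOTAL
    p_y = UNIGRAMS[w2] / TOTAL
    return math.log2(p_xy / (p_x * p_y))


def substitution_score(pair: Bigram) -> float:
    """Hypothetical score: share of the pair-plus-substitutes count taken by the pair.

    Values near 1 mean substituted variants are rarely observed (the pair
    resists substitution); values near 0 mean substitutes are common.
    """
    w1, w2 = pair
    # Replace one constituent at a time with each of its listed substitutes.
    variants = [(s, w2) for s in SUBSTITUTES.get(w1, [])]
    variants += [(w1, s) for s in SUBSTITUTES.get(w2, [])]
    original = BIGRAMS.get(pair, 0)
    total = original + sum(BIGRAMS.get(v, 0) for v in variants)
    return original / total if total else 0.0


if __name__ == "__main__":
    for pair in [("strong", "tea"), ("big", "house")]:
        print(" ".join(pair),
              f"PMI={pmi(pair):.2f}",
              f"subst={substitution_score(pair):.3f}")
```

With these toy numbers, both pairs have positive PMI, so both occur more often than chance. However, "strong tea" accounts for almost all of the count it shares with "powerful tea". By contrast, "big house" and "large house" split theirs roughly evenly. On the substitution criterion, therefore, only the first pair looks like a collocation.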
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available