Use this URL to cite or link to this record in EThOS:
Title: Graph-based approaches to word sense induction
Author: Hope, David Richard
ISNI:       0000 0004 5349 2264
Awarding Body: University of Sussex
Current Institution: University of Sussex
Date of Award: 2015
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
This thesis is a study of Word Sense Induction (WSI), the Natural Language Processing (NLP) task of automatically discovering word meanings from text. WSI is an open problem in NLP whose solution would be of considerable benefit to many other NLP tasks. It has, however, has been studied by relatively few NLP researchers and often in set ways. Scope therefore exists to apply novel methods to the problem, methods that may improve upon those previously applied. This thesis applies a graph-theoretic approach to WSI. In this approach, word senses are identifed by finding particular types of subgraphs in word co-occurrence graphs. A number of original methods for constructing, analysing, and partitioning graphs are introduced, with these methods then incorporated into graphbased WSI systems. These systems are then shown, in a variety of evaluation scenarios, to return results that are comparable to those of the current best performing WSI systems. The main contributions of the thesis are a novel parameter-free soft clustering algorithm that runs in time linear in the number of edges in the input graph, and novel generalisations of the clustering coeficient (a measure of vertex cohesion in graphs) to the weighted case. Further contributions of the thesis include: a review of graph-based WSI systems that have been proposed in the literature; analysis of the methodologies applied in these systems; analysis of the metrics used to evaluate WSI systems, and empirical evidence to verify the usefulness of each novel method introduced in the thesis for inducing word senses.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: P0098 Computational linguistics. Natural language processing