Title: Class-based statistical models for lexical knowledge acquisition.
Author: Clark, Stephen.
ISNI: 0000 0001 3556 818X
Awarding Body: University of Sussex
Current Institution: University of Sussex
Date of Award: 2001
Abstract: This thesis is about the automatic acquisition of a particular kind of lexical knowledge, namely the knowledge of which noun senses can fill the argument slots of predicates. The knowledge is represented using probabilities, which agrees with the intuition that there are no absolute constraints on the arguments of predicates, only constraints that are satisfied to a certain degree; the problem of knowledge acquisition thus becomes the problem of probability estimation from corpus data. Defining a probability model in terms of senses involves a huge number of parameters, which results in a sparse data problem. The proposal here is to define a probability model over senses in a semantic hierarchy, and to exploit the fact that senses can be grouped into classes of semantically similar senses. A novel class-based estimation technique is developed, together with a procedure that determines a suitable class for a sense (given a predicate and argument position). Determining a suitable class can be thought of as finding a suitable level of generalisation in the hierarchy. The generalisation procedure uses a statistical test to locate areas of the hierarchy consisting of semantically similar senses. As well as being used for probability estimation, it is employed as part of a re-estimation algorithm for estimating sense frequencies from incomplete data. The rest of the thesis considers how the lexical knowledge can be used to resolve structural ambiguities, and provides empirical evaluations. The estimation techniques are first integrated into a parse selection system, which uses a probabilistic dependency model to rank the alternative parses for a sentence. A PP-attachment task then provides an evaluation that is more focussed on the class-based estimation technique. Finally, a pseudo-disambiguation task is used to compare the estimation technique with alternative approaches.
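The idea of finding a level of generalisation with a statistical test can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy hierarchy, the counts, the choice of a Pearson chi-square test at the 0.05 level, and all function names. Starting from a sense, the sketch climbs to the parent class as long as the parent's children do not differ significantly in how often they fill the argument slot, then uses the chosen class to estimate the probability for the sense.

```python
# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "food": "entity",
    "artifact": "entity",
    "fruit": "food",
    "meat": "food",
    "tool": "artifact",
}

# Hypothetical counts per sense: (times seen as object of "eat", total occurrences).
COUNTS = {"fruit": (30, 100), "meat": (35, 100), "tool": (1, 100)}

# Chi-square critical values at alpha = 0.05, indexed by degrees of freedom.
# Only small tables are covered; a statistics library would be used in practice.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def children(node):
    """Return the direct children of a node in the toy hierarchy."""
    return [c for c, p in PARENT.items() if p == node]


def subtree_counts(node, counts):
    """Sum (hits, total) over a node and all of its descendants."""
    hit, total = counts.get(node, (0, 0))
    for child in children(node):
        h, t = subtree_counts(child, counts)
        hit, total = hit + h, total + t
    return hit, total


def chi_square(rows):
    """Pearson chi-square statistic for a table of (hit, miss) rows."""
    col_hit = sum(h for h, _ in rows)
    col_miss = sum(m for _, m in rows)
    n = col_hit + col_miss
    stat = 0.0
    for h, m in rows:
        row = h + m
        for observed, col in ((h, col_hit), (m, col_miss)):
            expected = row * col / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def generalise(sense, counts):
    """Climb from a sense while the parent's children look homogeneous."""
    node = sense
    while PARENT[node] is not None:
        parent = PARENT[node]
        rows = []
        for child in children(parent):
            hit, total = subtree_counts(child, counts)
            if total > 0:
                rows.append((hit, total - hit))
        df = len(rows) - 1
        if df >= 1 and chi_square(rows) > CRITICAL_05[df]:
            break  # children differ significantly: stop generalising
        node = parent
    return node


def class_based_estimate(sense, counts):
    """Estimate p(slot filled | sense) by the relative frequency in the chosen class."""
    cls = generalise(sense, counts)
    hit, total = subtree_counts(cls, counts)
    return cls, hit / total


if __name__ == "__main__":
    print(class_based_estimate("fruit", COUNTS))
    print(class_based_estimate("tool", COUNTS))
```

With these invented counts, "fruit" and "meat" behave alike, so the estimate for "fruit" pools their data under "food"; "food" and "artifact" differ sharply, so generalisation stops before reaching the root. The sketch omits the smoothing, handling of empty classes, and the re-estimation step that a real system would need.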
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Computational linguistics