Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.496999
Title: Unsupervised grammar induction with simple linguistic constraints
Author: Li, Chi-Ho
Awarding Body: UNIVERSITY OF SUSSEX
Current Institution: University of Sussex
Date of Award: 2009
Abstract:
This thesis investigates the problem of unsupervised learning of natural language grammar in the context-free grammar formalism, and argues that linguistic notions are beneficial to the task. Like some recent approaches, this thesis employs distributional clustering, which is based on the linguistic notion of distribution. Although grammar induction is conceptually complicated, as it involves both the demarcation between constituents and non-constituents and the demarcation between different types of constituents, it is shown in the thesis that these two tasks are actually two sides of the same coin. That is, non-constituents can also be classified into different clusters, and these clusters are very easily separated from those of constituents; therefore, the real problem in grammar induction is how to identify constituents. This thesis provides a generic framework of distributional grammar induction for experimenting with the effect of different criteria for selecting clusters of constituents. Experiments show that a criterion based on the simple principle of minimum variance fails to learn plausible grammars from a vast amount of complex data, and that it also leads to inconsistent syntactic analyses and flat parse trees. Another criterion is proposed on the basis of the fragment test, one of the constituency tests proposed in distributional linguistics. This criterion, augmented by a novel grammar rule rewriting mechanism, is shown to be successful in guarding against many frequently occurring non-constituents, in learning many types of constituents, in removing redundancy in the grammar, and in giving rise to highly hierarchical syntactic structure.
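The abstract does not state which context features or clustering algorithm the thesis uses, so the following is only a minimal Python sketch of what distributional clustering of candidate spans can look like, not the author's method. It assumes spans of part-of-speech tags, a context made of the single tag immediately to the left and right of each span, and a greedy cosine-similarity clustering with a hypothetical threshold of 0.5. The toy corpus and the names `span_contexts` and `cluster_spans` are hypothetical. Neither constituent-selection criterion discussed above (minimum variance or the fragment test) is implemented here.

```python
from collections import Counter, defaultdict
from math import sqrt


def span_contexts(sentences, min_len=2, max_len=3):
    """Map each span type (a tuple of tags) to a Counter of (left, right) contexts.

    Assumption: the context of a span is just its immediate left and right
    neighbour, with <s> and </s> marking sentence boundaries.
    """
    contexts = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]  # padded[k + 1] == sent[k]
        n = len(sent)
        for i in range(n):
            for length in range(min_len, max_len + 1):
                j = i + length  # span covers sent[i:j]
                if j > n:
                    break
                span = tuple(sent[i:j])
                left, right = padded[i], padded[j + 1]
                contexts[span][(left, right)] += 1
    return contexts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_spans(contexts, threshold=0.5):
    """Greedy clustering: each span type joins the first cluster whose summed
    context vector is similar enough, otherwise it starts a new cluster.
    Span types are visited by descending frequency, ties broken by the tuple itself."""
    clusters = []  # list of (member span types, summed context Counter)
    order = sorted(contexts, key=lambda s: (-sum(contexts[s].values()), s))
    for span in order:
        vec = contexts[span]
        for members, centroid in clusters:
            if cosine(vec, centroid) >= threshold:
                members.append(span)
                centroid.update(vec)  # add this span's context counts
                break
        else:
            clusters.append(([span], Counter(vec)))
    return [members for members, _ in clusters]


# Hypothetical toy corpus of POS-tag sequences.
sentences = [
    ["DT", "NN", "VBD", "DT", "NN"],
    ["DT", "JJ", "NN", "VBD"],
    ["PRP", "VBD", "DT", "NN"],
    ["DT", "NN", "VBD", "PRP"],
]

clusters = cluster_spans(span_contexts(sentences))
for members in clusters:
    print([" ".join(span) for span in members])
```

On this toy input, noun-phrase-like spans (`DT NN`, `DT JJ NN`) share one cluster and verb-phrase-like spans (`VBD DT NN`, `VBD PRP`) another, while non-constituent spans such as `DT JJ` and `PRP VBD DT` form a cluster of their own, a small-scale illustration of the abstract's point that non-constituents also cluster. Deciding which clusters are constituents is the separate step that the criteria studied in the thesis address.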
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.496999
DOI: Not available