Title: Bayesian models of syntactic category acquisition
Author: Frank, Stella Christina
ISNI: 0000 0004 2736 1420
Awarding Body: University of Edinburgh
Current Institution: University of Edinburgh
Date of Award: 2013
Availability of Full Text: Full text unavailable from EThOS.
Abstract: Discovering a word’s part of speech is an essential step in acquiring the grammar of a language. In this thesis we examine a variety of computational Bayesian models that use linguistic input available to children, in the form of transcribed child-directed speech, to learn part of speech categories. Part of speech categories are characterised by contextual (distributional/syntactic) and word-internal (morphological) similarity. We assume that language learners are aware of these types of cues, and investigate exactly how they can make use of them. Firstly, we enrich the context of a standard model (the Bayesian Hidden Markov Model) by adding sentence type to the wider distributional context. We show that children are exposed to a much more diverse set of sentence types than is evident in standard corpora used for NLP tasks, and previous work suggests that they are aware of the differences between sentence types as signalled by prosody and pragmatics. Sentence type affects local context distributions, and as such can be informative when relying on local context for categorisation. Adding sentence type to the model improves performance, depending on how it is integrated. We discuss how to incorporate novel features into the model structure in a flexible manner, and present a second model type that learns to use sentence type as a distinguishing cue only when it is informative. Secondly, we add a model of morphological segmentation to the part of speech categorisation model, in order to model joint learning of syntactic categories and morphology. These two tasks are closely linked: categorising words into syntactic categories is aided by morphological information, and finding morphological patterns in words is aided by knowing the syntactic categories of those words. In our joint model, we find improved performance vis-à-vis single-task baselines, but the nature of the improvement depends on the morphological typology of the language being modelled. To our knowledge, this is the first token-based joint model of unsupervised morphology and part of speech category learning.
Supervisor: Goldwater, Sharon; Keller, Frank
Sponsor: Engineering and Physical Sciences Research Council (EPSRC)
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Bayesian modelling ; language acquisition ; cognitive modelling ; syntax acquisition