Title: Multilevel models for the analysis of linguistic data
Author: Alexander, Craig
Awarding Body: University of Glasgow
Current Institution: University of Glasgow
Date of Award: 2019
Describing the numerous factors that constrain and promote particular aspects of linguistic behaviour in interaction is very difficult. The recent adoption of more advanced quantitative methods has enhanced this modelling, leading to a greater understanding of linguistic patterns. At the same time, the increasing availability of digital recordings, and of storage capacity for them, is producing increasingly large corpora of complex linguistic data for such investigations. The Sounds of the City corpus is one such example and is the corpus we model throughout this thesis. It is an electronic real-time corpus of Glaswegian vernacular, consisting of a searchable, multi-layered database of 58 hours of recordings from 136 speakers, recorded between 1970 and 2010, with orthographic transcripts and automatically phonemically segmented waveforms that are amenable to automatic acoustic analyses of the durational and resonance characteristics of speech. Vowel formant measurements provide a numeric representation of a spoken vowel and are commonly used to measure linguistic variation and change; each vowel has multiple formant measures, which correspond to the resonances of the vocal tract. The first three vowel formants are important perceptual cues for the successful recognition of vowel qualities. Current quantitative modelling methods consider each formant separately, drawing inferences about each formant measurement under the assumption that the formants are independent of one another. For most vowels this assumption seems misplaced, as formant measures are often correlated with one another. In this thesis, we extend current modelling techniques applied to sociolinguistic corpora by introducing a Bayesian hierarchical model that models the first three formant measures for each vowel simultaneously, taking into account the correlation between these measures.
We also implement reparameterisation methods to alleviate issues caused by highly correlated samples, which are often observed in MCMC output for models applied to datasets with nested structures, a common feature of sociolinguistic corpora. Like classical mixed effects models, these models account for the complex nested structure of the data and uncover the underlying dynamics of language, but they additionally account for the correlation between formants, providing a more accurate representation of the factors driving linguistic variation and change. The output from the Bayesian hierarchical model is visualised as a graphical model. Graphical models provide a visual representation of the conditional dependence between variables, making them an attractive inference tool. We jointly infer the relationship between vowel formant measurements by using the precision estimates from the hierarchical model as input to a Bayesian Gaussian graphical model. The resulting graph has a chain-graph-like structure that shows the user which factors have a significant effect on vowel variation for each formant, as well as the relationships among the first three formants. This novel inference tool aids the understanding of complex model output, such as that from the models fitted to the Sounds of the City corpus, and can readily be applied to numerous other modelling problems.
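As a rough illustration of why precision estimates are a natural input to a Gaussian graphical model, the sketch below converts a precision matrix into partial correlations: for Gaussian variables, a zero off-diagonal precision entry corresponds to conditional independence, so no edge is drawn between that pair. This is not the thesis's Bayesian procedure. The function name `partial_correlations`, the numerical threshold, and the 3x3 matrix for (F1, F2, F3) are all assumptions invented for the example, and the simple thresholding stands in for whatever posterior edge inference a Bayesian model would use.

```python
import numpy as np


def partial_correlations(precision):
    """Convert a precision (inverse covariance) matrix to partial correlations.

    Uses the standard identity rho_ij = -omega_ij / sqrt(omega_ii * omega_jj).
    """
    precision = np.asarray(precision, dtype=float)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Hypothetical precision matrix for the first three formants.
# The values are made up for illustration, not taken from the thesis.
omega = np.array([
    [2.0, -0.8, 0.0],
    [-0.8, 2.0, -0.5],
    [0.0, -0.5, 1.5],
])

labels = ["F1", "F2", "F3"]
pcor = partial_correlations(omega)

# Draw an edge wherever the partial correlation is not numerically zero.
edges = [
    (labels[i], labels[j])
    for i in range(len(labels))
    for j in range(i + 1, len(labels))
    if abs(pcor[i, j]) > 1e-8
]
print(edges)
```

In this made-up example F1 and F3 are conditionally independent given F2, so the graph has edges F1-F2 and F2-F3 only.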
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
Keywords: PE English ; Q Science (General) ; QA Mathematics