Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.806157
Title: Methods for morphology learning in low(er)-resource scenarios
Author: Bergmanis, Toms
Awarding Body: University of Edinburgh
Current Institution: University of Edinburgh
Date of Award: 2020
Availability of Full Text: Full text unavailable from EThOS.
Abstract:
A core issue that hampers the development and use of language technology for under-resourced and morphologically rich languages is data sparsity. In this work, we consider unsupervised morphological analysis and lemmatization, two linguistically motivated ways to combat problems with sparse data. Morphological analysis aims to represent words in terms of the smallest meaningful units of language, morphemes (e.g., acid +ify +ed), while lemmatization concerns individual relationships among words (e.g., walks, walking and walked are all forms of the lexeme walk). In this thesis, we focus on morphology learning in low-resource scenarios: we propose algorithms and methods that learn unsupervised morphological analysis and lemmatization with higher accuracy than previous work while keeping training data requirements affordable. Our unsupervised morphological analyzers have similar or better underlying morpheme accuracy than three strong baselines while inducing, on average, a 12.8% more compact representation of the data than the next best system. Our lemmatizers reduce the training data requirements to raw character representations of wordforms in their immediate context, yet yield improvements (especially on unseen and ambiguous words) over systems that learn from complete morphologically annotated sentences.
Supervisor: Goldwater, Sharon ; Lopez, Adam
Sponsor: Engineering and Physical Sciences Research Council (EPSRC)
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.806157
Keywords: morphological analysis ; morphemes ; morphology ; low-resource learning ; lemmatization ; Natural Language Processing