Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.789498
Title: Computational analysis of morphosyntactic categories in Georgian
Author: Daraselia, Sophiko
ISNI:       0000 0004 8501 1744
Awarding Body: University of Leeds
Current Institution: University of Leeds
Date of Award: 2019
Availability of Full Text:
Access from EThOS:
Access from Institution:
Abstract:
This thesis describes the development of part-of-speech tagging resources for the Georgian language, consisting of i.) a new morphosyntactic language model for part-of-speech (POS) tagging purposes; ii.) tagging guidelines for tagging and post-editing; iii.) the KATAG tagset and iv.) the trained parameter files the probabilistic TreeTagger program needs to work on Georgian texts. A new morphosyntactic model of Georgian for part-of-speech tagging purposes is described in the thesis. The thesis also describes a tagset (KATAG) defined in accordance with a new morphosyntactic model of the language and a set of design principles and tagging guidelines. A stochastic methodology is used here to perform tagging in Georgian. Namely, the Treetagger - a probabilistic part-of-speech tagging program has been trained on Georgian texts. The justification for this choice is discussed. I use two tokenisation approaches in part-of-speech tagging. An accuracy of 92.41% using an enclitic tokenisation approach and accuracy of 87.13% was achieved using a non-enclitic tokenisation approach, corroborating my hypothesis that treating enclitic elements separately from the host words results in better tagging performance. To make the tagger program easily adaptable for a range of inputs (type, variety or genre of text), the performance of the probabilistic TreeTagger program was evaluated according to the obtained test set consisting of five different genres such as academic, informal, legal, fiction and news.
Supervisor: Sharoff, Serge ; Nelson, Diane ; Hardie, Andrew Sponsor: International Education Center
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.789498  DOI: Not available
Share: