Title: Human translation quality estimation : feature-based and deep learning-based
Author: Yuan, Yu
Awarding Body: University of Leeds
Current Institution: University of Leeds
Date of Award: 2018
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Thesis embargoed until 01 Apr 2023
This thesis studies the technical and linguistic aspects of human translation quality estimation (HTQE) for trainee translations from English to Chinese. To this end, HTQE is cast as a supervised machine learning task, approached through both conventional feature-based learning and deep learning, that predicts fine-grained translation quality scores by regression without using reference translations. I investigated how human translations (HTs) can be effectively represented at both the document level and the sentence level for quality estimation, exploiting feature-based and deep learning-based methods. Specifically, an extensive framework of translation quality features has been designed at both the sentence and document level, and a novel stacked neural model with a cross-lingual attention mechanism, leveraging the strengths of convolutional neural networks and recurrent neural networks, has also been proposed. From the feature-based perspective, a supervised classification method is proposed to identify terminology for quality evaluation purposes, using language-independent statistics as features. I investigated the correlation of normalised term occurrences with human-annotated quality scores. Descriptive and exploratory statistics are carried out on trainee and machine translation datasets through pairwise correlation and principal component analysis to study the contribution of individual and group features and the distribution of translation errors, showing that HT errors mainly cause content inadequacy, whereas machine translation (MT) errors are more about language misuse. Fine-grained document-level and sentence-level HTQE models are trained using the state-of-the-art XGBoost algorithm with grid search parameter optimisation. Multiple models built with different feature selection strategies are compared to QuEst, a strong baseline for machine translation quality estimation.
On HT and MT data, the optimal models outperform the baseline and other models in predicting the majority of quality scores, as measured by agreement with human judgements. From the deep learning-based perspective, a stacked neural model specifically for sentence-level HTQE is presented. The neural architecture has achieved good correlations with human judgements for HTs. For the prediction of MT post-editing effort, it has achieved performance comparable to a strong baseline in predicting HTER scores for German-English and English-German machine translations on the WMT17 test data. The model has also produced good results for predicting keystrokes. I conclude that this work has created a framework for document-level and sentence-level HTQE and has possibly started a new direction for human translation quality assessment in Translation Studies. The results on HT data show promising performance of the proposed HTQE methods in predicting fine-grained translation quality from multiple aspects.
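For readers unfamiliar with the HTER target mentioned in the abstract: HTER is commonly described as the number of edits needed to turn an MT output into its human post-edited version, divided by the length of the post-edited version. The sketch below is a simplification, not the thesis's implementation. It counts only word-level insertions, deletions and substitutions, whereas full TER also treats block shifts as single edits. The function name and the example sentences are illustrative assumptions.

```python
def simplified_hter(mt_output: str, post_edited: str) -> float:
    """Word-level edit distance divided by the post-edited length.

    Assumption: block shifts, which full TER counts as single edits,
    are ignored here, so this can overestimate the true score.
    """
    hyp = mt_output.split()
    ref = post_edited.split()
    if not ref:
        raise ValueError("post-edited text must contain at least one word")

    # Standard dynamic-programming Levenshtein distance over words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,           # delete a hypothesis word
                            curr[j - 1] + 1,       # insert a reference word
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1] / len(ref)


# Illustrative pair: one inserted word over a six-word post-edit.
score = simplified_hter("the cat sat on mat", "the cat sat on the mat")
```

A sentence-level quality estimation model of the kind described above would be trained to predict such a score from the source sentence and the MT output alone, without seeing the post-edited version.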
Supervisor: Sharoff, Serge ; Babych, Bogdan
Sponsor: China Scholarship Council ; University of Leeds
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available