Title: A machine learning approach to the identification of translational language : an inquiry into translationese learning models
Author: Ilisei, Iustina-Narcisa
ISNI: 0000 0004 2739 2091
Awarding Body: University of Wolverhampton
Current Institution: University of Wolverhampton
Date of Award: 2012
In the world of Descriptive Translation Studies, translationese refers to the specific traits that characterise the language used in translations. While translationese has been often investigated to illustrate that translational language is different from non-translational language, scholars have also proposed a set of hypotheses which may characterise such differences. In the quest for the validation of these hypotheses, embracing corpus-based techniques had a well-known impact in the domain, leading to several advances in the past twenty years. Despite extensive research, however, there are no universally recognised characteristics of translational language, nor universally recognised patterns likely to occur within translational language. This thesis addresses these issues, with a less used approach in the field of Descriptive Translation Studies, by investigating the nature of translational language from a machine learning perspective. While the main focus is on analysing translationese, this thesis investigates two related sub-hypotheses: simplification and explicitation. To this end, a multilingual learning framework is designed and implemented for the identification of translational language. The framework is modelled as a categorisation task, the learning techniques having the major goal to automatically learn to distinguish between translated and non-translated texts. The second and third major goals of this research are the retrieval of the recurring patterns that are revealed in the process of solving the task of categorisation, as well as the ranking of the most influential characteristics used to accomplish the learning task. These aims are fulfilled by implementing a system that adopts the machine learning methodology proposed in this research.
The learning framework proves to be an adaptable multilingual framework for the investigation of the nature of translational language, its adaptability being illustrated in this thesis by applying it to the investigation of two languages: Spanish and Romanian. In this thesis, different research scenarios and learning models are experimented with in order to assess to what extent translated texts can be differentiated from non-translated texts in certain contexts. The findings show that machine learning algorithms, aggregating a large set of potentially discriminative characteristics for translational language, are able to differentiate translated texts from non-translated ones with high scores. The evaluation experiments report performance values such as accuracy, precision, recall, and F-measure on two datasets. The present research is situated at the confluence of three areas, more precisely: Descriptive Translation Studies, Machine Learning and Natural Language Processing, justifying the need to combine these fields for the investigation of translationese and translational hypotheses.
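As an illustration of the evaluation measures named in the abstract, the sketch below computes accuracy, precision, recall and F-measure for a binary translated/non-translated labelling. It is not taken from the thesis: the function name `binary_metrics`, the label strings, and the six toy labels are all assumptions made for this example, and the F-measure shown is the balanced (F1) variant.

```python
def binary_metrics(gold, predicted, positive="translated"):
    """Accuracy, precision, recall and F1 with `positive` as the target class.

    Hypothetical helper for illustration only; not the thesis's own code.
    """
    pairs = list(zip(gold, predicted))
    # Count the outcomes that involve the positive class.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Invented toy labels, not data from the thesis.
gold = ["translated", "translated", "translated",
        "original", "original", "original"]
predicted = ["translated", "translated", "original",
             "translated", "original", "original"]

scores = binary_metrics(gold, predicted)
print(scores)
```

With these toy labels there are two true positives, one false positive and one false negative, so all four measures come out at 2/3.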
Supervisor: Mitkov, R.; Corpas, G.; Inkpen, D.
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: Translationese, Simplification Universal, Explicitation Universal, Translation Universals, Third Code ; nature of translated language, Machine Learning, Multilingual Methodology, Translation Studies ; Translation Theory, Natural Language Processing, Computational Linguistics, Romanian Translational Corpus ; Corpora, Spanish, Romanian, translations, texts, text