Title: Automatic compilation of bilingual terminologies from comparable corpora
Author: Kontonatsios, Georgios Nikolaos
ISNI: 0000 0004 5353 0742
Awarding Body: University of Manchester
Current Institution: University of Manchester
Date of Award: 2015
Bilingual terminological resources play a pivotal role in human and machine translation of technical text. Owing to the immense volume of newly produced terminology in the biomedical domain, existing resources suffer from low coverage and are available for only a limited number of languages. Hence the need emerges for term alignment methods that accurately identify translations of terms. In this work, we focus on bilingual terminology induction from freely available comparable corpora, i.e. thematically related documents in two or more languages. We investigate different sources of information that determine translation equivalence, including: (a) the internal structure of terms (compositional clue), (b) the surrounding lexical context (contextual clue) and (c) the topic distribution of terms (topical clue). We present four novel compositional alignment methods and introduce several extensions over existing compositional, context-based and topic-based approaches. Furthermore, we combine the three translation clues in a single term alignment model and show substantial improvements over the individual translation signals considered in isolation. We examine the performance of the proposed term alignment methods on closely related language pairs (English-French, English-Spanish), on a more distant, low-resource language pair (English-Greek) and on an unrelated language pair (English-Japanese). As an application, we integrate automatically compiled bilingual terminologies with Statistical Machine Translation (SMT) systems to translate unknown terms more accurately. Results show that an up-to-date bilingual dictionary of terms improves the translation performance of SMT.
Supervisor: Not available
Sponsor: Engineering and Physical Sciences Research Council (EPSRC)
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Term Alignment; Comparable Corpora; Statistical Machine Translation; UIMA