Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.642332
Title: Paraphrasing and translation
Author: Callison-Burch, C.
Awarding Body: University of Edinburgh
Current Institution: University of Edinburgh
Date of Award: 2008
Availability of Full Text:
Full text unavailable from EThOS. Please contact the current institution’s library for further details.
Abstract:
Paraphrasing and translation have previously been treated as unconnected natural language processing tasks. We show the two are intimately related. The major contributions of this thesis are as follows: We define a novel technique for automatically generating paraphrases using bilingual parallel corpora, which are more commonly used as training data for statistical models of translation. We show that paraphrases can be used to improve the quality of statistical machine translation by addressing the problem of coverage and introducing a degree of generalisation into the models. We explore the topic of automatic evaluation of translation quality, and show that the current standard evaluation methodology cannot be guaranteed to correlate with human judgements of translation quality. Whereas previous data-driven approaches to paraphrasing depended either on uncommon data sources, such as multiple translations of the same source text, or on language-specific resources, such as parsers, our approach is able to harness more widely available parallel corpora and can be applied to any language which has a parallel corpus. Because our method is language-independent and probabilistic, it can be easily integrated into statistical machine translation. Paraphrasing can be used to increase coverage by adding translations of previously unseen source words and phrases. Results show that augmenting a state-of-the-art SMT system with paraphrases leads to significantly improved coverage and translation quality. For a training corpus with 10,000 sentence pairs we increase the coverage of unique test set unigrams from 48% to 90%, with more than half of the newly covered items accurately translated, as opposed to none in current approaches.
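The abstract does not spell out how paraphrases are extracted from a parallel corpus. One plausible reading is pivoting: an English phrase is mapped to the foreign phrases it aligns with, and those are mapped back to other English phrases. The following is a minimal sketch under that assumption, not the thesis's implementation. The function name `paraphrase_probs`, the two phrase tables, and every probability value are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def paraphrase_probs(phrase, p_f_given_e, p_e_given_f):
    """Score candidate paraphrases of `phrase` by pivoting through foreign phrases.

    Assumed scoring rule: score(e2 | e1) = sum over f of p(f | e1) * p(e2 | f).
    """
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(phrase, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != phrase:  # skip the original phrase itself
                scores[e2] += p_f * p_e2
    return dict(scores)


# Hypothetical phrase-translation probabilities (illustrative values only).
p_f_given_e = {
    "under control": {"unter kontrolle": 0.8, "im griff": 0.2},
}
p_e_given_f = {
    "unter kontrolle": {"under control": 0.7, "in check": 0.3},
    "im griff": {"under control": 0.5, "in check": 0.25, "in hand": 0.25},
}

result = paraphrase_probs("under control", p_f_given_e, p_e_given_f)
best = max(result, key=result.get)
print(best, result)
```

Because the scores are ordinary probabilities, candidates of this kind could in principle be added to a phrase table as extra translation options for source phrases that were never seen in training, which is the coverage effect the abstract describes.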
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.642332
DOI: Not available