Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.596707
Title: Lattice rescoring methods for statistical machine translation
Author: Blackwood, G. W.
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2010
Availability of Full Text:
Full text unavailable from EThOS.
Please contact the current institution’s library for further details.
Abstract:
This thesis develops a robust inventory of large-scale lattice rescoring methods that improve the quality of statistical machine translation. These rescoring methods include (i) sentence-specific, high-order language models estimated over multi-billion-word corpora, (ii) stochastic segmentation transducers that model the phrasal segmentation process in phrase-based SMT, (iii) efficient large-scale lattice minimum Bayes-risk decoding procedures based on weighted path counting transducers, (iv) multi-input and multi-source lattice combination techniques that synthesise multiple sources of translation knowledge, and (v) a novel decoding framework based on segmentation of a word lattice into regions of high and low confidence that supports targeted application of modelling techniques intended to address particular deficiencies in translation. Efficient realisations of these lattice rescoring methods are described in terms of general-purpose weighted finite-state transducer operations.
A second theme of this thesis concerns the exploitation of monolingual corpora. Although monolingual data is much more widely available than parallel data, in SMT it is typically used only for building word-based language models. However, there are other, complementary ways in which this data can be used to improve translation quality. This thesis proposes two novel lattice rescoring methods for exploiting monolingual corpora: phrasal segmentation models that learn the segmentation of sequences of words into sequences of translatable phrases, and monolingual coverage constraints that address the often overlooked issue of machine translation fluency.
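Item (iii) above refers to lattice minimum Bayes-risk (MBR) decoding. This record does not state the decision rule, so what follows is only a minimal sketch under stated assumptions. It uses a linear n-gram gain of the kind introduced by Tromble et al. (2008), choosing the hypothesis E' that maximises θ0·|E'| + Σ_u θ_|u| · #_u(E') · p(u|E), where p(u|E) is the posterior probability that n-gram u occurs in a lattice path. The thesis computes such posteriors with weighted path counting transducers; this sketch swaps that in for brute-force path enumeration over a tiny acyclic lattice, which yields the same posteriors on a toy example but is exponential in general. The lattice encoding, the function names (`enumerate_paths`, `ngram_posteriors`, `lmbr_decode`), the toy lattice and the θ weights are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical lattice encoding: (source state, target state, word, arc probability).
Arc = Tuple[int, int, str, float]
Hyp = Tuple[str, ...]


def enumerate_paths(arcs: List[Arc], start: int, final: int) -> List[Tuple[Hyp, float]]:
    """Return every start-to-final word sequence of an acyclic lattice with its probability.

    Exhaustive enumeration is exponential in lattice size; it stands in here for
    path counting transducer operations, purely for readability on a toy example.
    Assumes the final state has no outgoing arcs.
    """
    outgoing = defaultdict(list)
    for src, dst, word, prob in arcs:
        outgoing[src].append((dst, word, prob))
    totals: Dict[Hyp, float] = defaultdict(float)
    stack = [(start, (), 1.0)]
    while stack:
        state, words, prob = stack.pop()
        if state == final:
            totals[words] += prob  # merge distinct paths that emit the same words
            continue
        for dst, word, arc_prob in outgoing[state]:
            stack.append((dst, words + (word,), prob * arc_prob))
    return list(totals.items())


def ngrams(words: Hyp, max_order: int) -> List[Hyp]:
    """All n-grams of orders 1..max_order occurring in `words` (repeats kept)."""
    return [words[i:i + n] for n in range(1, max_order + 1)
            for i in range(len(words) - n + 1)]


def ngram_posteriors(paths: List[Tuple[Hyp, float]], max_order: int = 4) -> Dict[Hyp, float]:
    """p(u | lattice): total probability of hypotheses containing n-gram u at least once."""
    norm = sum(prob for _, prob in paths)
    post: Dict[Hyp, float] = defaultdict(float)
    for words, prob in paths:
        for u in set(ngrams(words, max_order)):
            post[u] += prob / norm
    return post


def lmbr_decode(paths: List[Tuple[Hyp, float]], theta0: float,
                theta: Sequence[float], max_order: int = 4) -> Hyp:
    """Pick the hypothesis maximising the linear gain
    theta0 * |E| + sum_u theta[|u| - 1] * count_u(E) * p(u | lattice)."""
    post = ngram_posteriors(paths, max_order)

    def gain(words: Hyp) -> float:
        counts = Counter(ngrams(words, max_order))
        return theta0 * len(words) + sum(
            theta[len(u) - 1] * c * post[u] for u, c in counts.items())

    return max((words for words, _ in paths), key=gain)


# Toy lattice (hypothetical): "a dog barked" is the single most probable path (0.36),
# while "the cat sat" (0.34) and "the cat sits" (0.30) share the n-grams "the", "cat"
# and "the cat", which therefore receive high posterior probability (0.64).
LATTICE: List[Arc] = [
    (0, 1, "a", 0.36), (1, 2, "dog", 1.0), (2, 5, "barked", 1.0),
    (0, 3, "the", 0.64), (3, 4, "cat", 1.0),
    (4, 5, "sat", 0.34 / 0.64), (4, 5, "sits", 0.30 / 0.64),
]

if __name__ == "__main__":
    paths = enumerate_paths(LATTICE, start=0, final=5)
    print(max(paths, key=lambda wp: wp[1])[0])  # MAP: ('a', 'dog', 'barked')
    # Illustrative weights only; in practice theta would be tuned, e.g. from BLEU statistics.
    print(lmbr_decode(paths, theta0=-0.1, theta=(1.0, 1.0, 1.0, 1.0)))  # ('the', 'cat', 'sat')
```

On this toy lattice the single most probable path is "a dog barked", but the MBR rule prefers "the cat sat": its n-grams are shared with the competing hypothesis "the cat sits" and so carry more posterior mass than any n-gram of the MAP path.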
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.596707
DOI: Not available