Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.586532
Title: Introducing corpus-based rules and algorithms in a rule-based machine translation system
Author: Dugast, Loic
Awarding Body: University of Edinburgh
Current Institution: University of Edinburgh
Date of Award: 2013
Availability of Full Text:
Access through EThOS:
Full text unavailable from EThOS. Please try the link below.
Access through Institution:
Abstract:
Machine translation offers the challenge of automatically translating a text from one natural language into another. Statistical methods - originating from the field of information theory - have shown to be a major breakthrough in the field of machine translation. Prior to this paradigm, many systems had been developed following a rule-based approach. This denotes a system based on a linguistic description of the languages involved and of how translation occurs in the mind of the (human) translator. Statistical models on the contrary use empirical means and may work with very little linguistic hypothesis on language and translation as performed by humans. This had implications for rule-based translation systems, in terms of software architecture and the nature of the rules, which were manually input and lack any statistical feature. In the view of such diverging paradigms, we can imagine trying to combine both in a hybrid system. In the present work, we start by examining the state-of-the-art of both rule-based and statistical systems. We restrict the rule-based approach to transfer-based systems. We compare rule-based and statistical paradigms in terms of global translation quality and give a qualitative analysis of their respective specific errors. We also introduce initial black-box hybrid models that confirm there is an expected gain in combining the two approaches. Motivated by the qualitative analysis, we focus our study and experiments on lexical phrasal rules. We propose a setup allowing to extract such resources from corpora. Going one step further in the integration of rule-based and statistical approaches, we then examine how to combine the extracted rules with decoding modules that will allow for a corpus-based handling of ambiguity. This then leads to the final delivery of this work: a rule-based system for which we can learn non-deterministic rules from corpora, and whose decoder can be optimised on a tuning set in the same domain.
Supervisor: Koehn, Philipp; Osborne, Miles Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.586532  DOI: Not available
Keywords: machine translation ; rule-based ; statistical hybrid
Share: