Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.572381
Title: Minimally supervised techniques for bilingual lexicon extraction
Author: Ismail, Azniah Binti
Awarding Body: University of York
Current Institution: University of York
Date of Award: 2012
Abstract:
Word translations are normally extracted from non-parallel bilingual corpora, and an initial bilingual lexicon, i.e., a list of known translations, is typically used to aid the learning process. This thesis studies a series of novel techniques that make use of scarce resources. To make the study more challenging, only minimal use of resources was allowed and major linguistic tools were not employed. The study thus introduces novel techniques for learning a translation lexicon based on a minimally supervised, context-based approach. The performance of each technique was measured by comparing the extracted lexicon to a reference lexicon using the F1 score, the harmonic mean of precision and recall, which ranges from 0% (worst) to 100% (best). The proposed techniques recorded promising F1 scores, ranging from 57.1% to 80.9%, indicating moderate to best performance. Overall, the findings support the use of techniques that exploit words from small corpora, suggesting that words that are contextually relevant and occur in a similar domain are potentially useful. This thesis also presents a technique for deploying additional data harvested from the web, and a novel method for measuring the similarity of features between two words of different languages without the use of an initial bilingual lexicon.
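The F1-based evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that both lexicons are sets of (source word, target word) pairs and that a pair counts as correct only on an exact match; the record does not say how the thesis matched entries, so this is not necessarily the author's procedure. The function name `lexicon_f1` and the Spanish-English toy data are hypothetical and serve only as illustration.

```python
def lexicon_f1(extracted, reference):
    """Return (precision, recall, F1) of extracted pairs against a reference lexicon.

    Assumption: each lexicon is an iterable of (source_word, target_word)
    pairs, and a pair is correct only if it appears verbatim in the reference.
    """
    extracted = set(extracted)
    reference = set(reference)
    correct = len(extracted & reference)

    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data invented for illustration; not taken from the thesis.
extracted = [("casa", "house"), ("perro", "dog"), ("gato", "car"), ("libro", "book")]
reference = [("casa", "house"), ("perro", "dog"), ("gato", "cat"),
             ("libro", "book"), ("mesa", "table")]

p, r, f1 = lexicon_f1(extracted, reference)
print(f"precision={p:.1%} recall={r:.1%} F1={f1:.1%}")
# prints: precision=75.0% recall=60.0% F1=66.7%
```

In this toy case three of the four extracted pairs are correct and three of the five reference pairs are recovered, so F1 falls between the two rates and sits closer to the lower one.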
Supervisor: Manandhar, Suresh
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.572381
DOI: Not available