Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.767238
Title: Automatic identification and translation of multiword expressions
Author: Taslimipoor, Shiva
ISNI:       0000 0004 7658 4604
Awarding Body: University of Wolverhampton
Current Institution: University of Wolverhampton
Date of Award: 2018
Availability of Full Text:
Access from EThOS:
Access from Institution:
Abstract:
Multiword Expressions (MWEs) belong to a class of phraseological phenomena that is ubiquitous in the study of language. They are heterogeneous lexical items consisting of more than one word and feature lexical, syntactic, semantic and pragmatic idiosyncrasies. Scholarly research on MWEs benefits both natural language processing (NLP) applications and end users. This thesis involves designing new methodologies to identify and translate MWEs. In order to deal with MWE identification, we first develop datasets of annotated verb-noun MWEs in context. We then propose a method which employs word embeddings to disambiguate between literal and idiomatic usages of the verb-noun expressions. Existence of expression types with various idiomatic and literal distributions leads us to re-examine their modelling and evaluation. We propose a type-aware train and test splitting approach to prevent models from overfitting and avoid misleading evaluation results. Identification of MWEs in context can be modelled with sequence tagging methodologies. To this end, we devise a new neural network architecture, which is a combination of convolutional neural networks and long-short term memories with an optional conditional random field layer on top. We conduct extensive evaluations on several languages demonstrating a better performance compared to the state-of-the-art systems. Experiments show that the generalisation power of the model in predicting unseen MWEs is significantly better than previous systems. In order to find translations for verb-noun MWEs, we propose a bilingual distributional similarity approach derived from a word embedding model that supports arbitrary contexts. The technique is devised to extract translation equivalents from comparable corpora which are an alternative resource to costly parallel corpora. We finally conduct a series of experiments to investigate the effects of size and quality of comparable corpora on automatic extraction of translation equivalents.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.767238  DOI: Not available
Share: