Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.696042
Title: Lexical simplification for non-native English speakers
Author: Paetzold, Gustavo Henrique
ISNI:       0000 0004 5992 2069
Awarding Body: University of Sheffield
Current Institution: University of Sheffield
Date of Award: 2016
Availability of Full Text:
Access from EThOS:
Access from Institution:
Abstract:
Lexical Simplification is the process of replacing complex words in texts to create simpler, more easily comprehensible alternatives. It has proven very useful as an assistive tool for users who may find complex texts challenging. Those who suffer from Aphasia and Dyslexia are among the most common beneficiaries of such technology. In this thesis we focus on Lexical Simplification for English using non-native English speakers as the target audience. Even though they number in hundreds of millions, there are very few contributions that aim to address the needs of these users. Current work is unable to provide solutions for this audience due to lack of user studies, datasets and resources. Furthermore, existing work in Lexical Simplification is limited regardless of the target audience, as it tends to focus on certain steps of the simplification process and disregard others, such as the automatic detection of the words that require simplification. We introduce a series of contributions to the area of Lexical Simplification that range from user studies and resulting datasets to novel methods for all steps of the process and evaluation techniques. In order to understand the needs of non-native English speakers, we conducted three user studies with 1,000 users in total. These studies demonstrated that the number of words deemed complex by non-native speakers of English correlates with their level of English proficiency and appears to decrease with age. They also indicated that although words deemed complex tend to be much less ambiguous and less frequently found in corpora, the complexity of words also depends on the context in which they occur. Based on these findings, we propose an ensemble approach which achieves state-of-the-art performance in identifying words that challenge non-native speakers of English. Using the insight and data gathered, we created two new approaches to Lexical Simplification that address the needs of non-native English speakers: joint and pipelined. The joint approach employs resource-light neural language models to simplify words deemed complex in a single step. While its performance was unsatisfactory, it proved useful when paired with pipelined approaches. Our pipelined simplifier generates candidate replacements for complex words using new, context-aware word embedding models, filters them for grammaticality and meaning preservation using a novel unsupervised ranking approach, and finally ranks them for simplicity using a novel supervised ranker that learns a model based on the needs of non-native English speakers. In order to test these and previous approaches, we designed LEXenstein, a framework for Lexical Simplification, and compiled NNSeval, a dataset that accounts for the needs of non-native English speakers. Comparisons against hundreds of previous approaches as well as the variants we proposed showed that our pipelined approach outperforms all others. Finally, we introduce PLUMBErr, a new automatic error identification framework for Lexical Simplification. Using this framework, we assessed the type and number of errors made by our pipelined approach throughout the simplification process and found that combining our ensemble complex word identifier with our pipelined simplifier yields a system that makes up to 25% fewer mistakes compared to the previous state-of-the-art strategies during the simplification process.
Supervisor: Specia, Lucia Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.696042  DOI: Not available
Share: