Use this URL to cite or link to this record in EThOS:
Title: Text readability and summarisation for non-native reading comprehension
Author: Xia, Menglin
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2019
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
This thesis focuses on two important aspects of non-native reading comprehension: text readability assessment, which estimates the reading difficulty of a given text for L2 learners, and learner summarisation assessment, which evaluates the quality of learner summaries to assess their reading comprehension. We approach both tasks as supervised machine learning problems and present automated assessment systems that achieve state-of-the-art performance. We first address the task of text readability assessment for L2 learners. One of the major challenges for a data-driven approach to text readability assessment is the lack of significantly-sized level-annotated data aimed at L2 learners. We present a dataset of CEFR-graded texts tailored for L2 learners and look into a range of linguistic features affecting text readability. We compare the text readability measures for native and L2 learners and explore methods that make use of the more plentiful data aimed at native readers to help improve L2 readability assessment. We then present a summarisation task for evaluating non-native reading comprehension and demonstrate an automated summarisation assessment system aimed at evaluating the quality of learner summaries. We propose three novel machine learning approaches to assessing learner summaries. In the first approach, we examine using several NLP techniques to extract features to measure the content similarity between the reading passage and the summary. In the second approach, we calculate a similarity matrix and apply a convolutional neural network (CNN) model to assess the summary quality using the similarity matrix. In the third approach, we build an end-to-end summarisation assessment model using recurrent neural networks (RNNs). Further, we combine the three approaches to a single system using a parallel ensemble modelling technique. We show that our models outperform traditional approaches that rely on exact word match on the task and that our best model produces quality assessments close to professional examiners.
Supervisor: Briscoe, Ted Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
Keywords: Natural Language Processing ; Text Readability ; Summarisation Evaluation ; L2 Reading ; Text Similarity ; Text Classification ; Automated Summarisation Assessment