Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.755204
Title: Neural and non-neural approaches to authorship attribution
Author: Sari, Yunita
ISNI: 0000 0004 7428 2020
Awarding Body: University of Sheffield
Current Institution: University of Sheffield
Date of Award: 2018
Abstract:
This thesis explores a range of authorship attribution approaches and proposes new techniques to improve performance. Authorship attribution is the task of identifying the author of a text. It has attracted attention due to its relevance to a wide range of applications, including forensic investigation and plagiarism detection. An array of features and approaches have been applied to this task. However, studies that involve multiple datasets or use a range of different classifiers have been lacking. Therefore, in this thesis we explore both neural and non-neural network models and use different feature representations on multiple datasets. We begin with a short introduction to authorship attribution in Chapter 1. A more comprehensive review of authorship attribution and its related tasks is given in Chapter 2. In Chapter 3 we introduce a novel analysis using topic modeling to examine the conditions under which each type of authorship attribution feature is useful. Chapter 4 explores the implementation of language modeling for authorship attribution. We describe the feature selection issue in standard authorship attribution approaches and evaluate whether n-gram language modeling can help to address the problem. Furthermore, we implement a Long Short-Term Memory (LSTM) language model for authorship attribution and assess its effectiveness for the task. In Chapter 5 we present our work on using continuous representations for authorship attribution. In contrast to previous work, which uses discrete feature representations, our model learns continuous representations for n-gram features via a neural network jointly with the classification layer. The proposed model outperforms the state of the art on two datasets, while producing comparable results on the remaining two. In addition, we describe our novel extension of the proposed models and show how the analysis in Chapter 3 helps to improve the attribution accuracy. Finally, we demonstrate how the authors' demographic profiles can help improve task performance via Multi-Task Learning (MTL). In Chapter 6 we highlight the contributions of this thesis and propose directions for future research in this area.
Supervisor: Stevenson, Mark; Vlachos, Andreas
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.755204
DOI: Not available