Title: The application of machine learning, big data techniques, and criminology to the analysis of racist tweets
Author: Day, Ed
ISNI: 0000 0004 7968 6489
Awarding Body: Canterbury Christ Church University
Current Institution: Canterbury Christ Church University
Date of Award: 2018
Availability of Full Text: Full text unavailable from EThOS; access via the institution.
Abstract: Racist tweets are ubiquitous on Twitter. This thesis explores the creation of an automated system to identify racist tweets and tweeters, while also developing a theoretical understanding of the tweets. To do this, a mixed-methods approach was employed: machine learning was used to identify racist tweets and tweeters, and grounded theory and other qualitative techniques were used to understand the tweets' content. A total of 84 million tweets, all of which contained racist words, were collected from Twitter, and 84,000 of these were hand-annotated as racist or not racist. The machine learning was performed on a Hadoop cluster using Spark and Hive. To identify racist tweets, seven different algorithms were systematically compared across a large number of textual, user-derived and geographical features. Two new features, time of day and day of week, were also evaluated. The 84,000 hand-annotated tweets were used as input to the supervised classification processes. The combination of support vector machines with hour of day as an additional feature was found to be optimal, giving an accuracy of 0.93 and an area under the precision-recall curve (AUPRC) of 0.86. A qualitative exploration of the tweets, including a grounded theory analysis, was also performed. A novel machine learning system to identify racist accounts was created, using as feature inputs metrics derived from the racist tweets, concepts from the grounded theory, and a combination of the two. All three feature sets gave an accuracy of at least 0.82. Because the tweets were often ambiguous, both humans and machines found it difficult to determine whether the tweeter's intentions were racist, with the word 'nigga' being particularly problematic. Grounded theory analysis of the tweets showed extremely narrow rhetoric that could be summarised in a single theoretical concept: the defence of the in-group.
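As a rough illustration of the classification set-up summarised above, the following Python sketch (standard library only) appends one-hot hour-of-day and day-of-week features to simple unigram text features, and computes accuracy and average precision, a common estimator of AUPRC. It is a sketch under stated assumptions, not the thesis's Spark/Hive pipeline: the tokeniser, the feature names (tok=, hour=, dow=), the use of UTC hours parsed from Twitter's created_at string, and all function names are hypothetical. Classifier scores (for example, an SVM's decision values) are taken as given rather than trained here.

```python
import re
from datetime import datetime

# Assumed timestamp format: Twitter API v1.1 "created_at",
# e.g. "Wed Oct 10 20:19:24 +0000 2018" (UTC, not the tweeter's local time).
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"
TOKEN_RE = re.compile(r"[a-z0-9']+")  # hypothetical, deliberately simple tokeniser


def tweet_features(text, created_at):
    """Sparse binary features: lower-cased unigrams plus one-hot
    hour of day (0-23) and day of week (0 = Monday)."""
    feats = {"tok=" + tok: 1.0 for tok in TOKEN_RE.findall(text.lower())}
    ts = datetime.strptime(created_at, CREATED_AT_FORMAT)
    feats[f"hour={ts.hour}"] = 1.0
    feats[f"dow={ts.weekday()}"] = 1.0
    return feats


def accuracy(labels, predictions):
    """Fraction of predictions that match the hand-annotated labels."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example,
    one common estimate of AUPRC. Tied scores keep their input order."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


if __name__ == "__main__":
    print(tweet_features("Hello world", "Wed Oct 10 20:19:24 +0000 2018"))
    labels = [1, 0, 1, 0]          # toy hand annotations (1 = racist)
    scores = [0.9, 0.8, 0.7, 0.1]  # toy classifier scores
    preds = [1 if s >= 0.5 else 0 for s in scores]
    print(accuracy(labels, preds))            # 0.75
    print(average_precision(labels, scores))  # 5/6, about 0.833
```

Average precision is only one way to estimate the area under the precision-recall curve; interpolated or trapezoidal estimates can give slightly different values, so numbers from this sketch are not directly comparable with the AUPRC reported in the abstract.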
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: HT Communities. Classes. Races ; HV6001 Criminology