Title: Discovering culturomic trends in large-scale textual corpora
Author: Lansdall-Welfare, Thomas
ISNI: 0000 0004 5924 2290
Awarding Body: University of Bristol
Current Institution: University of Bristol
Date of Award: 2015
The abundance of data, and the ability to process it at massive scale, has transformed many areas of research in the natural sciences. These data-driven methods have recently begun to be adopted in fields that have not traditionally relied on computational approaches, such as the social sciences and humanities. The use of data-driven approaches in these fields is likely to grow as more and more data is "born digital" and as mass digitisation projects convert the mountains of paper archives that still exist. In this thesis, we look at extracting, analysing and exploring data from massive textual corpora, concentrating on macroscopic trends and characteristics that can only be found when moving from traditional social science methods based on manual inspection (known as 'coding') to scalable, data-driven computational methods. A distributed architecture for large-scale text analysis was developed collaboratively during the project and served as the infrastructure for collecting, storing and analysing data. Using this infrastructure, this thesis not only explores methods for extracting information in a scalable way but also demonstrates the kinds of studies that become possible by adopting data-driven approaches. These studies and their contributions include differences in writing style across topics and news outlets; longitudinal and diurnal patterns of mood change in population-scale samples of UK social media users; and general tools and methods for interrogating and exploring massive textual corpora interactively. We conclude that data-driven methods for the analysis of large-scale textual corpora have now reached a point where extracting macroscopic trends and patterns can reveal meaningful information about the real world.
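The abstract does not specify how mood was measured. As a minimal sketch, assuming a lexicon-based approach (counting words from an affect word list and normalising by the total number of tokens posted in each hour of the day), a diurnal mood profile over timestamped posts might be computed as below. The names MOOD_LEXICON, mood_counts and diurnal_profile are hypothetical, and the six-word lexicon is a toy placeholder rather than a real affect lexicon.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical toy lexicon mapping words to mood categories.
# A real study would use a curated affect lexicon with many more entries.
MOOD_LEXICON = {
    "happy": "joy", "glad": "joy", "great": "joy",
    "sad": "sadness", "tired": "sadness", "lonely": "sadness",
}

def mood_counts(text):
    """Count lexicon matches per mood category in one post."""
    counts = defaultdict(int)
    for token in text.lower().split():
        token = token.strip(".,!?;:\"'")  # drop surrounding punctuation
        category = MOOD_LEXICON.get(token)
        if category:
            counts[category] += 1
    return counts

def diurnal_profile(posts, category):
    """Return {hour: share of tokens matching `category`} for (timestamp, text) posts."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for timestamp, text in posts:
        hour = timestamp.hour
        totals[hour] += len(text.split())
        hits[hour] += mood_counts(text)[category]
    # Normalise by token volume so busy hours are not over-weighted.
    return {h: hits[h] / totals[h] for h in sorted(totals) if totals[h] > 0}

posts = [
    (datetime(2011, 1, 3, 8, 15), "So tired this morning"),
    (datetime(2011, 1, 3, 8, 40), "Great coffee, feeling happy"),
    (datetime(2011, 1, 3, 22, 5), "Lonely night"),
]
print(diurnal_profile(posts, "sadness"))  # {8: 0.125, 22: 0.5}
```

Normalising by token count rather than by number of posts is one design choice among several; at population scale the per-hour counts would be aggregated in a distributed job instead of an in-memory dictionary.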
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available