Title: Controlled image synthesis can improve chimpanzee face classification, a 'Small data' application
Author: Sandwell, Roz
ISNI: 0000 0004 5919 0101
Awarding Body: University of Bristol
Current Institution: University of Bristol
Date of Award: 2015
This thesis demonstrates that, whilst more data is generally assumed to be better for training higher-performing classifiers, this is not always the case. For chimpanzee face detection and recognition, results indicate that Big Data is not necessarily better than small data; that synthetic augmentation must be controlled, with pose being more beneficial than lighting; and that synthetic data is not a substitute for real, collected data. An unstructured doubling of the real training set size from 250 images has no significant impact on detection performance. At very small training set sizes (between one and ten images), the performance increases become smaller as more images are added. By controlling the data augmentation, however, classification performance can be improved. For very small datasets of one or two images, augmentation with synthetic pose increases detection performance by up to 19.2 percentage points. On larger datasets, synthetically varied pose images can increase recognition performance on pose-offset test images by up to 8.4 percentage points. The impracticality and expense of collecting data from natural habitats constrain the classification of wild animals in images to small existing datasets. Inspired by the rising tide of Big Data research, it is possible to synthetically enlarge available datasets without the need for expensive, time-consuming and potentially dangerous field expeditions. Synthetic data, constructed from a custom-built structural model, introduces artificial pose and lighting variance. This data is used alongside manually structured subsets for detection (Chapters 3-5) and recognition (Chapter 6). For this application to chimpanzee faces, classification performance can be improved by using synthetic data augmentation, if it is a controlled increase under specific conditions.
Augmenting datasets does not always lead to improvements: synthetic poses are more valuable than synthetic lighting conditions, and synthetic data is not a substitute for real data. It remains preferable to collect images containing a representation of the real object rather than to generate synthetic representations.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:  DOI: Not available