Use this URL to cite or link to this record in EThOS:
Title: Voice biometric system security : design and analysis of countermeasures for replay attacks
Author: Chettri, Bhusan
ISNI:       0000 0004 9355 6082
Awarding Body: Queen Mary University of London
Current Institution: Queen Mary, University of London
Date of Award: 2020
Availability of Full Text:
Access from EThOS:
Access from Institution:
Voice biometric systems use automatic speaker verification (ASV) technology for user authentication. Even if it is among the most convenient means of biometric authentication, the robustness and security of ASV in the face of spoofing attacks (or presentation attacks) is of growing concern and is now well acknowledged by the research community. A spoofing attack involves illegitimate access to personal data of a targeted user. Replay is among the simplest attacks to mount - yet difficult to detect reliably - and is the focus of this thesis. This research focuses on the analysis and design of existing and novel countermeasures for replay attack detection in ASV, organised in two major parts. The first part of the thesis investigates existing methods for spoofing detection from several perspectives. I first study the generalisability of hand-crafted features for replay detection that show promising results on synthetic speech detection. I find, however, that it is difficult to achieve similar levels of performance due to the acoustically different problem under investigation. In addition, I show how class-dependent cues in a benchmark dataset (ASVspoof 2017) can lead to the manipulation of class predictions. I then analyse the performance of several countermeasure models under varied replay attack conditions. I find that it is difficult to account for the effects of various factors in a replay attack: acoustic environment, playback device and recording device, and their interactions. Subsequently, I developed and studied a convolutional neural network (CNN) model that demonstrates comparable performance to the one that ranked first in the ASVspoof 2017 challenge. Here, the experiment analyses what the CNN has learned for replay detection using a method from interpretable machine learning. The findings suggest that the model highly attends at the first few milliseconds of test recordings in order to make predictions. Then, I perform an in-depth analysis of a benchmark dataset (ASVspoof 2017) for spoofing detection and demonstrate that any machine learning countermeasure model can still exploit the artefacts I identified in this dataset. The second part of the thesis studies the design of countermeasures for ASV, focusing on model robustness and avoiding dataset biases. First, I proposed an ensemble model combining shallow and deep machine learning methods for spoofing detection, and then demonstrate its effectiveness on the latest benchmark datasets (ASVspoof 2019). Next, I proposed the use of speech endpoint detection for reliable and robust model predictions on the ASVspoof 2017 dataset. For this, I created a publicly available collection of hand-annotations of speech endpoints for the same dataset, and new benchmark results for both frame-based and utterance-based countermeasures are also developed. I then proposed spectral subband modelling using CNNs for replay detection. My results indicate that models that learn subband-specific information substantially outperform models trained on complete spectrograms. Finally, I proposed to use variational autoencoders - deep unsupervised generative models - as an alternative backend for spoofing detection and demonstrate encouraging results when compared with the traditional Gaussian mixture model.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available