Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.551151
Title: Robust speaker identification against computer aided voice impersonation
Author: Haider, Zargham
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2011
Abstract:
Speaker Identification (SID) systems offer good performance on noise-free speech, and most ongoing research aims to improve their reliability in noisy environments. Under ideal operating conditions, very low identification error rates can be achieved. These low error rates suggest that SID systems can be used in real-life applications as an additional layer of security alongside existing ones; for instance, they can be used together with a Personal Identification Number (PIN) or a password. SID systems can also be used by law enforcement agencies as a detection system to track wanted people over voice communication networks. In this thesis, the performance of existing SID systems against impersonation attacks is analysed, and strategies to counteract such attacks are discussed. A voice impersonation system is developed using Gaussian Mixture Modelling (GMM), with Line Spectral Frequencies (LSF) as the features representing the spectral parameters of the source-target pair. Voice conversion systems based on probabilistic approaches suffer from over-smoothing of the converted spectrum. A hybrid scheme combining Linear Multivariate Regression and GMM, together with posterior probability smoothing, is proposed to reduce over-smoothing and alleviate discontinuities in the converted speech. The converted voices are used to intrude into a closed-set SID system in two scenarios: identity disguise and targeted speaker impersonation. The results of these intrusions suggest that, in their present form, SID systems are vulnerable to deliberate voice conversion attacks. For impostors to transform their voices, a large volume of speech data is required, which may not be easily accessible. To improve the performance of SID systems against deliberate impersonation attacks, the use of multiple classifiers is explored. The Linear Prediction (LP) residual of the speech signal is also analysed for speaker-specific excitation information.
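A common form of GMM-based spectral conversion can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's hybrid scheme: it omits the Linear Multivariate Regression component, assumes a joint source-target GMM has already been trained with diagonal covariance blocks, and uses a simple moving average of the per-frame posterior probabilities as a stand-in for the posterior smoothing described above. The function names and the `half_window` parameter are hypothetical. Real LSF vectors must also remain in ascending order after conversion, which this sketch does not enforce.

```python
import math

# Minimal sketch of GMM-based spectral conversion (hypothetical helper names).
# Assumes a joint source-target GMM has already been trained and that each
# component's covariance blocks are diagonal, so each feature dimension
# (e.g. one LSF coefficient) is mapped independently.


def diag_log_gauss(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, mu_x, var_xx):
    """P(component i | source frame x), computed stably in the log domain."""
    logs = [
        math.log(w) + diag_log_gauss(x, m, v)
        for w, m, v in zip(weights, mu_x, var_xx)
    ]
    top = max(logs)
    exps = [math.exp(lg - top) for lg in logs]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_posteriors(post_seq, half_window):
    """Moving average of each component's posterior over neighbouring frames
    (a simple stand-in for posterior probability smoothing)."""
    n, k = len(post_seq), len(post_seq[0])
    smoothed = []
    for t in range(n):
        lo, hi = max(0, t - half_window), min(n, t + half_window + 1)
        avg = [sum(post_seq[s][i] for s in range(lo, hi)) / (hi - lo) for i in range(k)]
        total = sum(avg)  # renormalise to guard against rounding error
        smoothed.append([a / total for a in avg])
    return smoothed


def convert(frames, weights, mu_x, mu_y, var_xx, cov_yx, half_window=0):
    """Map source frames to target frames with the posterior-weighted regression
    F(x) = sum_i p(i|x) * [mu_y_i + cov_yx_i / var_xx_i * (x - mu_x_i)]."""
    post_seq = [posteriors(x, weights, mu_x, var_xx) for x in frames]
    if half_window > 0:
        post_seq = smooth_posteriors(post_seq, half_window)
    converted = []
    for x, post in zip(frames, post_seq):
        y = [0.0] * len(x)
        for p, mx, my, vxx, cyx in zip(post, mu_x, mu_y, var_xx, cov_yx):
            for d in range(len(x)):
                y[d] += p * (my[d] + cyx[d] / vxx[d] * (x[d] - mx[d]))
        converted.append(y)
    return converted
```

With `half_window=0` each frame is converted independently; a larger window blends posteriors across neighbouring frames, which damps abrupt jumps when the dominant mixture component changes from one frame to the next, at the cost of some spectral detail.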
A speaker identification system based on a multiple classifier system, using features that describe both the vocal tract and the LP residual, is targeted by the impersonation system. The identification results show an improvement in rejecting impostor claims when the system is presented with converted voices. It is hoped that the findings in this thesis can lead to the development of speaker identification systems that are better equipped to deal with the problem of deliberate voice impersonation.
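The abstract does not specify how the classifiers are combined. As one hypothetical illustration, the sketch below z-normalises each classifier's scores across the enrolled speakers, fuses them with a weighted sum rule, and rejects the claim when the best fused score falls below a threshold. The scores could, for example, be GMM log-likelihoods from a vocal-tract classifier and an LP-residual classifier; the function names, weights and threshold are assumptions, and in practice the threshold would be tuned on development data.

```python
import math

# Hypothetical score-level fusion for a multiple classifier SID system.
# Each classifier supplies one score per enrolled speaker (higher = better match).


def z_normalise(scores):
    """Shift and scale one classifier's scores to zero mean and unit variance
    across speakers, so classifiers with different score ranges can be summed."""
    values = list(scores.values())
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return {spk: (s - mean) / std for spk, s in scores.items()}


def fuse_and_decide(classifier_scores, weights, threshold):
    """Weighted sum of z-normalised scores; returns (speaker or None, fused scores).
    None means the claim is rejected because no speaker stands out enough."""
    normalised = [z_normalise(s) for s in classifier_scores]
    speakers = list(classifier_scores[0])
    fused = {
        spk: sum(w * n[spk] for w, n in zip(weights, normalised))
        for spk in speakers
    }
    best = max(fused, key=fused.get)
    return (best if fused[best] >= threshold else None), fused
```

A call such as `fuse_and_decide([vt_scores, res_scores], [0.5, 0.5], threshold=1.0)` combines the two classifiers. In this setup, a converted voice that raises one enrolled speaker's vocal-tract score without also matching that speaker's residual score produces a weaker fused peak, so it is more likely to fall below the threshold and be rejected.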
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.551151
DOI: Not available