Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.758209
Title: Robust speaker recognition in reverberant condition : toward greater biometric security
Author: Al-Karawi, K. A.
ISNI: 0000 0004 7430 9844
Awarding Body: University of Salford
Current Institution: University of Salford
Date of Award: 2018
Abstract:
Automatic speaker recognition systems have become an increasingly relevant technology for security applications. The primary challenge for automatic speaker recognition is dealing with the variability of the environments and channels from which the speech is obtained. In previous work, good results have been achieved for clean, high-quality speech when training and test acoustic conditions match. However, under mismatched conditions and in reverberant environments, which are often expected in the real world, system performance degrades significantly. The main aim of this study is to improve the robustness of speaker recognition systems for real-world applications in reverberant conditions by developing methods that reduce the detrimental effects of reverberation on the single-microphone speech signal. The collection of suitable speech data sets is crucial for testing performance during the development of speaker recognition techniques. Therefore, a data set of anechoic speech recordings was generated and used to evaluate the methods proposed in this thesis. Furthermore, a typical speaker recognition system was implemented and evaluated using the current state-of-the-art technique, Gaussian Mixture Models, with two standard features. The effects of reverberation time and of the distance from the source to the receiver on system performance were also examined, and the results confirm that both parameters can affect system accuracy. A maximum likelihood algorithm is used for blind estimation of the reverberation time from speech signals submitted for verification. The estimated values are used to choose a matched acoustic impulse response for inclusion in the retraining or fine-tuning of the pattern recognition model. To seek further improvement, the autocorrelation function was used to estimate the early reflections of the submitted signal. The estimated early reflections were convolved with the anechoic signal, and the result was used to train the pattern recognition model. Furthermore, both the early-to-late ratio and the reverberation time (RT) were estimated for the submitted sample and used to determine a matched channel for training on the fly, in order to improve system performance. The principal findings are that reverberation time, early reflections and early-to-late ratio can be estimated and then used with training-on-the-fly methods to improve speaker verification performance. The improvement is demonstrated by comparing the performance of speaker recognition using conventional methods with that of the proposed re-training method.
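As an illustration only, the matched-condition idea summarised in the abstract (estimate a reverberation time, choose the closest impulse response, convolve it with anechoic speech before retraining) can be sketched as below. This is not the thesis implementation: the function names, the synthetic exponentially decaying impulse response, the sample rate, the bank of reverberation times and the estimated value of 0.55 s are all assumptions made for the example, and the blind estimator and the Gaussian Mixture Model training are not shown.

```python
import math
import random


def synthetic_rir(rt60, fs=8000, seed=0):
    """Hypothetical stand-in for a measured impulse response.

    Gaussian noise shaped by an exponential envelope whose amplitude
    has dropped by 60 dB after rt60 seconds.
    """
    rng = random.Random(seed)
    n = round(rt60 * fs)
    decay = 3.0 * math.log(10.0) / rt60  # exp(-decay * rt60) == 10 ** -3
    rir = [rng.gauss(0.0, 1.0) * math.exp(-decay * i / fs) for i in range(n)]
    rir[0] = 1.0  # direct sound
    return rir


def pick_matched_rt(estimated_rt60, bank_rt60s):
    """Return the reverberation time in the bank closest to the estimate."""
    return min(bank_rt60s, key=lambda rt: abs(rt - estimated_rt60))


def convolve(signal, rir):
    """Direct-form linear convolution of two sequences."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


# Assumed bank of available reverberation times, in seconds.
bank = [0.3, 0.6, 0.9]
# Assumed output of a blind reverberation time estimator (not implemented here).
estimated_rt60 = 0.55
matched_rt60 = pick_matched_rt(estimated_rt60, bank)

# A short sine burst stands in for an anechoic training utterance.
anechoic = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(200)]
reverberant = convolve(anechoic, synthetic_rir(matched_rt60))
```

In such a scheme the `reverberant` signal, rather than the anechoic one, would be passed to feature extraction and model retraining, so that the training condition approximates the condition of the submitted sample.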
Supervisor: Not available
Sponsor: Ministry of Higher Education ; Iraq
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.758209
DOI: Not available