Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.580590
Title: Instantaneous modulation components for speaker identification
Author: Hassan, Tariq
Awarding Body: University of Southampton
Current Institution: University of Southampton
Date of Award: 2012
Availability of Full Text:
Full text unavailable from EThOS. Please contact the current institution’s library for further details.
Abstract:
In speaker recognition, we seek highly speaker-related features in the speech signal that can serve as the basis of effective classification. The choice of such features should follow the basic principle of speech signal characterisation, namely, to select and model the features that minimise intra-speaker variability and, at the same time, maximise inter-speaker variability. The instantaneous features of the speech signal, which represent the approximate formant frequencies, their bandwidths, and the energies they carry at each time instant, reflect, to some extent, the principal components of which the speech signal is composed. In fact, the speech resonances and their energies are the most dominant information that can characterise both the speech and the speaker in digital system analysis. This thesis investigates to what extent the instantaneous components (instantaneous frequency, bandwidth, and energy) present in the speech signal hold speaker-dependent information that can be used as parameters for speaker identification. Three speech models are presented in the context of text-dependent and text-independent speaker identification. The first model, a frame-based AM-FM modulation model, represents a new form of the multiband demodulation analysis (MDA) adopted previously for speaker identification. This model applies the short-time analysis of the speech signal adopted in most source-filter parameterisation models, but without applying the discrete cosine transform (DCT) to the estimated features. The second model adopts the instantaneous bandwidths of the speech formant frequencies and examines the level of speaker information that such parameters can carry. We then suggest a combination of instantaneous frequency and bandwidth that represents both formant values in one component describing the speech resonance over each filter channel.
The third scheme uses the instantaneous amplitude features as parameters for speaker identification and presents a model that takes advantage of both the traditional source-filter model and the AM-FM modulation model. The aim is to reveal to what extent the two models can complement each other in order to generate a new set of descriptors that can be used for speaker identification. Based on the relationship between the two models, we suggest a mixture of the energy component of the source-filter model and the modulation component of the AM-FM model. All the proposed models are evaluated in the text-dependent context using the BT-Millar speech data (clean and noisy), and in the text-independent context using the NTIMIT corpus, in closed-set speaker identification. The performance of these models is compared with the identification results obtained using the MFCC parameters within the same framing, filtering, and GMM classification system. The recognition results show that the new modulation parameters, both the instantaneous frequency and the bandwidth, can provide better results than the other models, depending on the condition of the speech corpus and on the collection of speech parameters representing different types of speech components.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.580590
DOI: Not available