Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.651039
Title: Semi-continuous hidden Markov models for automatic speaker verification
Author: Forsyth, Mark Eric
Awarding Body: University of Edinburgh
Current Institution: University of Edinburgh
Date of Award: 1995
Availability of Full Text:
Access through EThOS:
Full text unavailable from EThOS.
Abstract:
This thesis investigates the use of semi-continuous hidden Markov models (HMMs) for automatic speaker verification (ASV) over a telephone channel. The implemented system is evaluated on a large database of isolated digits recorded over the British telephone network. The goal of the work is to improve the performance of the ASV system under the constraints of limited enrolment data (5 tokens of each digit) and realistic computational and storage requirements. Experiments are conducted on the combined use of several standard feature sets within a multiple-codebook architecture with a common state segmentation. The feature sets investigated are linear predictive cepstral coefficients, mel-frequency cepstral coefficients and their respective first-order differences. The new algorithms proposed and evaluated include weighting digit scores according to their usefulness to the verification task, and using Gaussian state duration probabilities as an additional information source in the verification decision. The most important contribution of this thesis is a method for constructing discriminating HMMs without the need for discriminative training. This new form of model, known as a discriminating observation probability (DOP) HMM, is formed by combining standard HMMs into a discriminating model. DOP models are more flexible and perform better than the speaker normalisation techniques currently favoured in the literature. DOP models have potential application to many binary classification tasks that use HMMs. Using speaker-specific thresholds on a series of 12 isolated digits, the equal error rate (EER) was 0.17% with multiple-codebook DOP models, compared with 1.93% with single-codebook conventional HMMs. This represents a relative reduction in EER of 91%.
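In a semi-continuous HMM, every state draws its observation probability from a codebook of Gaussians shared across states. Only the mixture weights are state-specific: b_j(o) = Σ_k c_jk N(o; μ_k, Σ_k). The sketch below is a minimal illustration of that textbook definition, not the thesis's implementation. It assumes diagonal covariances, one codebook per feature stream (for example cepstra and their first-order differences), and independent streams, so per-stream log-probabilities add. The function names are hypothetical.

```python
import numpy as np

def codebook_log_densities(x, means, variances):
    """Log-density of frame x under every Gaussian in one shared codebook.

    x: (D,) feature vector; means, variances: (K, D) diagonal Gaussians.
    Returns a (K,) array. Because the codebook is shared, this is computed
    once per frame and reused by every state of every model.
    """
    diff = x - means
    return -0.5 * (np.sum(np.log(2.0 * np.pi * variances), axis=1)
                   + np.sum(diff * diff / variances, axis=1))

def sc_log_emission(stream_log_densities, state_weights):
    """Semi-continuous log b_j(o) for one state j (hypothetical helper).

    stream_log_densities: list of (K_s,) arrays from codebook_log_densities,
        one per feature stream / codebook.
    state_weights: list of (K_s,) positive mixture weights c_{j,k} for this
        state, one array per stream, each summing to 1.
    Streams are assumed independent, so their log-probabilities are summed.
    """
    total = 0.0
    for log_dens, weights in zip(stream_log_densities, state_weights):
        # log sum_k c_jk * f_k(o), evaluated stably in the log domain
        total += np.logaddexp.reduce(np.log(weights) + log_dens)
    return float(total)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means, variances = rng.normal(size=(4, 3)), np.ones((4, 3))
    frame = rng.normal(size=3)
    dens = codebook_log_densities(frame, means, variances)
    print(sc_log_emission([dens], [np.full(4, 0.25)]))
```

The split into two functions reflects where the storage and computation savings come from. The expensive Gaussian evaluations depend only on the frame and the codebook, so each state adds only a weighted sum.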
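The abstract names two scoring refinements but does not give their formulas: Gaussian state duration probabilities and per-digit score weighting. The sketch below is one plausible way to combine them, under stated assumptions. It assumes that state durations are frame counts taken from a Viterbi alignment and that each state's duration is modelled as N(mean, var). It also assumes the duration log-likelihood is added to the acoustic log-likelihood with a fusion weight `alpha`, and that the utterance score is a weighted average of digit scores. The names, `alpha` and the weights are all hypothetical.

```python
import math

def log_gaussian(d, mean, var):
    """Log of the Gaussian density N(d; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (d - mean) ** 2 / var)

def duration_log_score(durations, dur_means, dur_vars):
    """Sum of per-state Gaussian duration log-probabilities.

    durations: frames spent in each state on the (assumed) Viterbi path.
    """
    return sum(log_gaussian(d, m, v)
               for d, m, v in zip(durations, dur_means, dur_vars))

def digit_score(acoustic_score, durations, dur_means, dur_vars, alpha=0.1):
    """Acoustic log-likelihood plus weighted duration evidence.

    alpha is a hypothetical fusion weight; it would need tuning on held-out data.
    """
    return acoustic_score + alpha * duration_log_score(durations, dur_means, dur_vars)

def utterance_score(digit_scores, digit_weights):
    """Weighted average of per-digit scores.

    digit_weights are hypothetical "usefulness" weights, e.g. larger for
    digits that separate true speakers from impostors better on training data.
    """
    total_w = sum(digit_weights)
    return sum(w * s for w, s in zip(digit_weights, digit_scores)) / total_w
```

Weighting in the log domain keeps the combination additive, so a digit with weight zero simply drops out of the decision.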
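The equal error rate is the operating point at which the false acceptance rate (FAR) equals the false rejection rate (FRR). The quoted 91% is a relative figure: (1.93 − 0.17) / 1.93 ≈ 0.912. The minimal sketch below approximates the EER by sweeping a threshold over the observed scores, accepting a claim when its score is at or above the threshold. It uses a single pooled threshold. A speaker-specific variant, assumed here to be one common convention, would apply the same function to each speaker's scores separately and average the results.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER from genuine and impostor score lists.

    A claim is accepted when score >= threshold. For each candidate threshold,
    FRR is the fraction of genuine scores rejected and FAR the fraction of
    impostor scores accepted. Returns (FAR + FRR) / 2 where they are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0)
    return best[1]

def relative_reduction(baseline, new):
    """Fractional reduction of an error rate relative to the baseline."""
    return (baseline - new) / baseline

print(round(relative_reduction(1.93, 0.17), 2))  # EERs in % from the abstract
```

With the abstract's figures, `relative_reduction(1.93, 0.17)` rounds to 0.91, matching the stated 91% reduction.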
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.651039
DOI: Not available