Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.639230
Title: Speech variability in speaker recognition
Author: Thompson, J.
Awarding Body: University of Wales Swansea
Current Institution: Swansea University
Date of Award: 1998
Abstract:
This thesis investigates the effects of variability on the performance of automatic speaker recognition systems. Both speaker-generated variability and variability of the recording environment are examined. Speaker-generated variability (intra-variation) has received less attention than variability of the recording environment, and is therefore the main focus of this thesis. Of most concern is the intra-variation of data typically found in co-operative speaker recognition tasks, that is, normally spoken speech collected over a period of months. To assess the scale of recognition errors attributed to intra-variation, errors due to noise degradation are considered first. Additive noise can rapidly degrade recognition performance, so for a more realistic assessment, a 'state of the art' noise compensation algorithm is also introduced. Comparison between noise degradation and intra-variation shows intra-variation to be a significant source of recognition errors, and the source of most recognition errors for background noise of 9 dB SNR or greater. The level of intra-variation and of recognition errors is shown to be highly speaker dependent. Analysis of cepstral variation shows intra-variation to correlate more closely with recognition errors than inter-variation. Recognition experiments and analysis of the glottal pulse shape demonstrate that variation between two recording sessions generally increases as the time gap between the sessions lengthens. The glottal pulse shape is also shown to vary within recording sessions, albeit less than between sessions. Others have shown the glottal pulse shape to vary for highly stressed speech. It is shown here to vary also for normally spoken speech collected under relatively controlled conditions. It is hypothesised that these variations occur, in part, due to the speaker's anxiety during recording.
Glottal pulse variation is shown to broadly match the hypothesised anxiety profile. The gradual change of glottal pulse variation demonstrates an underlying reason why incremental speaker adaptation can be used for intra-variation compensation. Experiments show that adaptation can potentially reduce speaker identification error rates from 15% to 2.5%.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.639230
DOI: Not available