Characterisation of plosive, fricative and aspiration components in speech production
This thesis is a study of the production of human speech sounds by acoustic modelling and signal analysis. It concentrates on sounds that are not produced by voicing (although voicing may be present), namely plosives, fricatives and aspiration, which all contain noise generated by flow turbulence. It combines the application of advanced speech analysis techniques with acoustic flow-duct modelling of the vocal tract, and draws on dynamic magnetic resonance imaging (dMRI) data of the pharyngeal and oral cavities, to relate the sounds to physical shapes. Vocal-tract outlines were superimposed on three sagittal dMRI slices of an adult male subject, and a simple description of the vocal tract suitable for acoustic modelling was derived from them through a sequence of transformations. The vocal-tract acoustics program VOAC, which relaxes many of the assumptions of conventional plane-wave models, incorporates the effects of net flow into a one-dimensional model (viz., flow separation, increase of entropy, and changes to resonances), as well as wall vibration and cylindrical wavefronts. VOAC was used for synthesis by computing transfer functions from sound sources specified within the tract to the far field. Being generated by a variety of aero-acoustic mechanisms, unvoiced sounds are somewhat varied in nature. Through analysis that was informed by acoustic modelling, resonance and anti-resonance frequencies of ensemble-averaged plosive spectra were examined for the same subject, and their trajectories observed during release. The anti-resonance frequencies were used to compute the place of occlusion. In vowels and voiced fricatives, voicing obscures the aspiration and frication components. A method, the pitch-scaled harmonic filter (PSHF), was therefore devised to separate the voiced and unvoiced parts of a speech signal, and was tested extensively on synthetic signals.
Based on a harmonic model of voicing, the PSHF outputs harmonic and anharmonic signals appropriate for subsequent analysis as time series or as power spectra. By applying the PSHF to sustained voiced fricatives, we found not only that voicing modulates the production of frication noise, but also that the timing of pulsation cannot be explained by acoustic propagation alone. Alongside classical investigation of voiceless speech sounds, VOAC and the PSHF proved their practical value in further characterising plosion, frication and aspiration noise. For future work, we discuss developing VOAC within an articulatory synthesiser, investigating the observed flow-acoustic mechanism in a dynamic physical model of voiced frication, and applying the PSHF more widely in the field of speech research.
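The harmonic/anharmonic split performed by the PSHF can be illustrated with a minimal sketch. It is not the implementation used in the thesis. It assumes a single frame that spans exactly a whole number of pitch periods (four here), a rectangular window, and a known, constant pitch period; the function name `harmonic_split` and the parameter `periods` are hypothetical. Under these assumptions the harmonics of the fundamental fall exactly on every fourth DFT bin, so keeping those bins gives an estimate of the voiced part and the remainder gives an estimate of the noise part.

```python
import numpy as np


def harmonic_split(frame, periods=4):
    """Split one frame into harmonic and anharmonic estimates.

    Assumes (for illustration only) that `frame` spans exactly `periods`
    pitch periods, so harmonics of the fundamental lie on DFT bins
    0, periods, 2*periods, ...
    """
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame)

    # Select the bins that coincide with harmonics of the fundamental
    # (bin 0, the DC term, is kept with the harmonic part).
    mask = np.zeros(spectrum.shape, dtype=bool)
    mask[::periods] = True

    # Harmonic estimate: inverse transform of the harmonic bins only.
    harmonic = np.fft.irfft(np.where(mask, spectrum, 0), n=len(frame))
    # Anharmonic estimate: whatever the harmonic bins do not explain.
    anharmonic = frame - harmonic
    return harmonic, anharmonic


# Synthetic test signal: two harmonics of a 50-sample pitch period plus noise.
rng = np.random.default_rng(0)
n = np.arange(4 * 50)
voiced = np.sin(2 * np.pi * n / 50) + 0.5 * np.sin(2 * np.pi * 2 * n / 50 + 0.3)
noise = 0.1 * rng.standard_normal(n.size)
harmonic, anharmonic = harmonic_split(voiced + noise, periods=4)
```

In this simplified form the noise energy that happens to fall on the harmonic bins stays in the harmonic estimate, so the separation is approximate rather than exact. The two outputs can then be examined as time series or converted to power spectra.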