Title:
|
Analysis-by-synthesis coding of narrowband and wideband speech at medium bit rates
|
The last few years have seen a rapid expansion in the development of efficient speech compression algorithms, fuelled primarily by the proliferation of digital mobile communication systems. Low bit rate speech coding algorithms estimate, quantise and efficiently encode the parameters of a speech production model from the original speech waveform. The most popular of these models is based on Linear Prediction, which has given rise to a class of speech coding algorithms known as Analysis-by-Synthesis Linear Prediction Coding (AbS-LPC). In an AbS-LPC coding system, a closed loop optimisation procedure is used to determine the excitation signal for the Linear Prediction filter. This methodology has been the foundation of many algorithms operating at medium to low bit rates. In particular, the Codebook Excited Linear Prediction (CELP) algorithm has received much attention in the past few years, culminating in numerous standards based on this principle. CELP achieves its coding efficiency and high quality by representing the excitation signal as a vector. However, in the original implementation of this algorithm the excitation search was very computationally intensive due to the structure of the codebook. In order to reduce this computational complexity and improve the quality of the synthetic speech, this thesis explores various structures of secondary excitations which are based on sparsely populated pulsed vectors. A variable rate implementation of the CELP algorithm is also presented, in which techniques typically found in vocoders are used to provide an accurate classification of the different types of speech. This classification is then used to vary the speech segment size and coding rate to take advantage of the differing regions of speech. Narrowband speech is defined to be band limited to between 300 Hz and 3.4 kHz and is sampled at 8 kHz, which satisfies the Nyquist criterion.
Wideband speech, by contrast, lies between 50 Hz and 7 kHz and is consequently sampled at the higher rate of 16 kHz. Wideband speech exhibits characteristics which are not normally embodied within the narrowband signal. It is these characteristics which contribute to the superior perceived quality, and it is therefore imperative that a coding scheme preserves this information. This thesis formulates various strategies for the coding of wideband speech using the CELP coding structure. Particular attention is paid to preserving the information in the higher frequencies so that the overall quality is maintained in the synthetic signal. A low delay variant of the wideband coder is also presented, in which the effects of backward linear prediction over the full bandwidth of the signal are investigated. This results in a split band architecture which is capable of producing high quality wideband speech.
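
As an illustration of the closed loop (analysis-by-synthesis) excitation search described above, the following minimal sketch passes each candidate excitation vector through an all-pole synthesis filter and keeps the entry, with its optimal gain, that minimises the squared error against a target segment. It is not the coder developed in the thesis: perceptual weighting, the adaptive (pitch) codebook and quantisation are omitted, and the function names, the random sparse pulse codebook and the second-order predictor coefficients are all hypothetical choices made for the example.

```python
import random


def synthesise(excitation, lpc):
    """All-pole synthesis filter 1/A(z) with A(z) = 1 - sum_k lpc[k-1] z^-k.

    The filter starts from a zero state (an assumption made for simplicity).
    """
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def abs_search(target, codebook, lpc):
    """Closed loop search: return (index, gain, squared_error) of the best entry."""
    best = None
    target_energy = dot(target, target)
    for index, code in enumerate(codebook):
        synth = synthesise(code, lpc)
        energy = dot(synth, synth)
        if energy == 0.0:
            continue  # an all-zero candidate cannot match anything
        corr = dot(target, synth)
        gain = corr / energy  # least-squares optimal gain for this entry
        error = target_energy - corr * corr / energy
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


def sparse_pulse_codebook(size, length, pulses, seed=0):
    """Hypothetical codebook: each vector holds a few +/-1 pulses, zeros elsewhere."""
    rng = random.Random(seed)
    book = []
    for _ in range(size):
        vec = [0.0] * length
        for pos in rng.sample(range(length), pulses):
            vec[pos] = rng.choice((-1.0, 1.0))
        book.append(vec)
    return book


if __name__ == "__main__":
    lpc = [1.2, -0.5]  # hypothetical stable second-order predictor
    codebook = sparse_pulse_codebook(size=64, length=40, pulses=4)
    # Build a target from a known entry so the search has an exact match.
    target = [2.5 * s for s in synthesise(codebook[17], lpc)]
    index, gain, error = abs_search(target, codebook, lpc)
    print(index, round(gain, 3), round(error, 6))
```

Because each candidate in this sketch contains only a few non-zero pulses, its filtered response could be assembled from a handful of shifted impulse responses rather than a full filtering pass, which is one reason sparse pulsed excitations tend to lower the cost of the search.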
|