Speech quality prediction for voice over Internet protocol networks
IP networks are on a steep slope of innovation that will make them the long-term carrier of all types of traffic, including voice. However, such networks are not designed to support real-time voice communication because their variable characteristics (e.g. due to delay, delay variation and packet loss) lead to a deterioration in voice quality. A major challenge in such networks is how to measure or predict voice quality accurately and efficiently for QoS monitoring and/or control purposes to ensure that technical and commercial requirements are met. Voice quality can be measured using either subjective or objective methods. Subjective measurement (e.g. MOS) is the benchmark for objective methods, but it is slow, time consuming and expensive. Objective measurement can be intrusive or non-intrusive. Intrusive methods (e.g. ITU PESQ) are more accurate, but normally are unsuitable for monitoring live traffic because of the need for a reference data and to utilise the network. This makes non-intrusive methods(e.g. ITU E-model) more attractive for monitoring voice quality from IP network impairments. However, current non-intrusive methods rely on subjective tests to derive model parameters and as a result are limited and do not meet new and emerging applications. The main goal of the project is to develop novel and efficient models for non-intrusive speech quality prediction to overcome the disadvantages of current subjective-based methods and to demonstrate their usefulness in new and emerging VoIP applications. The main contributions of the thesis are fourfold: (1) a detailed understanding of the relationships between voice quality, IP network impairments (e.g. packet loss, jitter and delay) and relevant parameters associated with speech (e.g. codec type, gender and language) is provided. An understanding of the perceptual effects of these key parameters on voice quality is important as it provides a basis for the development of non-intrusive voice quality prediction models. A fundamental investigation of the impact of the parameters on perceived voice quality was carried out using the latest ITU algorithm for perceptual evaluation of speech quality, PESQ, and by exploiting the ITU E-model to obtain an objective measure of voice quality. (2) a new methodology to predict voice quality non-intrusively was developed. The method exploits the intrusive algorithm, PESQ, and a combined PESQ/E-model structure to provide a perceptually accurate prediction of both listening and conversational voice quality non-intrusively. This avoids time-consuming subjective tests and so removes one of the major obstacles in the development of models for voice quality prediction. The method is generic and as such has wide applicability in multimedia applications. Efficient regression-based models and robust artificial neural network-based learning models were developed for predicting voice quality non-intrusively for VoIP applications. (3) three applications of the new models were investigated: voice quality monitoring/prediction for real Internet VoIP traces, perceived quality driven playout buffer optimization and perceived quality driven QoS control. The neural network and regression models were both used to predict voice quality for real Internet VoIP traces based on international links. A new adaptive playout buffer and a perceptual optimization playout buffer algorithms are presented. A QoS control scheme that combines the strengths of rate-adaptive and priority marking control schemes to provide a superior QoS control in terms of measured perceived voice quality is also provided. (4) a new methodology for Internet-based subjective speech quality measurement which allows rapid assessment of voice quality for VoIP applications is proposed and assessed using both objective and traditional MOS test methods.