Pitch estimation for noisy speech
In this dissertation a biologically plausible system of pitch estimation is proposed. The system is designed from the bottom up to be robust to challenging noise conditions. This robustness to the presence of noise in the signal is achieved by developing a new representation of the speech signal, based on the operation of damped harmonic oscillators, and temporal mode analysis of their output. This resulting representation is shown to possess qualities which are not degraded in presence of noise. A harmonic grouping based system is used to estimate the pitch frequency. A detailed statistical analysis is performed on the system, and performance compared with some of the most established and recent pitch estimation and tracking systems. The detailed analysis includes results of experiments with a variety of noises with a large range of signal to noise ratios, under different signal conditions. Situations where the interfering "noise" is speech from another speaker are also considered. The proposed system is able to estimate the pitch of both the main speaker, and the interfering speaker, thus emulating the phenomena of auditory streaming and "cocktail party effect" in terms of pitch perception. The results of the extensive statistical analysis show that the proposed system exhibits some very interesting properties in its ability of handling noise. The results also show that the proposed system’s overall performance is much better than any of the other systems tested, especially in presence of very large amounts of noise. The system is also shown to successfully simulate some very interesting psychoacoustical pitch perception phenomena. Through a detailed and comparative computational requirements analysis, it is also demonstrated that the proposed system is comparatively inexpensive in terms of processing and memory requirements.