A modified One-Class-One-Network ANN architecture for dynamic phoneme adaptation
As computers pervade more aspects of everyday life, the problem of man-to-machine communication becomes increasingly evident. In recent years, there has been concerted interest in speech recognition, which allows a user to communicate freely with a machine. However, this deceptively simple means of exchanging information is in fact extremely complex. A single utterance can contain a wealth of varied information concerning the speaker's gender, age, dialect and mood. Numerous subtle differences such as intonation, rhythm and stress add further complexity, increasing the variability both between speakers (inter-speaker) and within a single speaker's utterances (intra-speaker). These differences pose an enormous problem, especially for a multi-user system, since it is impractical to train for every variation of every utterance from every speaker. Consequently, adaptation is of great importance, allowing a system with limited knowledge to adapt dynamically towards a new speaker's characteristics. A new modified artificial neural network (ANN) was proposed, incorporating One-Class-One-Network (OCON) subnet architectures connected via a common front-end adaptation layer. Using vowel phonemes from the TIMIT speech database, adaptation was concentrated on neurons within the front-end layer, so that only information common to all classes, primarily speaker characteristics, was adapted. In addition, this prevented new utterances from interfering with phoneme-specific information in the corresponding OCON subnets. Hence a more efficient adaptation procedure was created which, after adaptation towards a single class, also aided recognition of the remaining classes within the network. Compared with a conventional multi-layer perceptron network, results for inter- and intra-speaker adaptation showed an equally marked improvement in the recognition of adapted phonemes for both full neuron and front-layer neuron adaptation within the new modified architecture.
When the effects of adaptation were tested on the remaining unadapted vowel phonemes, the modified architecture with adaptation restricted to the front-end layer neurons yielded better results than the same architecture with full neuron adaptation. These results indicated that speaker information common to all classes is stored in the front-end layer, which allows efficient inter- and intra-speaker dynamic adaptation.
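The architecture described above can be illustrated with a minimal sketch: a shared front-end layer feeds one small subnet per class, and adaptation updates only the front-end weights. The abstract does not give layer sizes, activation functions, input features or the training rule, so everything below is an assumption made for illustration: the class name SharedFrontOCON, the sigmoid units, the squared-error gradient step and all dimensions are hypothetical, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SharedFrontOCON:
    """Hypothetical sketch: a common front-end layer followed by one subnet per class."""

    def __init__(self, n_in, n_front, n_hidden, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        # Front-end layer, shared by every class subnet (assumed to hold speaker information).
        self.Wf = rng.normal(0.0, 0.5, (n_in, n_front))
        self.bf = np.zeros(n_front)
        # One hidden layer and one output unit per class (the OCON subnets).
        self.Wh = rng.normal(0.0, 0.5, (n_classes, n_front, n_hidden))
        self.bh = np.zeros((n_classes, n_hidden))
        self.wo = rng.normal(0.0, 0.5, (n_classes, n_hidden))
        self.bo = np.zeros(n_classes)

    def forward(self, x):
        f = sigmoid(x @ self.Wf + self.bf)                          # shape (n_front,)
        h = sigmoid(np.einsum("f,cfh->ch", f, self.Wh) + self.bh)   # shape (n_classes, n_hidden)
        y = sigmoid(np.sum(h * self.wo, axis=1) + self.bo)          # shape (n_classes,)
        return f, h, y

    def adapt_front(self, x, class_idx, lr=0.1):
        """One gradient step on 0.5 * (y_c - 1)^2, changing only the front-end layer."""
        f, h, y = self.forward(x)
        c = class_idx
        d_out = (y[c] - 1.0) * y[c] * (1.0 - y[c])        # error at the class output unit
        d_h = d_out * self.wo[c] * h[c] * (1.0 - h[c])    # back through the subnet hidden layer
        d_f = (self.Wh[c] @ d_h) * f * (1.0 - f)          # back to the shared front-end layer
        # The subnet weights (Wh, bh, wo, bo) are deliberately left untouched.
        self.Wf -= lr * np.outer(x, d_f)
        self.bf -= lr * d_f
        return y[c]

# Usage with made-up dimensions: 8 input features, 3 vowel classes.
net = SharedFrontOCON(n_in=8, n_front=6, n_hidden=4, n_classes=3)
x_new = np.random.default_rng(1).normal(size=8)   # stand-in for a new speaker's feature vector
for _ in range(50):
    net.adapt_front(x_new, class_idx=0)
print(net.forward(x_new)[2])
```

Because the gradient is propagated through the subnet of the adapted class but applied only to the shared layer, the per-class weights stay fixed, and any change in the front-end representation is seen by every subnet. This mirrors the front-layer adaptation mode described in the abstract; the full neuron adaptation mode would additionally update the subnet weights.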