Title:
|
Effects of visual degradation on audio-visual speech perception
|
Audio-visual speech recognition is considered to be a dynamic process that uses
auditory and complementary visual speech cues. These cues are the products of the
stream of timed and targeted movements of the articulators in the vocal tract used to
produce speech. If the visual aspect of speech is absent or degraded, speech recognition
in noise may deteriorate; this effect was used as a tool to investigate the visual aspect
of speech recognition in the following experiments.
A series of shadowing and recall experiments assessed the effects of frame-rate
(temporal) and greyscale-level (spatial) variations in the visual aspect of audio-visual
presentations of sentences spoken in noisy backgrounds by three evenly illuminated
speakers. Shadowing accuracy declined significantly as the frame rate of presentation
fell, a finding related to the importance of temporal synchrony in audio-visual
speech.
Shadowing and recall experiments, with recordings from one speaker in two
illumination conditions and at two greyscale levels, revealed that performance accuracy
depended on the level of illumination in both tasks, for both the audio-visual
experimental condition and the audio-alone control condition. Moreover, in poor
illumination, recall was significantly less accurate at the lower greyscale level. This
was related to the level of spatial facial information that may be used in speech recognition.
Shadowing and recall accuracy for sentence keywords was related to their degree of
visible speech-related movement. Audio-visual shadowing accuracy varied little across
the range of movements, but audio-alone shadowing accuracy declined significantly as
the degree of movement increased. The visual and auditory target characteristics of
words associated with differing degrees of audio-visual advantage and visual movement
were determined. The findings were considered in the context of a dynamic model of
speech processing that depends on the patterns of timings and targets of the auditory
and visual speech signals.
|