论文部分内容阅读
The perception of human languages is inherently a multi-modal process, in which audio information can be compensated by visual information to improve the recognition performance. Such a phenomenon in English, German, Spanish and so on has been researched, but in Chinese it has not been reported yet. In our experiment, 14 syllables (/ba, bi, bian, biao, bin, de, di, dian, duo, dong, gai, gan, gen, gu/), extracted from Chinese audiovisual bimodal speech database CAVSR-1.0, were pronounced by 10 subjects. The audio-only stimuli, audiovisual stimuli, and visual-only stimuli were recognized by 20 observers. The audio-only stimuli and audiovisual stimuli both were presented under 5 conditions: no noise, SNR 0 dB, - 8 dB, - 12 dB, and - 16 dB. The experimental result is studied and the following conclusions for Chinese speech are reached. Human beings can recognize visual-only stimuli rather well. The place of articulation determines the visual distinction. In noisy environment, audio information can remarkably
The perception of human languages is inherently a multi-modal process, in which audio information can be compensated by visual information to improve the recognition performance. Such a phenomenon in English, German, Spanish and so on has been researched, but in Chinese it has not been reported yet. In our experiment, 14 syllables (/ ba, bi, bian, biao, bin, de, di, dian, duo, dong, gai, gan, gen, gu /), extracted from Chinese audiovisual bimodal speech database The audio-only stimuli, and audio-only stimuli were identified by 20 observers. CAVSR-1.0, were pronounced by 10 subjects. The audio-only stimuli, audiovisual stimuli, and visual- The experimental result is studied and the following conclusions for Chinese speech are reached. Human beings can recognize visual-only stimuli rather well. The place of articulation determines the visual distinction. In noisy environment, audio informa can remarkably