Traditional speech/lip-motion analysis models tend to ignore the time-varying information between lip-motion frames, which degrades consistency judgments. To address this, a consistency decision method based on a shift-invariant learned dictionary is proposed. The method introduces shift-invariant sparse representation into speech/lip-motion consistency analysis: a joint audio-visual dictionary learning algorithm trains a dictionary that is invariant to temporal and spatial shifts, and the sparse-coding stage of the learning algorithm is improved with a new data-mapping scheme. The joint audio-visual atoms in the learned dictionary serve as templates describing how the audio and the lip shape vary synchronously when different syllables or words are pronounced, and a scoring criterion for speech/lip-motion consistency is then derived from these templates. Experiments on four types of inconsistent audio-visual data show that, compared with traditional statistical methods, the proposed method reduces the overall equal error rate (EER) on average from 23.6% to 11.3% for few-syllable utterances, and from 22.1% to 15.9% for multi-syllable sentences.
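The shift-invariant sparse coding underlying the method can be illustrated with a minimal 1-D greedy matching-pursuit sketch. This is a hypothetical toy, not the paper's joint audio-visual learning algorithm: it shows only how an atom's contribution is detected at an arbitrary temporal shift, which is what makes the representation insensitive to when a syllable starts.

```python
import numpy as np

def shift_invariant_mp(x, atoms, n_iter=10):
    """Greedy shift-invariant matching pursuit (illustrative sketch).

    x      : 1-D signal
    atoms  : list of 1-D unit-norm atoms, each shorter than x
    Returns a list of (atom_index, shift, coefficient) and the residual.
    """
    residual = x.astype(float).copy()
    code = []
    for _ in range(n_iter):
        best = (0, 0, 0.0)
        for k, d in enumerate(atoms):
            # correlate the atom with the residual at every valid shift
            corr = np.correlate(residual, d, mode="valid")
            s = int(np.argmax(np.abs(corr)))
            if abs(corr[s]) > abs(best[2]):
                best = (k, s, corr[s])
        k, s, c = best
        # subtract the best-matching shifted atom from the residual
        residual[s:s + len(atoms[k])] -= c * atoms[k]
        code.append((k, s, c))
    return code, residual

# Toy usage: a single atom embedded at shift 5 is recovered with its
# shift and amplitude, regardless of where in the signal it occurs.
atom = np.array([1.0, 0.0, -1.0])
atom /= np.linalg.norm(atom)
x = np.zeros(20)
x[5:8] += 3.0 * atom
code, res = shift_invariant_mp(x, [atom], n_iter=3)
```

In the paper's setting the atoms would be joint audio-visual patterns learned from data rather than fixed waveforms, but the shift search plays the same role: the same syllable template matches no matter when it occurs in the utterance.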