To study the relationship between speech emotion and spectrogram features, this paper proposes an improved discriminative completed local binary pattern feature for speech emotion recognition. First, statistical histograms of the completed local binary sign pattern (CLBP_S) and magnitude pattern (CLBP_M) are computed from the grayscale spectrogram image. Next, the CLBP_S and CLBP_M histograms are fed into a discriminative feature learning model, which is trained to obtain a set of globally salient patterns. Finally, the CLBP_S and CLBP_M histograms are processed with this global salient pattern set, and the processed features are concatenated to form the improved discriminative completed local binary pattern feature for speech emotion recognition (IDisCLBP_SER). Speech emotion recognition experiments on the Berlin database and a Chinese emotional speech database show that the recall of the IDisCLBP_SER feature is more than 8% higher than that of features such as texture image information (TII), and on average more than 4% higher than that of acoustic spectral features. Moreover, the proposed feature fuses well with existing acoustic features: the recall of the fused features is 1% to 4% higher than that of the existing acoustic features alone.
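The first and last steps of the pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes 8 neighbours at radius 1, plain (non-uniform, non-rotation-invariant) 256-bin codes, and the global mean magnitude as the CLBP_M threshold, none of which the abstract specifies. The function names `clbp_histograms` and `select_and_concat` are hypothetical, and the index lists passed to `select_and_concat` merely stand in for the global salient pattern set that the learning model would produce.

```python
import numpy as np

# Assumed neighbourhood: 8 neighbours at radius 1 (not stated in the abstract).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def clbp_histograms(gray):
    """Return 256-bin histograms of CLBP_S and CLBP_M codes for a 2-D image."""
    g = np.asarray(gray, dtype=np.float64)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Signed neighbour-minus-center differences, shape (8, h-2, w-2).
    diffs = np.stack(
        [g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] - center for dy, dx in OFFSETS]
    )
    mags = np.abs(diffs)
    threshold = mags.mean()  # assumed: global mean magnitude as the CLBP_M threshold
    weights = (1 << np.arange(8)).reshape(8, 1, 1)
    s_codes = ((diffs >= 0) * weights).sum(axis=0)         # sign component
    m_codes = ((mags >= threshold) * weights).sum(axis=0)  # magnitude component
    hist_s = np.bincount(s_codes.ravel(), minlength=256)
    hist_m = np.bincount(m_codes.ravel(), minlength=256)
    return hist_s, hist_m


def select_and_concat(hist_s, hist_m, keep_s, keep_m):
    """Keep only the selected pattern bins and concatenate them into one vector.

    keep_s and keep_m are hypothetical index lists standing in for the learned
    global salient pattern set.
    """
    return np.concatenate(
        [hist_s[np.asarray(keep_s, dtype=int)], hist_m[np.asarray(keep_m, dtype=int)]]
    )
```

In use, `gray` would be the grayscale spectrogram of one utterance, and the returned vector would be passed to a classifier. How the pattern set is actually learned is not described in the abstract, so that step is deliberately left out of the sketch.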