This paper proposes a speech emotion recognition method based on a cascaded "generative/discriminative" hybrid model. First, 63-dimensional utterance-level features are extracted, the 12 best utterance-level features are selected from them using the Fisher criterion, and a cascaded generative model based on a wavelet neural network (WNN) is built for speech emotion recognition. Then, 69-dimensional frame-level features are extracted, 8 dimensions are selected by sequential forward selection (SFS), and a Gaussian mixture model (GMM) is used to produce multidimensional probability outputs, from which the cascaded "generative/discriminative" hybrid model is built for speech emotion recognition. The experimental results show that: (1) the cascaded "generative/discriminative" hybrid model achieves a higher recognition rate than WNN, GMM, HMM (hidden Markov model), or SVM (support vector machine) used alone; (2) the cascaded "generative/discriminative" hybrid model achieves a higher recognition rate than the WNN-based cascaded generative model; (3) the serially fused GMM-MAP/SVM model (MAP: maximum a posteriori) with M = 13 and D dimensions is the best cascaded "generative/discriminative" hybrid model, reaching a recognition rate of up to 85.1%.
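The Fisher-based selection of utterance-level features can be illustrated with a minimal sketch. It assumes the commonly used Fisher score, that is, the between-class scatter of a feature divided by its within-class scatter; the abstract does not state which exact criterion the authors used. The function names `fisher_scores` and `select_top_k` and the toy data are hypothetical and serve only as an illustration.

```python
from statistics import fmean, pvariance


def fisher_scores(samples, labels):
    """Return one Fisher score per feature column.

    Assumed definition (not taken from the paper):
    score_j = sum_c n_c * (mean_cj - mean_j)^2 / sum_c n_c * var_cj
    """
    n_features = len(samples[0])
    classes = sorted(set(labels))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        overall_mean = fmean(column)
        between = 0.0  # between-class scatter of feature j
        within = 0.0   # within-class scatter of feature j
        for c in classes:
            values = [row[j] for row, y in zip(samples, labels) if y == c]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_k(samples, labels, k):
    """Return the indices of the k features with the highest Fisher score."""
    scores = fisher_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): 6 utterances, 2 features, 2 emotion classes.
# Feature 0 separates the classes; feature 1 does not.
samples = [
    [1.0, 5.0], [1.2, 1.0], [0.8, 3.0],
    [3.0, 4.0], [3.2, 2.0], [2.8, 3.0],
]
labels = ["angry", "angry", "angry", "happy", "happy", "happy"]

print(select_top_k(samples, labels, k=1))
```

In the setting described above, the same ranking would be applied to the 63 utterance-level features with k = 12.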