In speaker verification, the commonly used acoustic features (e.g., MFCC and PLP) mainly carry text and channel information; the speaker information within them is a weak component that is easily corrupted by the text content, channel, and noise in the speech signal. To address this problem, this paper proposes a method for extracting speaker features from speech signals with a deep neural network: the nonlinear outputs of each hidden layer of a speech-recognition DNN are taken as speaker features. Text-independent and text-dependent GMM-UBM speaker verification experiments were carried out on the RSR2015 database. The results show that, compared with traditional MFCC features, the features extracted by the proposed method yield a clear reduction in the system's equal error rate (EER).
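The core feature-extraction idea — forwarding acoustic frames through a speech-recognition DNN and keeping each hidden layer's nonlinear output as a frame-level speaker feature — can be sketched as follows. This is a minimal illustration only: the layer sizes, the sigmoid nonlinearity, and the random weights are assumptions for demonstration, not details taken from the paper (a real system would use a DNN trained for phone-state classification).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical topology: 39-dim MFCC input, two hidden layers,
# and an output layer over phone states (sizes are illustrative).
sizes = [39, 256, 256, 500]
weights = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(sizes, sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

def hidden_activations(frames):
    """Forward MFCC frames through the DNN and return each hidden
    layer's nonlinear (here: sigmoid) output; these per-frame vectors
    act as the speaker features described in the abstract."""
    h = frames
    feats = []
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = h @ W + b
        if i < len(weights) - 1:           # hidden layers only
            h = 1.0 / (1.0 + np.exp(-z))   # sigmoid nonlinearity
            feats.append(h)
        # the final (softmax) output layer is discarded: only the
        # hidden-layer outputs are kept as features
    return feats

frames = rng.standard_normal((100, 39))    # 100 frames of 39-dim MFCCs
feats = hidden_activations(frames)
print([f.shape for f in feats])            # [(100, 256), (100, 256)]
```

In a full pipeline, these hidden-layer features would replace the raw MFCCs as input to GMM-UBM training and scoring.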