This paper studies the application of prosodic features to speaker verification. The full prosodic trajectory is divided into segments using a fixed segment length and segment shift, and each segment is fitted with Legendre polynomials to obtain continuous prosodic features. The features are mapped into the total variability factor space, and probabilistic linear discriminant analysis (PLDA) is used to compensate for speaker and session variability. Test sets were constructed by adding noise to extended core test set 5 of the 2010 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation, and the prosodic features and conventional Mel-frequency cepstral coefficients (MFCCs) were each evaluated on them. The results show that as the signal-to-noise ratio decreases, MFCC performance degrades sharply, while the performance of the prosodic features remains relatively stable. Fusing the two features improves system performance further: relative to the MFCC-only system, the equal error rate and the minimum detection cost fall by up to 9% and 11%, respectively. The experiments indicate that prosodic features are strongly noise-robust in speaker recognition and are strongly complementary to conventional MFCCs.
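The segmentation and Legendre fitting step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `legendre_segment_features` and the segment length, segment shift, and polynomial order are assumptions, since the abstract does not state the values used. The input is assumed to be a one-dimensional prosodic contour (for example, a pitch track) sampled at a fixed frame rate.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_segment_features(contour, seg_len=30, seg_shift=10, order=5):
    """Fit Legendre polynomials to fixed-length, overlapping contour segments.

    seg_len, seg_shift, and order are illustrative values (assumptions),
    not parameters taken from the paper.
    Returns an array of shape (num_segments, order + 1).
    """
    contour = np.asarray(contour, dtype=float)
    # Legendre polynomials are orthogonal on [-1, 1], so each segment's
    # time axis is mapped onto that interval.
    x = np.linspace(-1.0, 1.0, seg_len)
    feats = []
    for start in range(0, len(contour) - seg_len + 1, seg_shift):
        segment = contour[start:start + seg_len]
        # Least-squares fit; the coefficients summarize the segment's
        # mean level, slope, curvature, and finer shape detail.
        feats.append(legendre.legfit(x, segment, order))
    return np.array(feats).reshape(-1, order + 1)


# Usage with a synthetic, linearly rising contour of 100 frames.
features = legendre_segment_features(np.arange(100))
```

Each row of `features` is one segment's coefficient vector. In a system like the one described, such vectors would then feed the total variability model, which is outside the scope of this sketch.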