To improve the noise robustness of speaker recognition systems, this paper adopts a front-end processing method that estimates the signal-to-noise ratio (SNR) of each frame and selects the high-SNR speech frames. Frames whose SNR falls below a given threshold are discarded, and feature parameters are extracted from the retained frames and used for recognition. The effectiveness of the method depends on the accuracy of the frame-level SNR estimate. Because traditional spectral subtraction and filtering methods have difficulty estimating the SNR accurately under non-stationary noise, the paper proposes using an improved minima controlled recursive averaging (IMCRA) algorithm for SNR estimation to carry out the high-SNR frame selection. Experimental results show that, compared with a GMM-UBM system using Wiener-filter speech enhancement, the recognition rate rises from 78.5% to 85.5% under 5 dB street noise and from 88% to 91% under 5 dB car noise.
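The frame selection step described above can be illustrated with a minimal sketch. This is not the paper's implementation: the noise tracker below is a deliberately crude stand-in (recursive smoothing followed by a sliding minimum) rather than the improved minima controlled recursive averaging algorithm, it works on broadband frame power instead of per-frequency-bin spectra, and the function names, the smoothing factor `alpha`, the window length, and the 5 dB threshold are all assumptions chosen for illustration. The abstract does not state the threshold the authors used.

```python
import math

def frame_energies(signal, frame_len, hop):
    """Split a sample sequence into frames and return the mean power of each frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def track_noise_floor(energies, alpha=0.9, window=20):
    """Crude noise tracker (stand-in, NOT the paper's algorithm):
    recursively smooth the frame power, then take the minimum of the
    smoothed power over the most recent `window` frames."""
    if not energies:
        return []
    smoothed, noise = [], []
    prev = energies[0]
    for e in energies:
        prev = alpha * prev + (1.0 - alpha) * e
        smoothed.append(prev)
        noise.append(min(smoothed[-window:]))
    return noise

def frame_snr_db(energies, noise, eps=1e-12):
    """Per-frame SNR in dB, treating (frame power - noise power) as speech power."""
    snrs = []
    for e, n in zip(energies, noise):
        speech = max(e - n, eps)  # clamp so noise-only frames do not give log(0)
        snrs.append(10.0 * math.log10(speech / max(n, eps)))
    return snrs

def select_frames(snrs, threshold_db):
    """Return the indices of frames whose estimated SNR meets the threshold."""
    return [i for i, s in enumerate(snrs) if s >= threshold_db]

# Toy example: 10 noise-only frames, 5 frames containing speech, 5 noise-only frames.
energies = [1.0] * 10 + [11.0] * 5 + [1.0] * 5
noise = track_noise_floor(energies)
snrs = frame_snr_db(energies, noise)
kept = select_frames(snrs, threshold_db=5.0)  # hypothetical threshold
print(kept)
```

In a full system, only the frames listed in `kept` would be passed on to feature extraction and to the recognizer. The quality of the result depends entirely on the noise estimate, which is why the paper replaces simple trackers like this one with a more accurate estimator for non-stationary noise.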