A computer-aided pronunciation training system for Chinese as a foreign language that provides corrective feedback already has an annotation scheme for pronunciation erroneous tendencies and an HMM-based system for detecting them. To further improve system performance, this paper applies deep neural networks (DNNs) to acoustic modeling and compares three acoustic features: Mel-frequency cepstral coefficients (MFCC), perceptual linear predictive (PLP) analysis coefficients, and Mel filter bank (FBank) coefficients. Lattice combination is then used to merge the candidate lattices obtained from the three features. Experimental results show that the DNN-HMM model achieves higher detection accuracy than the GMM-HMM model. The three acoustic features perform differently across different pronunciation erroneous tendencies, and the combined system achieves the best performance: a false rejection rate of 5.5%, a false acceptance rate of 35.6%, and a detection accuracy of 88.6%.
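The three reported figures can be related through a confusion matrix over detection decisions. The abstract does not define the metrics, so the following is only a minimal sketch under commonly used definitions, which are assumptions here: false rejection rate as the share of correct pronunciations flagged as erroneous, false acceptance rate as the share of erroneous pronunciations accepted as correct, and detection accuracy as the share of all decisions that are right. The names `DetectionCounts` and `detection_metrics` and the counts in the usage lines are hypothetical and are not data from the paper.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Hypothetical confusion-matrix counts for error-tendency detection."""
    true_accept: int   # correct pronunciation, accepted as correct
    false_reject: int  # correct pronunciation, flagged as erroneous
    true_reject: int   # erroneous pronunciation, flagged as erroneous
    false_accept: int  # erroneous pronunciation, accepted as correct


def detection_metrics(c: DetectionCounts) -> tuple[float, float, float]:
    """Return (false rejection rate, false acceptance rate, detection accuracy).

    These definitions are assumed, not taken from the paper.
    """
    n_correct = c.true_accept + c.false_reject
    n_error = c.true_reject + c.false_accept
    if n_correct == 0 or n_error == 0:
        raise ValueError("need at least one correct and one erroneous sample")
    frr = c.false_reject / n_correct
    far = c.false_accept / n_error
    accuracy = (c.true_accept + c.true_reject) / (n_correct + n_error)
    return frr, far, accuracy


# Usage with made-up counts (not results from the paper).
counts = DetectionCounts(true_accept=90, false_reject=10,
                         true_reject=30, false_accept=20)
frr, far, acc = detection_metrics(counts)
print(f"FRR={frr:.1%}  FAR={far:.1%}  accuracy={acc:.1%}")
```

Under these assumed definitions, a system can combine a low false rejection rate with a much higher false acceptance rate and still reach a high overall accuracy when correct pronunciations greatly outnumber erroneous ones.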