Among speech recognition algorithms, dynamic time warping (DTW) and hidden Markov models (HMMs) are the most effective, and the two are fundamentally connected and internally unified [1]. On this basis, our previous work established a unified model of DTW and HMM (DHUM) [2, 3]. This paper improves DHUM by introducing a silence-segment self-loop into it and, based on the characteristics of Chinese speech, proposes a speech recognition algorithm that requires no endpoint detection. During recognition, the algorithm does not need to locate the start and end points of the speech signal. Instead, it starts from the silence segment and extracts features directly, frame by frame (frame length 20 ms, 50% overlap between adjacent frames). Each feature vector consists of 15th-order cepstral coefficients and the frame average energy. In the experiments, the algorithm was implemented with DHUM. Recognition tests on 99 similar-sounding Chinese characters show a correct recognition rate of 94.95% without endpoint detection: accuracy drops only slightly, while omitting endpoint detection reduces the complexity of the algorithm. To further improve recognition performance, an auditory-model feature is adopted as the feature vector; with it the recognizer is more robust and the recognition rate improves slightly.
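The abstract does not give the DHUM recurrences, so the following is only a minimal sketch, in standard-library Python, of the two ideas it describes: splitting the signal into 20 ms frames with 50% overlap, and letting a silence state with a self-loop absorb leading and trailing silence so that no endpoint detection is needed. The left-to-right alignment (each state may repeat or advance by one), the Euclidean frame distance, and the names `frame_signal`, `frame_energy`, `align_with_silence` and `recognize` are illustrative assumptions, not the authors' DHUM. Cepstral analysis is omitted, and toy 2-D vectors stand in for the 16-dimensional features (15 cepstral coefficients plus frame energy).

```python
import math


def frame_signal(samples, sample_rate, frame_ms=20.0, overlap=0.5):
    """Split samples into frames of frame_ms milliseconds with fractional overlap."""
    frame_len = int(sample_rate * frame_ms / 1000)  # e.g. 160 samples at 8 kHz
    hop = int(frame_len * (1 - overlap))            # 50% overlap -> 10 ms hop
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]


def frame_energy(frame):
    """Frame average energy: mean of the squared samples."""
    return sum(x * x for x in frame) / len(frame)


def euclid(a, b):
    """Euclidean distance between two feature vectors (assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def align_with_silence(features, word_template, silence_vec):
    """Best left-to-right alignment cost of the whole input against
    [silence] + word_template + [silence].

    Each input frame occupies one state; a state may repeat (self-loop) or
    advance by one. Because the silence states self-loop, any amount of
    leading/trailing silence is absorbed, so the input is never trimmed
    by an endpoint detector. Hypothetical sketch, not the DHUM recursion.
    """
    states = [silence_vec] + list(word_template) + [silence_vec]
    inf = float("inf")
    cost = [inf] * len(states)
    cost[0] = euclid(features[0], states[0])  # path must start in silence
    for t in range(1, len(features)):
        new = [inf] * len(states)
        for j, state in enumerate(states):
            best_prev = min(cost[j], cost[j - 1] if j > 0 else inf)
            if best_prev < inf:
                new[j] = best_prev + euclid(features[t], state)
        cost = new
    return cost[-1]  # path must end in the trailing silence state


def recognize(features, templates, silence_vec):
    """Return the template name with the lowest alignment cost."""
    return min(templates,
               key=lambda w: align_with_silence(features, templates[w], silence_vec))


if __name__ == "__main__":
    sil = [0.0, 0.0]
    templates = {"a": [[1.0, 0.0], [1.0, 1.0]],
                 "b": [[0.0, 1.0], [1.0, 1.0]]}
    # Unsegmented input: silence, word "a", silence.
    feats = [sil] * 3 + [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]] + [sil] * 2
    print(recognize(feats, templates, sil))
```

With this step pattern the input must contain at least as many frames as there are states (template length plus two silence states); otherwise `align_with_silence` returns infinity. Run as a script, the usage example prints `a`, because the leading and trailing silence frames are matched to the self-looping silence states at zero cost.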