This paper proposes a speech age conversion method that combines a group of short-time spectral universal background models (UBMs) with prosodic parameters. For spectral conversion, short-time spectral coefficients are extracted from the speech of each speaker in the same age group, and a Gaussian mixture model (GMM) is built for each speaker. The speakers are then clustered by the similarity of their speech features, and one UBM is trained per cluster, which yields a UBM group and a set of short-time spectral conversion functions. After spectral conversion, the formants are further fine-tuned. For prosodic conversion, the conversion functions are derived from a single-Gaussian model of the fundamental frequency and an average duration ratio model of the speech rate. Experimental results show that the proposed method clearly outperforms the traditional bilinear method on evaluation measures such as ABX and MOS, and that its rate of change of log-likelihood is 4% higher than that of the single-UBM method. These results indicate that the converted speech shows a strong tendency toward the target while retaining good speech quality, a clear improvement over traditional methods.
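The prosodic conversion step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the single-Gaussian fundamental frequency model is applied as a mean-variance transform on log-F0 (the abstract does not say whether Hz or log-Hz is used), and that the average duration ratio is the mean target duration divided by the mean source duration. All function names and sample values are hypothetical.

```python
import math
from statistics import fmean, pstdev


def fit_log_f0_gaussian(f0_tracks):
    """Fit a single Gaussian (mean, std) to log-F0 over all voiced frames.

    Frames with F0 <= 0 are treated as unvoiced and skipped (an assumption).
    """
    log_f0 = [math.log(f) for track in f0_tracks for f in track if f > 0]
    return fmean(log_f0), pstdev(log_f0)


def convert_f0(track, src_stats, tgt_stats):
    """Map source F0 onto the target Gaussian; unvoiced frames stay 0."""
    (mu_s, sd_s), (mu_t, sd_t) = src_stats, tgt_stats
    if sd_s == 0:
        raise ValueError("source log-F0 has zero variance")
    scale = sd_t / sd_s
    return [
        math.exp(mu_t + scale * (math.log(f) - mu_s)) if f > 0 else 0.0
        for f in track
    ]


def average_duration_ratio(src_durations, tgt_durations):
    """Mean target duration divided by mean source duration."""
    return fmean(tgt_durations) / fmean(src_durations)


# Hypothetical sample data: F0 in Hz per frame, durations in seconds per unit.
src_stats = fit_log_f0_gaussian([[0.0, 200.0, 220.0, 240.0]])
tgt_stats = fit_log_f0_gaussian([[100.0, 110.0, 0.0, 120.0]])
converted = convert_f0([0.0, 200.0, 220.0, 240.0], src_stats, tgt_stats)
ratio = average_duration_ratio([0.20, 0.30], [0.25, 0.35])
```

A ratio above 1 means the target age group speaks more slowly on average, so the source speech would be time-stretched by that factor. The stretching itself (for example with an overlap-add method) is outside this sketch.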