Voice conversion based on the Mixture of Probabilistic Linear Regressions (MPLR) model is prone to over-fitting when only a small number of training samples is available. To address this, this paper proposes replacing the source speaker's spectral features with dynamic kernel features and then solving for the conversion function parameters by Bayesian maximum a posteriori (MAP) estimation. First, a kernel function maps the source speaker's spectral features to dynamic kernel features. Next, prior knowledge about the conversion function parameters is introduced. Finally, two methods for solving for the conversion function parameters are proposed, each based on a different assumption about the error. Objective evaluation shows that the proposed method reduces the average spectral distortion by 4.25% on average relative to the MPLR conversion method. Subjective evaluation shows that the proposed method scores higher than the MPLR method in both the similarity and the naturalness of the converted speech. These results indicate that the proposed method effectively alleviates the over-fitting problem in voice conversion.
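The pipeline described above (kernel mapping, then a prior on the conversion parameters, then a MAP solve) can be illustrated with a minimal sketch. It is not the paper's algorithm: it uses a single linear regression rather than a mixture, and every specific choice here is an assumption made for illustration, including the Gaussian RBF kernel, the anchor frames, the delta features standing in for the "dynamic" part, the zero-mean isotropic Gaussian prior, and the i.i.d. Gaussian error model. Under those assumptions the MAP estimate reduces to ridge regression. All function names and the toy data are hypothetical.

```python
import numpy as np

def rbf_kernel_features(X, anchors, gamma=0.5):
    """Map each frame in X (T x d) to its RBF similarity with each anchor frame (M x d).
    The RBF kernel and the anchor set are assumptions, not taken from the paper."""
    sq_dist = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)  # shape (T, M)

def add_delta(K):
    """Append first-order deltas to the static kernel features (assumed meaning of 'dynamic')."""
    delta = np.zeros_like(K)
    delta[1:-1] = 0.5 * (K[2:] - K[:-2])  # central difference; edge frames keep delta = 0
    return np.hstack([K, delta])  # shape (T, 2M)

def map_linear_regression(Phi, Y, lam):
    """MAP estimate of W in Y ~ Phi @ W with a zero-mean isotropic Gaussian prior on W
    and i.i.d. Gaussian error. This is ridge regression: W = (Phi'Phi + lam*I)^-1 Phi'Y."""
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ Y)

# Toy data standing in for time-aligned source and target spectral frames.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(40, 4))
Y_tgt = np.tanh(X_src @ rng.normal(size=(4, 4))) + 0.05 * rng.normal(size=(40, 4))

anchors = X_src[::4]  # every 4th source frame used as a kernel anchor (10 anchors)
K_static = rbf_kernel_features(X_src, anchors)
Phi = add_delta(K_static)

W_map = map_linear_regression(Phi, Y_tgt, lam=1e-1)  # prior has a noticeable effect
W_weak = map_linear_regression(Phi, Y_tgt, lam=1e-6)  # prior almost switched off

Y_converted = Phi @ W_map  # converted frames for the training utterance
```

The regularization weight `lam` plays the role of the prior strength: a larger value shrinks the parameters toward zero, which is the mechanism by which a MAP estimate limits over-fitting when few training frames are available.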