To fully extract speaker-specific (personal) characteristic information during voice conversion, and taking the sparsity of speech into account, this paper proposes a voice conversion method based on sparse convolutive non-negative matrix factorization (NMF). The time-frequency bases obtained by convolutive NMF carry both the speaker's personal characteristics and the correlation between speech frames. The overcomplete time-frequency bases obtained by sparse convolutive NMF capture finer detail of the speech and therefore preserve the speaker's personal characteristics better. Exploiting this property, the method uses sparse convolutive NMF to extract matched overcomplete time-frequency bases for the source and target speakers from the training data, and then performs voice conversion by replacing the source speaker's bases with the target speaker's. Compared with traditional methods, this better preserves the speaker's personal characteristics and the inter-frame correlation, further improving the quality and speaker similarity of the converted speech. Simulation experiments with both subjective and objective evaluations show that, compared with voice conversion methods based on Gaussian mixture models (GMM) and on convolutive NMF, the proposed method yields better converted-speech quality and conversion similarity.
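The abstract does not give the update rules or the exact pairing procedure, so the following is only a minimal NumPy sketch of the two steps it describes: learning convolutive time-frequency bases with a sparsity penalty, and converting by basis replacement. Everything specific here is an assumption, not the paper's method: a Euclidean cost, an L1 penalty `lam * sum(H)` on the activations, multiplicative updates, unit-norm rescaling of each basis, and obtaining matched source/target bases by factorizing time-aligned source and target magnitude spectrograms stacked along frequency so that both share one activation matrix. Frame alignment (for example DTW), the STFT, and waveform resynthesis are not shown. The function names `sparse_cnmf`, `train_paired_bases` and `convert` are hypothetical.

```python
import numpy as np

EPS = 1e-12


def shift(X, t):
    """Shift the columns of X by t frames (t > 0: right, t < 0: left), zero-filling vacated columns."""
    Y = np.zeros_like(X)
    if t == 0:
        Y[:] = X
    elif t > 0:
        Y[:, t:] = X[:, :-t]
    else:
        Y[:, :t] = X[:, -t:]
    return Y


def reconstruct(W, H):
    """Convolutive model: sum over lags t of W[t] @ shift(H, t); W is (T, F, K), H is (K, N)."""
    return sum(W[t] @ shift(H, t) for t in range(W.shape[0]))


def sparse_cnmf(V, K=None, T=None, lam=0.1, n_iter=200, W=None, update_W=True, seed=0):
    """Assumed variant: Euclidean convolutive NMF with an L1 penalty lam * sum(H).

    Pass a fixed W of shape (T, F, K) with update_W=False to estimate activations only.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((T, F, K)) + EPS if W is None else W.copy()
    T, _, K = W.shape
    H = rng.random((K, N)) + EPS
    for _ in range(n_iter):
        # H update: numerator/denominator are the negative/positive parts of the gradient;
        # the adjoint of a right shift by t is a left shift by t. The L1 term adds lam below.
        Lam = reconstruct(W, H)
        num = sum(W[t].T @ shift(V, -t) for t in range(T))
        den = sum(W[t].T @ shift(Lam, -t) for t in range(T)) + lam + EPS
        H *= num / den
        if update_W:
            # Update each lag's basis slice in turn, recomputing the model each time.
            for t in range(T):
                Lam = reconstruct(W, H)
                Ht = shift(H, t)
                W[t] *= (V @ Ht.T) / (Lam @ Ht.T + EPS)
            # Rescale each basis to unit norm and H inversely (the product is unchanged),
            # so the penalty cannot be dodged by shrinking H while inflating W.
            norms = np.sqrt((W ** 2).sum(axis=(0, 1))) + EPS
            W /= norms
            H *= norms[:, None]
    return W, H


def train_paired_bases(V_src, V_tgt, K, T, lam=0.1, n_iter=200):
    """Hypothetical pairing: factorize time-aligned spectrograms stacked along frequency,
    so the source half and target half of each basis share one activation row."""
    F_src = V_src.shape[0]
    W, _ = sparse_cnmf(np.vstack([V_src, V_tgt]), K, T, lam, n_iter)
    return W[:, :F_src, :], W[:, F_src:, :]


def convert(V_new, W_src, W_tgt, lam=0.1, n_iter=200):
    """Basis replacement: estimate activations with the source bases held fixed,
    then rebuild the spectrogram with the matching target bases."""
    _, H = sparse_cnmf(V_new, lam=lam, n_iter=n_iter, W=W_src, update_W=False)
    return reconstruct(W_tgt, H)
```

Choosing `K` larger than the number of (stacked) frequency bins makes the dictionary overcomplete; the L1 term on `H` is what encourages each frame to be explained by only a few bases. A KL-divergence cost is a common alternative to the Euclidean cost for magnitude spectrograms.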