In the literature of voice conversion (VC), the method based on the statistical Gaussian mixture model (GMM) serves as a benchmark. However, an inherent drawback of GMM is the well-known discontinuity problem: features are transformed on a frame-by-frame basis, which ignores the dynamics between adjacent frames and ultimately degrades the quality of the converted speech. A variety of algorithms have been proposed to overcome this deficiency, among which the state space model (SSM) based method has produced promising results. In this paper, we present an enhanced version of the traditional SSM, namely the switching SSM (SSSM). This new structure is more flexible than the conventional one in that it uses a mixture of components to account for rapid transitions between neighboring frames. Moreover, the physical meaning of the SSSM model parameters is examined in depth, leading to efficient application-specific training and transformation procedures for VC. Experiments including both objective and subjective measurements were conducted to compare the performance of the conventional and the proposed SSM-based methods, and the results show that SSSM yields clear improvements in both similarity and quality.
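To make the discontinuity argument concrete, the following is a minimal sketch contrasting the two conversion styles on scalar features. `gmm_convert` applies the standard GMM mapping function (a posterior-weighted sum of per-component linear regressions) to each frame independently, so every output depends only on its own input frame. `sssm_convert` is a generic switching linear state-space model filtered with a Kalman filter: a hidden state carries information from frame to frame, and the active regime is selected per frame. All parameter values, the hard (MAP) regime choice driven by the GMM posteriors, and every function name here are hypothetical illustrations; this is not the paper's SSSM formulation or its training procedure.

```python
import math

# Hypothetical 2-component joint GMM over scalar (source x, target y) features.
# Each component: weight w, means (mx, my), source variance vxx, cross-covariance vyx.
GMM = [
    {"w": 0.5, "mx": -1.0, "my": -0.5, "vxx": 0.5, "vyx": 0.3},
    {"w": 0.5, "mx": 1.0, "my": 2.0, "vxx": 0.5, "vyx": 0.4},
]

def gauss(x, m, v):
    """Univariate normal density N(x; m, v)."""
    return math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)

def posteriors(x):
    """Component posteriors p(i | x) computed from the source marginal only."""
    lik = [c["w"] * gauss(x, c["mx"], c["vxx"]) for c in GMM]
    total = sum(lik)
    return [l / total for l in lik]

def gmm_convert(xs):
    """GMM mapping applied frame by frame: no information flows between frames."""
    out = []
    for x in xs:
        p = posteriors(x)
        y = sum(pi * (c["my"] + c["vyx"] / c["vxx"] * (x - c["mx"]))
                for pi, c in zip(p, GMM))
        out.append(y)
    return out

# Hypothetical per-regime linear-Gaussian state-space parameters:
#   z_t = a * z_{t-1} + b + w,  w ~ N(0, q)   (hidden dynamics across frames)
#   x_t = c * z_t + d + v,      v ~ N(0, r)   (source observation)
#   y_t = e * z_t + f                         (target read-out)
SSM = [
    {"a": 0.8, "b": -0.2, "q": 0.1, "c": 1.0, "d": 0.0, "r": 0.2, "e": 0.7, "f": 0.2},
    {"a": 0.8, "b": 0.2, "q": 0.1, "c": 1.0, "d": 0.0, "r": 0.2, "e": 0.9, "f": 1.1},
]

def sssm_convert(xs, z0=0.0, p0=1.0):
    """Kalman filtering with a regime chosen per frame (assumed hard MAP switch)."""
    z, p = z0, p0
    out = []
    for x in xs:
        post = posteriors(x)
        s = SSM[max(range(len(post)), key=post.__getitem__)]
        # Predict: propagate the previous frame's state through the regime dynamics.
        z_pred = s["a"] * z + s["b"]
        p_pred = s["a"] ** 2 * p + s["q"]
        # Update: correct the prediction with the current source frame.
        k = p_pred * s["c"] / (s["c"] ** 2 * p_pred + s["r"])
        z = z_pred + k * (x - (s["c"] * z_pred + s["d"]))
        p = (1 - k * s["c"]) * p_pred
        # Read out the converted (target) feature from the filtered state.
        out.append(s["e"] * z + s["f"])
    return out

if __name__ == "__main__":
    src = [-1.0, -1.0, 1.0, 1.0]
    print("GMM :", gmm_convert(src))
    print("SSSM:", sssm_convert(src))
```

The design point the sketch isolates is history dependence: feeding the same final frame after different preceding frames leaves the GMM output unchanged, while the state-space output changes because the hidden state smooths across the transition.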