Many practical applications of machine learning suffer from data imbalance, in which one class has far fewer samples than the others. Imbalance biases the decision boundary of a classifier toward fitting the majority class while neglecting the minority class, so test samples tend to be misclassified as belonging to the majority class. To address this problem, this paper proposes a balanced graph-based semi-supervised learning method. The method adds an equalization (balancing) term to the energy function so that the confidence values are not only as smooth as possible over the graph but also as balanced as possible across classes, which effectively reduces the adverse effect of data imbalance. Statistical analysis of comparative experiments on 21 standard datasets shows that, on imbalanced data, the new method achieves classification performance significantly better (at the 0.05 significance level) than support vector machines and other graph-based semi-supervised learning methods.
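The abstract does not give the exact form of the equalization term, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' formulation. It assumes a binary problem with labels in {-1, +1}, a Gaussian-kernel graph, and a generic energy made of a Laplacian smoothness term, a fit term on the labeled nodes weighted by `mu`, and an illustrative balancing term `lam * (sum_i f_i)^2` that pushes the total confidence toward zero so that neither class dominates. The function names `rbf_graph` and `balanced_graph_ssl` and the parameters `mu`, `lam`, and `sigma` are hypothetical. Setting the gradient of the energy to zero gives a single linear system, which the code solves directly.

```python
import numpy as np


def rbf_graph(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Dense Gaussian-kernel affinity matrix W with a zero diagonal."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def balanced_graph_ssl(W, labeled_idx, y_labeled, mu=10.0, lam=0.0):
    """Minimise a HYPOTHETICAL balanced graph energy (not the paper's exact form):

        E(f) = f^T L f + mu * sum_{i labeled} (f_i - y_i)^2 + lam * (sum_i f_i)^2

    with y_i in {-1, +1}. lam = 0 gives ordinary graph-regularised SSL.
    Setting dE/df = 0 yields (L + mu*J + lam*1 1^T) f = mu*J y.
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W           # unnormalised graph Laplacian
    J = np.zeros((n, n))
    y = np.zeros(n)
    J[labeled_idx, labeled_idx] = 1.0        # diagonal indicator of labeled nodes
    y[labeled_idx] = y_labeled
    ones = np.ones((n, 1))
    A = L + mu * J + lam * (ones @ ones.T)   # assumed balancing term adds lam*1 1^T
    b = mu * (J @ y)
    return np.linalg.solve(A, b)             # confidence value f_i for every node


if __name__ == "__main__":
    # Toy imbalanced data: 40 majority points (class -1), 5 minority points (class +1).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.5, (40, 2)), rng.normal(2.0, 0.5, (5, 2))])
    W = rbf_graph(X, sigma=1.0)
    labeled = [0, 1, 2, 40]
    y_lab = [-1.0, -1.0, -1.0, 1.0]
    for lam in (0.0, 0.01, 0.1):
        f = balanced_graph_ssl(W, labeled, y_lab, lam=lam)
        print(f"lam={lam}: sum(f)={f.sum():.3f}, predicted +1: {int((f > 0).sum())}")
```

Because the solution minimizes the unbalanced energy plus `lam * (sum f)^2`, comparing the objective at the minimizers for two values of `lam` shows that |Σ f_i| can only shrink, never grow, as `lam` increases. A very large `lam` would, however, over-correct when the true class ratio really is skewed, so in this sketch `lam` would need to be tuned, for example on held-out data.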