【Objective】 To address the abundance of unlabeled samples and the scarcity of labeled samples in Weibo sentiment classification, this paper proposes a new method. 【Methods】 Building on the co-training algorithm, the method introduces active learning: from the low-confidence samples it selects the most valuable, most informative ones and submits them for annotation. Once annotated, these samples are added to the training set, and the classifiers are retrained for sentiment classification. 【Results】 Experiments on several datasets show that the classifier built with this method outperforms the other methods compared, with a clear improvement in classification accuracy. In particular, when labeled samples account for 40%, accuracy improves by about 5%. 【Limitations】 The random feature subspace generation used during co-training cannot guarantee that both classifiers built in each round are strong classifiers, so the assumptions of co-training are not fully satisfied. 【Conclusions】 Introducing active learning compensates for co-training's inadequate handling of low-confidence samples, which in turn strengthens the classifiers and improves classification accuracy.
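The procedure described under Methods can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation: the abstract names neither the base classifiers nor the criterion for "most valuable" samples, so this sketch uses a toy nearest-centroid classifier, plain uncertainty sampling (lowest confidence first), and synthetic data. The names `CentroidClassifier`, `co_train_with_active_learning`, `oracle`, `threshold`, and `queries_per_round` are hypothetical.

```python
import math
import random


class CentroidClassifier:
    """Toy nearest-centroid classifier that sees only one feature subspace (view)."""

    def __init__(self, view):
        self.view = view        # indices of the features this classifier may use
        self.centroids = {}     # label -> centroid within the view

    def _project(self, x):
        return [x[i] for i in self.view]

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [self._project(x) for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        # Softmax over negative distances; shifting by the nearest one avoids underflow.
        dists = {lab: math.dist(self._project(x), c) for lab, c in self.centroids.items()}
        nearest = min(dists.values())
        scores = {lab: math.exp(nearest - d) for lab, d in dists.items()}
        total = sum(scores.values())
        return {lab: s / total for lab, s in scores.items()}


def most_confident(clfs, x):
    """Return (label, confidence) from whichever view is most certain about x."""
    best = (None, -1.0)
    for clf in clfs:
        for label, p in clf.predict_proba(x).items():
            if p > best[1]:
                best = (label, p)
    return best


def co_train_with_active_learning(X_lab, y_lab, X_pool, oracle, n_features,
                                  rounds=5, threshold=0.8, queries_per_round=2, seed=0):
    rng = random.Random(seed)
    features = list(range(n_features))
    rng.shuffle(features)                       # random feature subspaces (two views)
    half = n_features // 2
    clfs = [CentroidClassifier(features[:half]), CentroidClassifier(features[half:])]
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_pool)
    n_queries = 0
    for _ in range(rounds):
        for clf in clfs:
            clf.fit(X_lab, y_lab)
        if not pool:
            break
        scored = [(most_confident(clfs, x), x) for x in pool]
        # High-confidence samples are pseudo-labeled, as in plain co-training.
        for (label, conf), x in scored:
            if conf >= threshold:
                X_lab.append(x)
                y_lab.append(label)
        # Low-confidence samples: the least certain ones go to the annotator.
        low = sorted((s for s in scored if s[0][1] < threshold), key=lambda s: s[0][1])
        for _, x in low[:queries_per_round]:
            X_lab.append(x)
            y_lab.append(oracle(x))
            n_queries += 1
        pool = [x for _, x in low[queries_per_round:]]
    for clf in clfs:
        clf.fit(X_lab, y_lab)                   # final retraining on the enlarged set
    return clfs, n_queries


def predict(clfs, x):
    """Combine both views by summing their class probabilities."""
    probs = [clf.predict_proba(x) for clf in clfs]
    return max(probs[0], key=lambda label: sum(p[label] for p in probs))


# Synthetic demo: class 0 clusters near 0, class 1 near 3, plus two ambiguous points.
rng = random.Random(1)
X_lab = [[0.0, 0.0, 0.0, 0.0], [0.2, -0.2, 0.2, -0.2],
         [3.0, 3.0, 3.0, 3.0], [2.8, 3.2, 2.8, 3.2]]
y_lab = [0, 0, 1, 1]
X_pool = [[rng.gauss(mu, 0.3) for _ in range(4)] for mu in [0.0, 3.0] * 20]
X_pool += [[1.4, 1.6, 1.5, 1.3], [1.6, 1.5, 1.7, 1.6]]
oracle = lambda x: int(sum(x) > 6.0)            # simulated human annotator
clfs, n_queries = co_train_with_active_learning(X_lab, y_lab, X_pool, oracle, n_features=4)
print(n_queries, predict(clfs, [0.1, 0.0, -0.1, 0.2]), predict(clfs, [3.0, 2.9, 3.1, 3.0]))
```

In this sketch the two ambiguous points fall below the confidence threshold in both views, so they are sent to the simulated annotator rather than pseudo-labeled. The random split of feature indices also hints at the stated limitation: nothing guarantees that each half of the features is informative enough on its own.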