Supervised machine learning methods for entity relation extraction are limited by the size of the annotated corpus. To address this, this paper proposes a semi-supervised approach to domain entity relation extraction that uses information entropy to progressively expand a small set of training data. A small training set is selected with the help of domain vocabulary and used to build an initial maximum entropy classifier of reasonable accuracy, which then predicts candidate new instances from unlabeled data. Using information entropy with different entropy thresholds, the method runs multiple rounds in which new instances of high confidence are selected to expand the training data. The classifier is retrained iteratively on the expanded training data, and iteration stops once the classifier's performance stabilizes, yielding semi-supervised domain entity relation extraction. Experiments show that, compared with existing methods, the proposed approach, by incorporating information entropy, achieves good learning performance when only a small number of labeled samples is available.
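The entropy-based expansion loop summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`entropy`, `select_confident`, `self_train`), the `fit` callback, the single fixed threshold, and the stopping rule (stop when no new instance passes the threshold or after `max_rounds`) are all assumptions made for illustration. The paper itself uses a maximum entropy classifier and stops when classifier performance stabilizes; here any classifier that returns class probabilities can be plugged in.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of one class-probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_confident(probs_per_instance, threshold):
    """Return (index, predicted_label) pairs whose prediction entropy is below threshold.

    Low entropy means the probability mass is concentrated on one class,
    which is treated here as a high-confidence prediction.
    """
    chosen = []
    for i, probs in enumerate(probs_per_instance):
        if entropy(probs) < threshold:
            label = max(range(len(probs)), key=probs.__getitem__)
            chosen.append((i, label))
    return chosen


def self_train(fit, labeled, unlabeled, threshold, max_rounds=10):
    """Expand `labeled` with confidently predicted instances from `unlabeled`.

    fit(labeled) is a hypothetical callback: it trains a classifier on a list
    of (x, y) pairs and returns a function predict_proba(x) -> list of floats.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    predict_proba = fit(labeled)  # initial classifier from the small seed set
    for _ in range(max_rounds):
        if not pool:
            break
        probs = [predict_proba(x) for x in pool]
        chosen = select_confident(probs, threshold)
        if not chosen:
            break  # assumed stopping rule: nothing confident enough to add
        labeled.extend((pool[i], y) for i, y in chosen)
        taken = {i for i, _ in chosen}
        pool = [x for i, x in enumerate(pool) if i not in taken]
        predict_proba = fit(labeled)  # retrain on the expanded training data
    return predict_proba, labeled, pool
```

A lower threshold admits fewer but more reliable instances per round; a higher one grows the training data faster at the risk of adding mislabeled instances, which is the trade-off behind trying different entropy values.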