To improve the accuracy of text clustering, a fuzzy c-means clustering method based on topic concept subspaces (TCS2FCM) is proposed for classifying texts. Key phrases are extracted using a weighted combination of five evaluation functions; WordNet is then used to derive concept phrases from these key phrases and to generate the final category descriptions. Because the initial centers and the initial membership matrix largely determine the quality of fuzzy c-means clustering, concept phrases that represent the topics of the texts are used to build mutually orthogonal topic concept subspaces, and the concept vectors in these subspaces are used to initialize the cluster centers and the membership matrix. Experimental results show that, unlike the random initialization of traditional fuzzy c-means clustering, initialization tied to the text content helps improve the final clustering results and raises clustering accuracy.
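As an illustration of how content-based initialization differs from random initialization, the following is a minimal sketch of standard fuzzy c-means in which the initial centers are supplied by the caller instead of being drawn at random. It is not the authors' TCS2FCM implementation. The function names `membership_from_centers` and `fuzzy_c_means`, the Euclidean distance, the fuzzifier `m = 2`, and the toy data are all assumptions made for this example. The concept vectors are assumed here to be the unit vectors of a two-dimensional topic space, which makes them mutually orthogonal.

```python
import numpy as np


def membership_from_centers(X, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update from distances to the centers."""
    # Pairwise Euclidean distances, shape (n_docs, n_clusters).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, eps)  # avoid division by zero when a doc sits on a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1


def fuzzy_c_means(X, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy c-means started from caller-supplied centers (no random init)."""
    centers = np.asarray(init_centers, dtype=float).copy()
    # The initial membership matrix is derived from the initial centers.
    U = membership_from_centers(X, centers, m)
    for _ in range(max_iter):
        W = U ** m  # fuzzified weights, shape (n_docs, n_clusters)
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted means
        U_new = membership_from_centers(X, centers, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Toy data (hypothetical): six documents projected onto two topic axes.
X = np.array([
    [0.9, 0.1], [0.8, 0.2], [1.0, 0.0],
    [0.1, 0.9], [0.2, 0.8], [0.0, 1.0],
])

# Assumed concept vectors: one orthonormal vector per topic subspace.
concept_vectors = np.eye(2)

centers, U = fuzzy_c_means(X, concept_vectors)
labels = U.argmax(axis=1)  # hard labels taken from the fuzzy memberships
```

Because the starting centers are fixed by the topic axes, repeated runs give the same result, whereas random initialization can converge to different partitions from one run to the next.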