Text classification is a key technology for processing and organizing textual information. It helps organize information effectively, quickly separate useful information from useless information, and meet users' individual needs. This paper introduces the background of text classification, the state of research in China and abroad, and the general steps for solving text classification problems with machine learning methods. The second part of the paper focuses on the principles of Chinese word segmentation, feature vector extraction, classifier training, and evaluation, covering statistical language models from natural language processing and the KNN, SVM, and neural network algorithms from machine learning.
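The general steps named above (segmentation, feature vector extraction, classifier training, evaluation) can be illustrated with a minimal sketch. It is not the paper's implementation: the toy corpus, the stop-word list, and all function names are hypothetical, whitespace splitting stands in for a real Chinese word segmenter, and plain term-frequency vectors with cosine similarity are assumed as the feature representation for a KNN classifier.

```python
import math
from collections import Counter

# Hypothetical stop-word list, chosen only for this toy corpus.
STOP_WORDS = {"the", "a", "as"}


def tokenize(text):
    # Stand-in for Chinese word segmentation: assumes the text is
    # already separated into words by whitespace.
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def vectorize(tokens):
    # Term-frequency feature vector, stored sparsely as a Counter.
    return Counter(tokens)


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    # Majority vote among the k training vectors most similar to the query.
    query = vectorize(tokenize(text))
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    # Evaluation: fraction of test documents classified correctly.
    correct = sum(knn_predict(train, text, k) == label for text, label in test)
    return correct / len(test)


# Toy labeled corpus (invented for illustration).
TRAIN_DOCS = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the coach praised the team defence", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stock markets fell as rates rose", "finance"),
    ("investors sold bank shares", "finance"),
]
TEST_DOCS = [
    ("the team scored a goal", "sports"),
    ("the bank cut interest rates", "finance"),
]

# "Training" a KNN classifier amounts to storing the labeled vectors.
train = [(vectorize(tokenize(text)), label) for text, label in TRAIN_DOCS]

if __name__ == "__main__":
    print(knn_predict(train, "the team scored a goal"))
    print(accuracy(train, TEST_DOCS))
```

KNN is used here only because it needs no optimization step, which keeps the sketch short; an SVM or a neural network would replace `knn_predict` while the segmentation, vectorization, and evaluation steps stay the same.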