Internet-based information mining is an important branch of data mining and a relatively new topic in network information processing. This paper introduces the concept of automatically mining electronic documents on the Internet and describes the architecture of such a system. It also presents the preprocessing steps for automatic document mining, including document structure graph parsing and document classification and retrieval, together with the programs that implement them.