This paper presents a Web-based Uyghur information retrieval system that retrieves information from designated websites according to topics set by the user. The system applies the vector space model, which has been highly successful in information retrieval for Western languages, to the problem of Uyghur information retrieval. It explores feature-term extraction, term weighting, similarity computation, and model construction for Uyghur documents, and proposes an approach to Web-based Uyghur information processing, such as downloading Uyghur web pages, storing web page content, and Uyghur retrieval. The paper discusses the system's design ideas, the related algorithms, and the implementation techniques.
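To make the vector space model pipeline concrete (feature-term extraction, weighting, similarity computation), the following is a minimal sketch rather than the system's actual implementation. The abstract does not name the weighting or similarity scheme, so TF-IDF weighting and cosine similarity are assumed here as common choices. The function names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `rank`) are hypothetical, and whitespace tokenization is a simplifying assumption: Uyghur is agglutinative, so a real system would likely need stemming or morphological analysis during feature extraction.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace-delimited, lowercased tokens with no stemming.
    return text.lower().split()


def build_idf(docs_tokens):
    # Document frequency: number of documents containing each term.
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    # Smoothed IDF (an assumed variant) so every term keeps a positive weight.
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse vector as a dict; terms unseen in the collection are dropped.
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(u, v):
    # Cosine similarity between two sparse vectors.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(topic, docs):
    # Score every document against the user's topic, highest score first.
    docs_tokens = [tokenize(d) for d in docs]
    idf = build_idf(docs_tokens)
    query = tfidf_vector(tokenize(topic), idf)
    scores = [(i, cosine(query, tfidf_vector(tokens, idf)))
              for i, tokens in enumerate(docs_tokens)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder Latin-script strings stand in for downloaded page text.
    pages = [
        "web retrieval system",
        "football match results",
        "web page download and retrieval",
    ]
    for index, score in rank("web retrieval", pages):
        print(index, round(score, 3))
```

In this sketch the user's topic is treated as a short document in the same vector space as the downloaded pages, so ranking reduces to sorting pages by their cosine similarity to the topic vector.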