This paper discusses and analyzes the implementation principles and key technologies for collecting Chinese-language corpora from the Internet. It introduces the principles of Web-based communication and of automatically acquiring information online, discusses word segmentation techniques in Chinese information processing and their development, and proposes an implementation scheme for collecting an online corpus of the People's Daily (《人民日报》).