This paper proposes the design of a hot-topic information system built on massive Internet data. The system uses web crawling, similarity-based deduplication, keyword extraction, and summary generation to acquire information and analyze hot topics. System tests show that the system automatically extracts and analyzes hot-topic information effectively and improves the efficiency of hot-topic analysis.
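The abstract names the analysis techniques but not the specific algorithms behind them. The sketch below shows one minimal way the post-crawling stages could fit together. It assumes word-shingle Jaccard similarity for deduplication, TF-IDF for keyword extraction, and keyword-count sentence scoring for extractive summaries. These are illustrative stand-ins, not necessarily the paper's methods. The function names and thresholds are hypothetical, crawling is omitted, and the regex tokenizer handles English only; Chinese text would need a word segmenter.

```python
import math
import re
from collections import Counter

# Hypothetical small stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "in", "of", "and", "to", "is", "are", "was", "on", "for"}


def tokenize(text: str) -> list[str]:
    # Assumption: English text; lowercase alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(tokens: list[str], k: int = 3) -> set[tuple[str, ...]]:
    # Contiguous k-word shingles act as the document's fingerprint set.
    if len(tokens) < k:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    # |A intersect B| / |A union B|; two empty sets count as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: list[str], threshold: float = 0.8) -> list[str]:
    # Keep a document only if it is not near-identical to one already kept.
    kept, kept_shingles = [], []
    for doc in docs:
        s = shingles(tokenize(doc))
        if all(jaccard(s, t) < threshold for t in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


def tfidf_keywords(docs: list[str], top_n: int = 3) -> list[list[str]]:
    # Score = term frequency in the document * log(N / document frequency).
    token_lists = [[t for t in tokenize(d) if t not in STOPWORDS] for d in docs]
    n = len(docs)
    df = Counter(term for toks in token_lists for term in set(toks))
    result = []
    for toks in token_lists:
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:top_n])
    return result


def summarize(doc: str, keywords: list[str], max_sentences: int = 1) -> str:
    # Extractive summary: rank sentences by keyword occurrences (earlier
    # sentence wins ties), then emit the chosen ones in original order.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    kw = set(keywords)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: (-sum(t in kw for t in tokenize(sentences[i])), i),
    )
    return " ".join(sentences[i] for i in sorted(ranked[:max_sentences]))


if __name__ == "__main__":
    pages = [
        "The city council approved the new subway line. Construction starts in May.",
        "The city council approved the new subway line! Construction starts in May.",
        "A heat wave hit the region. Hospitals reported more heat stroke cases.",
    ]
    unique = deduplicate(pages)  # the second page is dropped as a near-duplicate
    for page, kws in zip(unique, tfidf_keywords(unique)):
        print(kws, "|", summarize(page, kws))
```

Shingle-based Jaccard is chosen here because it is simple to read; at crawl scale, a fingerprinting scheme such as SimHash or MinHash would usually replace the pairwise comparison loop, which grows quadratically with the number of kept documents.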