A URL is an identifier that fully describes the address of a web page or other resource on the Internet, and URL access logs record users' browsing traces. Building on this property, this paper proposes a web content monitoring and mining system based on access logs that automates web content crawling, monitoring, analysis, and report generation. Test runs show that the system achieves relatively high accuracy and can effectively address the network supervision problems faced by operators and Internet regulators.