Microblogging is an emerging online social networking service with strong real-time communication capabilities: users can report breaking social events quickly, in real time and by a variety of means. However, because microblog platforms publish large volumes of information in a short time, information becomes fragmented to some degree, and the rapid pace of updates makes important information hard to retrieve. This paper uses the Hadoop platform, taking advantage of its strengths in big-data mining, and proposes a distributed algorithm for mining hot words in microblogs; the extracted hot words are then organized into hot events so that users can query them easily. In addition, a detection algorithm with linear time complexity is proposed to identify the time period in which a hot event bursts. Datasets from Twitter and Sina Weibo are used as test samples in extensive experiments, and the results show that the proposed algorithms can effectively extract hot events from microblogs.
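The abstract does not describe either algorithm in detail, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that hot words are found by a map/reduce-style frequency count (simulated here in a single process rather than on Hadoop), and it uses the classic maximum-sum contiguous segment scan (Kadane's algorithm) as a stand-in for a linear-time burst detector. All function names, the whitespace tokenizer, and the `baseline` parameter are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def map_words(post: str) -> List[Tuple[str, int]]:
    """Map step: emit (word, 1) for every whitespace-separated token.

    Assumption: whitespace tokenization. Chinese text would need a
    word segmenter, which the abstract does not specify.
    """
    return [(word.lower(), 1) for word in post.split()]


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce step: sum the counts emitted for each word."""
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    return totals


def hot_words(posts: Iterable[str], top_k: int) -> List[Tuple[str, int]]:
    """Return the top_k most frequent words across all posts."""
    pairs = (pair for post in posts for pair in map_words(post))
    return reduce_counts(pairs).most_common(top_k)


def burst_interval(counts: List[int], baseline: float) -> Tuple[int, int]:
    """Find the contiguous window with the largest total excess over baseline.

    counts[i] is how often a hot word appears in time slot i. The scan
    visits each slot once, so it runs in O(n) time (Kadane's algorithm).
    Returns inclusive (start, end) slot indices.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    cur_sum = best_sum = counts[0] - baseline
    cur_start = 0
    best = (0, 0)
    for i in range(1, len(counts)):
        excess = counts[i] - baseline
        if cur_sum < 0:
            # A negative running sum cannot help; start a new window here.
            cur_sum = excess
            cur_start = i
        else:
            cur_sum += excess
        if cur_sum > best_sum:
            best_sum = cur_sum
            best = (cur_start, i)
    return best
```

For example, with per-slot counts `[1, 2, 1, 9, 12, 10, 2, 1]` and a baseline of 4, `burst_interval` returns `(3, 5)`, the three slots in which the word's frequency spikes. In a real deployment the map and reduce steps would run as a Hadoop job over partitioned data; the single-process version here only shows the data flow.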