Social media, exemplified by Weibo, exerts an increasingly broad and deep influence. To predict Weibo hot topics effectively and to support guidance and control, we first collect data in batches after Chinese word segmentation, preprocess it using the numbers of user reposts, comments, and likes as representative features, and use similarity comparison to select the best microblogs to populate the matrix model. Second, after comparing several regression analysis methods in practice, we use stepwise regression to determine the factors that influence Weibo hot topics, then build the prediction equation with a multiple regression model, compute precision, accuracy, and recall, and determine the threshold. Experiments show that the prediction model maintains good accuracy and that precision can be improved further by choosing an appropriate threshold.
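The pipeline summarized here (a multiple regression on repost, comment, and like counts, followed by thresholding and the computation of precision, accuracy, and recall) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the synthetic counts, the hypothetical coefficients in `true_coef`, the "hot" label defined as the top 20% of the synthetic heat score, the function names, and the candidate thresholds. The stepwise selection of factors is omitted, and plain ordinary least squares stands in for the paper's model equation, which this excerpt does not give.

```python
import numpy as np


def fit_multiple_regression(X, y):
    """Ordinary least squares fit; returns [intercept, b1, ..., bk]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply the fitted linear equation to a feature matrix."""
    design = np.column_stack([np.ones(len(X)), X])
    return design @ coef


def classification_metrics(scores, labels, threshold):
    """Mark a post as hot when score >= threshold, then score the result."""
    pred = np.asarray(scores) >= threshold
    labels = np.asarray(labels).astype(bool)
    tp = int(np.sum(pred & labels))
    fp = int(np.sum(pred & ~labels))
    fn = int(np.sum(~pred & labels))
    tn = int(np.sum(~pred & ~labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(labels)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Synthetic data (assumption): columns are reposts, comments, likes.
rng = np.random.default_rng(0)
n = 200
X = rng.poisson(lam=[50, 30, 80], size=(n, 3)).astype(float)
true_coef = np.array([0.5, 0.3, 0.2])  # hypothetical weights, not from the paper
heat = X @ true_coef + rng.normal(0.0, 2.0, size=n)
labels = heat >= np.quantile(heat, 0.8)  # assumption: top 20% counts as "hot"

coef = fit_multiple_regression(X, heat)
scores = predict(coef, X)

# Compare a few candidate thresholds on the predicted score.
for q in (0.7, 0.8, 0.9):
    threshold = np.quantile(scores, q)
    m = classification_metrics(scores, labels, threshold)
    print(f"quantile={q:.1f} threshold={threshold:.2f} "
          f"precision={m['precision']:.2f} recall={m['recall']:.2f} "
          f"accuracy={m['accuracy']:.2f}")
```

Raising the threshold marks fewer posts as hot, which typically trades recall for precision. That is the kind of adjustment the abstract refers to when it says a suitable threshold can further improve precision.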