[Purpose/Significance] To clarify how topic-content associations formed through keywords influence co-authorship prediction, to develop co-authorship prediction indicators and methods on the author-keyword bipartite network, and to improve both prediction accuracy and the interpretability of the results. [Method/Process] First, multiple paths representing associations between authors are extracted from the author-keyword bipartite network and combined with methods for computing association strength, yielding a set of co-authorship prediction indicators. Next, logistic regression is used to learn the contribution of each indicator to co-authorship prediction, from which a path-combination co-authorship prediction indicator for the bipartite network is constructed. Finally, the indicators are evaluated with link prediction methods. [Result/Conclusion] Experiments in the field of library and information science confirm that the path-combination indicator on the author-keyword bipartite network achieves the highest accuracy, with substantial improvements over each of the four single-path indicators. All of the paths affect co-authorship prediction, and the path "author-keyword-author" (AKA) plays a markedly larger role than the path "author-keyword-author-keyword-author" (AKAKA). In addition, the keywords that link two authors represent their shared research topics and interests, which makes the results easier to interpret. Future work will add more paths to the model and test the generality of the method in other fields.
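The pipeline summarized above (path extraction, logistic-regression combination, link-prediction evaluation) can be sketched roughly as follows. This is a minimal illustration under several assumptions that do not come from the paper: the author-keyword network and co-author pairs are hypothetical toy data; raw path counts stand in for the paper's unspecified association-strength formula; only the AKA and AKAKA paths are used, since the abstract does not list its full set of paths; logistic regression is fitted by plain gradient descent instead of a library; and AUC is assumed as the link-prediction metric, which the abstract does not name. The model is trained and scored on the same pairs, so the output only demonstrates the mechanics and says nothing about the paper's results.

```python
import math
from itertools import combinations

# Hypothetical toy author-keyword bipartite network (not the paper's data):
# each author maps to the set of keywords attached to their papers.
AUTHOR_KEYWORDS = {
    "a1": {"citation analysis", "co-word analysis"},
    "a2": {"citation analysis", "co-word analysis", "altmetrics"},
    "a3": {"altmetrics", "peer review"},
    "a4": {"peer review"},
}
# Hypothetical observed co-authorship links (the positive class).
COAUTHORS = {frozenset(p) for p in [("a1", "a2"), ("a2", "a3"), ("a3", "a4")]}


def aka(kw, a, b):
    """Count author-keyword-author (AKA) paths, i.e. keywords shared by a and b."""
    return len(kw[a] & kw[b])


def akaka(kw, a, b):
    """Count simple author-keyword-author-keyword-author (AKAKA) paths a-k1-c-k2-b."""
    total = 0
    for c in kw:
        if c in (a, b):
            continue
        # All (k1, k2) pairs routed through author c, minus the k1 == k2 pairs,
        # which would revisit a keyword node and are therefore not simple paths.
        total += aka(kw, a, c) * aka(kw, c, b) - len(kw[a] & kw[c] & kw[b])
    return total


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(pos, neg):
    """Link-prediction AUC: share of (positive, negative) pairs ranked correctly, ties = 0.5."""
    hits = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return hits / (len(pos) * len(neg))


# One feature vector [AKA count, AKAKA count] per unordered author pair.
pairs = list(combinations(sorted(AUTHOR_KEYWORDS), 2))
X = [[aka(AUTHOR_KEYWORDS, a, b), akaka(AUTHOR_KEYWORDS, a, b)] for a, b in pairs]
y = [1 if frozenset(p) in COAUTHORS else 0 for p in pairs]

# Learn how much each path type contributes, then score every pair.
weights, bias = fit_logistic(X, y)
scores = [sum(w * x for w, x in zip(weights, xi)) + bias for xi in X]
pos = [s for s, label in zip(scores, y) if label == 1]
neg = [s for s, label in zip(scores, y) if label == 0]
print("weights (AKA, AKAKA):", weights, "AUC:", auc(pos, neg))
```

In a real evaluation the known co-author links would be split into training and held-out test sets before computing AUC. The learned weights on this toy network only reflect the hypothetical data and are not evidence for or against the reported AKA versus AKAKA finding.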