【Objective】 To build a manually annotated dataset from the Sogou query log, analyze the features of query specificity, identify query specificity automatically, and evaluate the identification performance. 【Methods】 Basic features and content features of user query strings were selected for statistical analysis. Decision tree, SVM, and naive Bayes classifiers were then trained separately to identify query specificity automatically. 【Results】 Identification using these features performed well: under ten-fold cross-validation, the macro-averaged F-measure of every classifier exceeded 0.8. 【Limitations】 The choice of classification features does not take user click information into account, and whether the independence assumption of naive Bayes can safely be ignored in this experiment requires further verification. 【Conclusions】 The basic features and content features of query strings can effectively identify queries of weak, slight, and strong specificity.
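The evaluation metric reported above, macro-averaged F-measure under ten-fold cross-validation, can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes string class labels (the names "weak", "slight", and "strong" are hypothetical), a simple round-robin fold assignment rather than whatever splitting scheme the paper used, and averaging of per-fold scores (the abstract does not say whether scores were averaged per fold or pooled across folds). `fit_predict` is a hypothetical hook where a decision tree, SVM, or naive Bayes classifier would be trained and applied.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = set(gold) | set(pred)
    if not labels:
        raise ValueError("need at least one example")
    total = 0.0
    for label in labels:
        # Count true positives, false positives, and false negatives for this class.
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        total += 2 * precision * recall / denom if denom else 0.0
    # Macro averaging: every class counts equally, regardless of its size.
    return total / len(labels)


def k_fold_splits(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; fold i tests indices i, i+k, i+2k, ..."""
    # Assumption: round-robin assignment, not shuffled or stratified.
    if n < k:
        raise ValueError("need at least k examples so that no test fold is empty")
    for i in range(k):
        test = list(range(i, n, k))
        test_set = set(test)
        train = [j for j in range(n) if j not in test_set]
        yield train, test


def cross_validated_macro_f1(
    gold: Sequence[str],
    fit_predict: Callable[[List[int], List[int]], List[str]],
    k: int = 10,
) -> float:
    """Average the per-fold macro-F1; fit_predict trains on `train` and labels `test`."""
    scores = []
    for train, test in k_fold_splits(len(gold), k):
        pred = fit_predict(train, test)
        scores.append(macro_f1([gold[i] for i in test], pred))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = ["weak", "weak", "slight", "strong"]
    pred = ["weak", "slight", "slight", "strong"]
    # Per-class F1: weak 2/3, slight 2/3, strong 1, so the macro-F1 is 7/9.
    print(round(macro_f1(gold, pred), 4))  # 0.7778
```

Macro averaging gives the minority specificity level the same weight as the majority one, so a classifier cannot score well by predicting only the most frequent class. Here `macro_f1` averages over the labels that actually occur in each fold, which mirrors a common convention; other conventions (for example, always averaging over all three levels) would give different numbers when a fold is missing a class.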