Feature selection is an important data preprocessing step in machine learning. It selects an important subset of the original feature set in order to improve the performance of a learning system or reduce its computational complexity, and it therefore has a significant effect on the system's performance. For the problem of selecting discrete-valued features, we propose a feature selection method based on a genetic algorithm. The method uses a genetic algorithm to search for an optimal or suboptimal feature subset. Specifically, candidate solutions are encoded as binary strings, and an inconsistency measure is used as the fitness function. Experimental results show that the proposed feature selection method is effective. The method has three characteristics: 1) it is simple and easy to implement; 2) it achieves high test accuracy; 3) it is highly interpretable.
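The abstract does not specify the genetic operators or the exact form of the fitness function, so the following is only a minimal Python sketch of the general idea under stated assumptions. Each chromosome is assumed to be a 0/1 list with one bit per feature (bit set means the feature is selected). `inconsistency_rate` follows the usual inconsistency-rate definition for discrete features: instances are grouped by their values on the selected features, and every instance outside its group's majority class counts as inconsistent. The size penalty `alpha`, tournament selection, one-point crossover, bit-flip mutation, elitism and all function names are hypothetical choices for illustration, not necessarily those used in the paper.

```python
import random
from collections import Counter, defaultdict
from itertools import product


def inconsistency_rate(X, y, mask):
    """Inconsistency rate of the feature subset selected by a 0/1 mask."""
    groups = defaultdict(Counter)  # projected pattern -> class counts
    for row, label in zip(X, y):
        key = tuple(v for v, bit in zip(row, mask) if bit)
        groups[key][label] += 1
    # In each group, instances not in the majority class are inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(X)


def fitness(X, y, mask, alpha=0.01):
    # Assumed fitness (hypothetical): low inconsistency first, then fewer features.
    # Higher return value means a fitter chromosome.
    return -(inconsistency_rate(X, y, mask) + alpha * sum(mask) / len(mask))


def ga_feature_selection(X, y, pop_size=20, generations=40, p_cross=0.8, seed=0):
    """Search feature subsets with a simple GA; assumes at least 2 features."""
    rng = random.Random(seed)
    n = len(X[0])
    p_mut = 1.0 / n  # per-gene bit-flip probability
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def tournament(scored):
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = [(fitness(X, y, ind), ind) for ind in pop]
        elite = max(scored, key=lambda s: s[0])[1]
        new_pop = [elite[:]]  # elitism: carry the best chromosome over unchanged
        while len(new_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < p_cross:
                cut = rng.randint(1, n - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop

    return max(pop, key=lambda ind: fitness(X, y, ind))


if __name__ == "__main__":
    # Toy data: 4 binary features, class label equals feature 0.
    X = [list(bits) for bits in product([0, 1], repeat=4)]
    y = [row[0] for row in X]
    best = ga_feature_selection(X, y)
    print(best, inconsistency_rate(X, y, best))
```

On this toy data, any subset containing feature 0 has an inconsistency rate of 0, while every subset without it has a rate of 0.5, so the size penalty steers the search toward the single relevant feature. The selected subset is directly readable as a list of kept features, which is one way to understand the interpretability claim.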