A latent attribute space tree classifier (LAST) framework is proposed. LAST transforms the original attribute space into a latent attribute space in which the data are easier to separate, or which better matches the way decision trees classify. This overcomes the decision-surface limitation of traditional decision tree algorithms and improves the generalization performance of tree classifiers. Within the LAST framework, two SVD (singular value decomposition) oblique decision tree (SODT) algorithms are proposed. They apply singular value decomposition to global or local data to construct an orthogonal latent attribute space, and then build a traditional univariate decision tree, or individual tree nodes, in that latent space. The result is, indirectly, an approximately optimal oblique decision tree in the original space. The SODT algorithms can handle data sets whose global and local data distributions are either the same or different, and they can make full use of the structural information in both labeled and unlabeled data. Their classification results are unaffected by random reordering of the samples, and their time complexity is the same as that of univariate decision tree algorithms. Experimental results on complex data sets show that, compared with traditional univariate decision tree algorithms and other oblique decision tree algorithms, the SODT algorithms achieve higher classification accuracy, build trees of more stable size, and deliver more robust overall classification performance. Their tree-construction time is close to that of C4.5 and far lower than that of other oblique decision tree algorithms.
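Why a univariate split in the latent space is an oblique split in the original space can be seen from a short derivation. It assumes, for illustration only, that the SVD is taken of the mean-centred data matrix $X$ (one row per sample, labeled and unlabeled). The abstract does not say whether SODT centres the data, and the symbols $\mu$, $V$, $v_j$, $z$, $t$, $w$, $b$ are introduced here rather than taken from the paper.

```latex
X - \mathbf{1}\mu^{\top} = U \Sigma V^{\top}, \qquad
z = V^{\top}(x - \mu), \qquad
z_j = v_j^{\top}(x - \mu)

z_j \le t
\;\Longleftrightarrow\;
v_j^{\top} x \le t + v_j^{\top}\mu
\;\Longleftrightarrow\;
w^{\top} x \le b, \qquad w = v_j,\quad b = t + v_j^{\top}\mu
```

Because $V$ is orthogonal, $z$ is a rotation (possibly with a reflection) of $x - \mu$, so keeping all components loses no information. A threshold on one latent coordinate is therefore a hyperplane in the original space whose normal $v_j$ is in general not parallel to any original axis. On this reading, computing $V$ once from all data corresponds to the global variant, and recomputing it from the samples at each node corresponds to the local variant.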
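The same pipeline can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation. The function names `svd_basis`, `best_axis_split`, `build` and `predict` are hypothetical. The data are mean-centred before the SVD, which is an assumption. Splits use Gini impurity for brevity, whereas C4.5 itself uses the gain ratio. The `local` flag switches between one global basis and a basis recomputed at every node. Unlabeled rows in `X_unlab` only shape the SVD basis, which is one simple way to exploit their structure.

```python
import numpy as np


def svd_basis(X):
    """Mean and right singular vectors of the mean-centred data (centring is assumed)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    V = Vt.T
    # Make the largest-magnitude entry of each vector positive, so the arbitrary
    # sign LAPACK returns (which may change when rows are reordered) cannot flip it.
    idx = np.argmax(np.abs(V), axis=0)
    return mu, V * np.sign(V[idx, np.arange(V.shape[1])])


def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_axis_split(Z, y):
    """Best univariate threshold over latent coordinates Z by weighted Gini impurity.

    Naive O(d * n^2) scan for readability; incremental class counts would give
    the usual sort-then-scan cost of a univariate tree."""
    n, best = len(y), (None, None, gini(y))  # (coordinate, threshold, impurity)
    for j in range(Z.shape[1]):
        order = np.argsort(Z[:, j], kind="stable")
        zs, ys = Z[order, j], y[order]
        for i in range(1, n):
            if zs[i] == zs[i - 1]:
                continue  # no threshold separates equal values
            score = (i * gini(ys[:i]) + (n - i) * gini(ys[i:])) / n
            if score < best[2]:
                best = (j, (zs[i - 1] + zs[i]) / 2.0, score)
    return best


def build(X, y, X_unlab=None, local=False, basis=None, depth=0, max_depth=5):
    """Grow a tree of oblique tests as nested dicts.

    local=False: one SVD basis from all labeled + unlabeled data, reused at every node.
    local=True:  the basis is recomputed from the samples that reach each node."""
    labels, counts = np.unique(y, return_counts=True)
    leaf = {"label": labels[np.argmax(counts)]}
    if depth >= max_depth or len(labels) == 1:
        return leaf
    if local or basis is None:
        pool = X if X_unlab is None else np.vstack([X, X_unlab])
        basis = svd_basis(pool)
    mu, V = basis
    Z = (X - mu) @ V                      # latent coordinates of the labeled data
    j, t, _ = best_axis_split(Z, y)
    if j is None:
        return leaf                       # no split reduces impurity
    v = V[:, j]
    node = {"mu": mu, "v": v, "t": t,
            "w": v, "b": t + v @ mu}      # same test in original space: w.x <= b
    go_left = Z[:, j] <= t
    if X_unlab is None:
        U_left = U_right = None
    else:                                 # unlabeled points follow the same test
        u_left = (X_unlab - mu) @ v <= t
        U_left, U_right = X_unlab[u_left], X_unlab[~u_left]
    kids = dict(local=local, basis=basis, depth=depth + 1, max_depth=max_depth)
    node["left"] = build(X[go_left], y[go_left], U_left, **kids)
    node["right"] = build(X[~go_left], y[~go_left], U_right, **kids)
    return node


def predict(node, X):
    """Route each row through the stored latent-space tests."""
    out = []
    for x in X:
        cur = node
        while "label" not in cur:
            cur = cur["left"] if (x - cur["mu"]) @ cur["v"] <= cur["t"] else cur["right"]
        out.append(cur["label"])
    return np.array(out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two classes separated along the diagonal: an axis-parallel tree needs a
    # staircase of splits here, while a single oblique test suffices.
    X = np.vstack([rng.normal(-2, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    X_unlab = np.vstack([rng.normal(-2, 0.3, (10, 2)), rng.normal(2, 0.3, (10, 2))])
    tree = build(X, y, X_unlab=X_unlab)
    print("root normal w =", tree["w"], "offset b =", tree["b"])
    print("training accuracy:", (predict(tree, X) == y).mean())
```

Each internal node keeps both forms of its test: the latent threshold (`mu`, `v`, `t`) and the equivalent original-space hyperplane (`w`, `b`). In this sketch, fixing the sign of each singular vector keeps the tree from depending on row order through LAPACK's arbitrary sign choice, in the spirit of the abstract's point that results are unaffected by random reordering of the samples.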