Ovarian cancer is a common gynecological tumor and has the highest mortality rate among gynecological tumors. Selecting tumor markers that both discriminate well between disease patterns and are biologically relevant is a current focus of diagnostic research. This study addresses phospholipid metabolite data from ovarian cancer and proposes a marker selection method that combines supervised singular value decomposition with random forest decision-making based on information gain. First, supervised singular value decomposition is used to compute a weight for each marker, and candidate markers are coarsely selected according to these weights. Second, random forest decision theory based on information gain is applied to select the characteristic markers from the candidates. Finally, the selected markers are tested with an SVM classifier, yielding a classification rate above 90%. Compared with other commonly used methods, the proposed method has certain advantages; notably, the selected characteristic markers not only maintain a high classification rate but are also biologically relevant, which supports the feasibility and practicality of the method.
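To make the information-gain criterion mentioned above concrete, the following is a minimal sketch of how a single marker could be scored by the entropy reduction of its best threshold split. It is not the authors' implementation: it does not reproduce the supervised singular value decomposition step or a full random forest, and the marker names, values, and helper functions (`entropy`, `information_gain`, `best_gain`) are hypothetical, made up for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting samples at value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_gain(values, labels):
    """Largest information gain over midpoints between sorted distinct values."""
    points = sorted(set(values))
    cuts = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return max((information_gain(values, labels, t) for t in cuts), default=0.0)


# Toy data (assumed, not from the study): one list of levels per hypothetical marker.
labels = ["cancer", "cancer", "cancer", "control", "control", "control"]
markers = {
    "marker_A": [5.1, 4.8, 5.5, 1.2, 0.9, 1.4],  # separates the two classes cleanly
    "marker_B": [2.0, 3.1, 1.0, 2.9, 1.1, 2.1],  # overlaps heavily between classes
}

# Rank candidate markers from most to least informative.
ranking = sorted(markers, key=lambda m: best_gain(markers[m], labels), reverse=True)
print(ranking)
```

In a random forest, a criterion of this kind is evaluated at every tree node over random subsets of samples and markers; the sketch only shows the per-marker score on which such a ranking would rest.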