Analysis of large-scale gene expression data is an active research area in bioinformatics: it can be used to diagnose human and animal diseases and to study abnormal phenomena during plant growth. This paper proposes a biological knowledge integration method based on parallel clustering for effective gene subset selection. Gene Ontology (GO) is used to obtain the functional similarity between genes, which is then combined with the gene expression data. A parallelized affinity propagation (AP) algorithm clusters the fused data, because it not only yields more biologically meaningful gene subsets but also avoids discarding potentially valuable genes, as a simple primary gene selection can. Based on the clustering result, a neighborhood rough set is used to select representative genes for each cluster, and these genes are then used to train the classifier.
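The fusion and clustering steps can be illustrated with a minimal, serial Python sketch; it is not the paper's parallel implementation. It assumes the fused similarity is a weighted sum of the Pearson correlation between expression profiles and a precomputed GO functional similarity matrix; the weight `alpha`, the helper names `pearson`, `fuse_similarity` and `affinity_propagation`, and the toy data are all hypothetical. The clustering uses the standard affinity propagation responsibility and availability updates, with the median off-diagonal similarity as the preference (a common default). Within one iteration, responsibility rows and availability columns are computed independently of each other, which is one natural place to parallelize; the abstract does not say how the paper does so.

```python
from statistics import median
from typing import List

Matrix = List[List[float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two expression profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = (sxx * syy) ** 0.5
    return sxy / denom if denom > 0 else 0.0


def fuse_similarity(expr: Matrix, go_sim: Matrix, alpha: float = 0.5) -> Matrix:
    """ASSUMPTION: fused = alpha * Pearson(expression) + (1 - alpha) * GO similarity.
    The diagonal (AP "preference") is the median off-diagonal value, a common default."""
    n = len(expr)
    s = [[0.0] * n for _ in range(n)]
    off_diag = []
    for i in range(n):
        for k in range(n):
            if i != k:
                s[i][k] = alpha * pearson(expr[i], expr[k]) + (1 - alpha) * go_sim[i][k]
                off_diag.append(s[i][k])
    pref = median(off_diag)
    for i in range(n):
        s[i][i] = pref
    return s


def affinity_propagation(s: Matrix, damping: float = 0.5, max_iter: int = 300,
                         convergence_iter: int = 15) -> List[int]:
    """Serial affinity propagation; returns the exemplar index chosen for each point."""
    n = len(s)
    r = [[0.0] * n for _ in range(n)]  # responsibility r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availability a(i, k)
    exemplars: List[int] = []
    last, stable = None, 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]; rows are independent.
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(v for kk, v in enumerate(vals) if kk != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # a(k,k) = sum_{i' != k} max(0, r(i',k));
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))); columns are independent.
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
        # Points with a(k,k) + r(k,k) > 0 are exemplars; stop once the set is stable.
        exemplars = [k for k in range(n) if a[k][k] + r[k][k] > 0]
        if exemplars and exemplars == last:
            stable += 1
            if stable >= convergence_iter:
                break
        else:
            stable = 0
        last = exemplars
    if not exemplars:
        return [-1] * n  # no exemplar emerged
    # Each point joins its most similar exemplar; exemplars label themselves.
    return [i if i in exemplars else max(exemplars, key=lambda k: s[i][k]) for i in range(n)]


# Hypothetical toy data: 6 genes x 5 conditions and a made-up GO similarity matrix.
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.0, 3.2, 3.9, 5.1],
    [0.9, 2.1, 2.8, 4.2, 4.9],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [5.1, 3.9, 3.1, 2.0, 0.9],
    [4.9, 4.2, 2.9, 1.8, 1.1],
]
go = [
    [1.0, 0.9, 0.9, 0.1, 0.1, 0.1],
    [0.9, 1.0, 0.5, 0.1, 0.1, 0.1],
    [0.9, 0.5, 1.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.9, 0.9],
    [0.1, 0.1, 0.1, 0.9, 1.0, 0.5],
    [0.1, 0.1, 0.1, 0.9, 0.5, 1.0],
]
labels = affinity_propagation(fuse_similarity(expr, go, alpha=0.5))
print(labels)
```

Unlike k-means, AP does not need the number of clusters in advance: the preference on the diagonal controls how many exemplars emerge, and every cluster is represented by an actual gene.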
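The per-cluster gene selection can be sketched with the forward greedy scheme commonly used with neighborhood rough sets. A sample lies in the positive region when every sample within distance `delta` of it (measured over the chosen genes) has the same class, and the dependency γ is the fraction of samples in that region. This is an illustrative sketch, not the paper's exact procedure: it assumes class-labeled samples as rows and genes as columns, min-max normalized expression, Euclidean distance, and a hypothetical radius `delta = 0.15`; the function names and the toy data are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def min_max_normalize(samples: Matrix) -> Matrix:
    """Scale each gene (column) to [0, 1]; constant genes become 0."""
    cols = list(zip(*samples))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - lo[g]) / (hi[g] - lo[g]) if hi[g] > lo[g] else 0.0
             for g, v in enumerate(row)] for row in samples]


def dependency(samples: Matrix, labels: Sequence[int], genes: List[int], delta: float) -> float:
    """gamma_B(D) = |POS_B(D)| / |U|: the fraction of samples whose delta-neighborhood
    (Euclidean distance over `genes`) contains only samples of the same class."""
    n = len(samples)
    pos = sum(
        1 for x in range(n)
        if all(
            labels[y] == labels[x]
            or sum((samples[x][g] - samples[y][g]) ** 2 for g in genes) ** 0.5 > delta
            for y in range(n)
        )
    )
    return pos / n


def select_representatives(samples: Matrix, labels: Sequence[int], cluster: List[int],
                           delta: float = 0.15, eps: float = 1e-9) -> List[int]:
    """Greedy forward selection within one gene cluster: repeatedly add the gene that
    gives the highest dependency, stopping when no gene raises it by more than eps."""
    selected: List[int] = []
    best = dependency(samples, labels, selected, delta)
    remaining = list(cluster)
    while remaining:
        scores = [(dependency(samples, labels, selected + [g], delta), g) for g in remaining]
        score, gene = max(scores, key=lambda t: t[0])  # the earliest gene wins ties
        if score - best <= eps:
            break
        selected.append(gene)
        remaining.remove(gene)
        best = score
    return selected


# Hypothetical toy data: 6 samples x 3 genes, binary class labels, one gene cluster.
samples = min_max_normalize([
    [0.0, 0.5, 0.3],
    [0.1, 0.9, 0.3],
    [0.2, 0.1, 0.3],
    [0.8, 0.4, 0.3],
    [0.9, 0.0, 0.3],
    [1.0, 0.6, 0.3],
])
labels = [0, 0, 0, 1, 1, 1]
clusters = {"cluster_a": [0, 1, 2]}
for name, genes in clusters.items():
    print(name, select_representatives(samples, labels, genes))
```

On this toy data, gene 0 alone separates the two classes, so adding the noisy gene 1 or the constant gene 2 cannot raise the dependency and the search stops after one gene. The genes selected from all clusters would then form the feature set for the classifier.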