This study uses breast cancer samples from The Cancer Genome Atlas (TCGA) database as its dataset and examines, at the genome-wide level, how gene expression changes as breast cancer patients progress from normal to stage I disease. The aims are to identify feature genes closely associated with breast cancer onset and to establish a pattern-recognition classification method for breast cancer occurrence, providing theoretical support for prevention and early diagnosis. Statistical methods including correlation analysis, the t-test, and confidence intervals were combined into a screening procedure for feature genes, which yielded 336 genes showing significant differences associated with breast cancer occurrence. Models built with machine learning methods reached a classification accuracy above 98%, higher than that reported in previous breast cancer studies. In addition, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis identified 8 pathways significantly associated with these genes (P < 0.05), and GO (Gene Ontology) functional enrichment analysis identified 18 significantly associated functions (P < 0.05). Finally, a brief functional analysis of some of the genes mapped to the 8 pathways showed that they are closely related at the regulatory level, indicating that the identified feature genes play an important role in breast cancer occurrence. This is important for understanding the pathogenesis of breast cancer and for its early diagnosis.
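The screening idea summarized in the abstract (comparing normal and stage I expression with a t-test and a confidence interval) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the names `welch_screen` and `make_toy_data` are hypothetical, and the 95% interval uses a normal approximation to the t quantile so that the example needs only the Python standard library. It also omits the correlation step and any multiple-testing correction, both of which a real analysis would need.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_screen(normal, tumor, alpha=0.05):
    """Keep genes whose confidence interval for the mean difference excludes 0.

    normal, tumor: dicts mapping gene name -> list of expression values.
    Returns {gene: (t_stat, ci_low, ci_high)} for the selected genes.
    """
    # Assumption: normal quantile used in place of the Student t quantile.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    selected = {}
    for gene in normal:
        a, b = normal[gene], tumor[gene]
        diff = mean(b) - mean(a)
        # Welch standard error: the two groups may have unequal variances.
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        if se == 0:
            continue  # constant expression in both groups, nothing to test
        t_stat = diff / se
        low, high = diff - z * se, diff + z * se
        if low > 0 or high < 0:
            selected[gene] = (t_stat, low, high)
    return selected


def make_toy_data(n_genes=50, n_shifted=5, n_samples=30, seed=0):
    """Synthetic expression values; the first n_shifted genes are up-regulated."""
    rng = random.Random(seed)
    normal, tumor = {}, {}
    for i in range(n_genes):
        gene = f"gene_{i}"
        shift = 3.0 if i < n_shifted else 0.0
        normal[gene] = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        tumor[gene] = [rng.gauss(shift, 1.0) for _ in range(n_samples)]
    return normal, tumor


normal, tumor = make_toy_data()
hits = welch_screen(normal, tumor)
for gene, (t_stat, low, high) in sorted(hits.items()):
    print(f"{gene}: t={t_stat:.2f}, CI=({low:.2f}, {high:.2f})")
```

In this toy setup the five shifted genes are recovered, and a few unshifted genes may also pass by chance at the 5% level. This is why a study screening thousands of genes would typically combine several criteria, as the abstract describes.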