Background: Pathway databases, especially KEGG, are widely used by biomedical scientists as a reference knowledge base for interpreting experimental findings. Nevertheless, our knowledge of existing biological pathways is incomplete, and a large number of pathways still need to be expanded. Computational pathway expansion offers a cheap and reliable way to tackle this challenging task. Several existing approaches rely on the analysis of large-scale datasets generated by genome sequencing and other high-throughput experiments; they are limited by variation across studies and by the restricted information that any single experiment provides. In this study, we propose a pathway expansion algorithm based on systematic learning from functional knowledge bases (PPI and GO) that describe genes (or their products) and their relations with one another.

Methods: In essence, pathway expansion is equivalent to testing whether a target gene belongs to a specific pathway or pathways. First, we identified all genes interacting with the target gene, using two large protein-protein interaction databases, HPRD and BioGRID. Then, we identified all candidate KEGG pathways to which these interacting genes belong. Finally, for each candidate pathway, the genes it contains, together with the target gene, were subjected to enrichment analysis at each GO term of the target gene. We assigned the target gene to the candidate pathway if every GO term of the target gene was enriched among the genes of that pathway.

Results: The proposed knowledge-based approach achieved excellent performance in predicting a gene's pathway with either of the two PPI databases. The average consistency rate (defined as the proportion of correctly predicted pathways among all annotated pathways) increased with the number of interacting genes and reached its highest value of 0.95 when the number of interacting genes was 22. However, the relative precision rate (RP, defined as the proportion of target genes whose annotated pathways were all correctly predicted) stayed at largely the same level regardless of the number of interacting genes, at 0.867 for HPRD and 0.802 for BioGRID.

Conclusions: The proposed knowledge-based approach for KEGG pathway expansion achieved high performance in inferring the pathway(s) to which a gene belongs, making it a useful tool for expanding our knowledge of both the target gene and the predicted pathways.
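
The three steps in Methods and the two measures defined in Results can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The abstract does not name the enrichment test, the significance threshold, or the background gene set, so the one-sided hypergeometric test, the cutoff `alpha=0.05`, the use of all GO-annotated genes as background, and the absence of multiple-testing correction are all assumptions. Every function name and the toy data are hypothetical; real use would load interactions from HPRD or BioGRID and pathway membership from KEGG.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n annotated, N drawn)."""
    return sum(comb(n, i) * comb(M - n, N - i)
               for i in range(k, min(n, N) + 1)) / comb(M, N)

def predict_pathways(target, ppi, pathways, go, alpha=0.05):
    """Return candidate pathways in which every GO term of `target` is enriched.

    ppi: gene -> set of interacting genes; pathways: name -> set of genes;
    go: gene -> set of GO terms. The test and alpha are assumptions.
    """
    partners = ppi.get(target, set())                      # step 1: interactors
    candidates = [p for p, genes in pathways.items()       # step 2: their pathways
                  if genes & partners]
    background = set(go)                                   # assumed background set
    terms = go.get(target, set())
    predicted = []
    for name in candidates:                                # step 3: enrichment
        members = (pathways[name] | {target}) & background
        enriched = bool(terms)
        for term in terms:
            annotated = {g for g in background if term in go[g]}
            k = len(members & annotated)
            p = hypergeom_sf(k, len(background), len(annotated), len(members))
            if p >= alpha:                                 # one term fails: reject
                enriched = False
                break
        if enriched:
            predicted.append(name)
    return predicted

def consistency_rate(predicted, annotated):
    """Share of a gene's annotated pathways that were predicted."""
    return len(set(predicted) & set(annotated)) / len(annotated)

def relative_precision(predictions, annotations):
    """Share of target genes whose annotated pathways were all predicted."""
    hits = sum(1 for g, true in annotations.items()
               if set(true) <= set(predictions.get(g, [])))
    return hits / len(annotations)

# Toy data (entirely made up): T interacts with one gene from each pathway,
# but shares its GO term only with the members of P1.
toy_go = {"T": {"GO:x"}}
toy_go.update({f"A{i}": {"GO:x"} for i in range(1, 6)})
toy_go.update({f"B{i}": {"GO:y"} for i in range(1, 6)})
toy_go.update({f"F{i}": {"GO:z"} for i in range(1, 21)})
toy_pathways = {"P1": {f"A{i}" for i in range(1, 6)},
                "P2": {f"B{i}" for i in range(1, 6)}}
toy_ppi = {"T": {"A1", "B1"}}

print(predict_pathways("T", toy_ppi, toy_pathways, toy_go))  # -> ['P1']
```

In the toy data both pathways become candidates through the interactors A1 and B1, but only P1 passes the GO test, because the target's single term is shared by all P1 members and by no P2 member. Requiring every GO term to pass, as the abstract describes, makes the rule strict: one non-enriched term is enough to reject a candidate pathway.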