G protein-coupled receptors (GPCRs) form the largest family of receptor proteins in the human body and play a major role in the pharmaceutical industry. The function of a GPCR is closely related to the superfamily and subfamily to which it belongs, yet the spatial structures of GPCRs are currently very difficult to determine experimentally. Predicting the family and superfamily of a GPCR by computational means is therefore an important research topic in bioinformatics and protein science. Within the pseudo amino acid composition discrete-model framework proposed by Chou, this work uses approximate entropy to represent additional features of GPCR sequences, yielding a new protein sequence representation. A fuzzy K-nearest neighbor (FKNN) classifier serves as the prediction tool. All entries were extracted from the latest GPCR data and filtered to remove homologous sequences, producing a new low-homology test dataset. Jackknife test results confirm the effectiveness of the method, which achieved the highest prediction accuracy when compared with previously reported results. The results indicate that the method has high practical value for the analysis of GPCRs.
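The pipeline named in the abstract (approximate-entropy features appended to the amino acid composition, an FKNN classifier, and jackknife validation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how residues are encoded numerically or which parameters were used, so the Kyte-Doolittle hydropathy encoding, the embedding dimensions `(2, 3)`, the tolerance of 0.2 standard deviations, the weight `0.5`, `k=3`, and the fuzzifier `2.0` are all assumptions, and every function name is hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: Kyte-Doolittle hydropathy is used to turn a sequence into a
# numeric signal; the abstract does not specify the encoding.
HYDROPATHY = dict(zip(AMINO_ACIDS, [
    1.8, 2.5, -3.5, -3.5, 2.8, -0.4, -3.2, 4.5, -3.9, 3.8,
    1.9, -3.5, -1.6, -3.5, -4.5, -0.8, -0.7, 4.2, -0.9, -1.3]))


def approximate_entropy(series, m, r):
    """ApEn(m, r) = phi(m) - phi(m + 1), using Chebyshev distance."""
    n = len(series)
    if n < m + 1:
        raise ValueError("series too short for embedding dimension m")

    def phi(k):
        windows = [series[i:i + k] for i in range(n - k + 1)]
        log_sum = 0.0
        for a in windows:
            # Self-matches are counted, so the count is never zero.
            close = sum(1 for b in windows
                        if max(abs(x - y) for x, y in zip(a, b)) <= r)
            log_sum += math.log(close / len(windows))
        return log_sum / len(windows)

    return phi(m) - phi(m + 1)


def pseaac_apen(sequence, embed_dims=(2, 3), weight=0.5):
    """20 composition terms plus one ApEn term per embedding dimension,
    normalized in the style of pseudo amino acid composition."""
    seq = [c for c in sequence.upper() if c in HYDROPATHY]
    freqs = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    signal = [HYDROPATHY[c] for c in seq]
    mean = sum(signal) / len(signal)
    sd = math.sqrt(sum((v - mean) ** 2 for v in signal) / len(signal))
    thetas = [approximate_entropy(signal, m, 0.2 * sd) for m in embed_dims]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


def fknn_predict(train_x, train_y, query, k=3, fuzzifier=2.0):
    """Fuzzy K-nearest neighbor with crisp training labels: each of the k
    nearest neighbors votes with weight 1 / d**(2 / (fuzzifier - 1))."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    power = 2.0 / (fuzzifier - 1.0)
    scores = {}
    for x, label in nearest:
        d = max(math.dist(x, query), 1e-12)  # guard against zero distance
        scores[label] = scores.get(label, 0.0) + 1.0 / d ** power
    total = sum(scores.values())
    memberships = {label: s / total for label, s in scores.items()}
    return max(memberships, key=memberships.get), memberships


def jackknife_accuracy(xs, ys, k=3):
    """Leave-one-out test: each sample is predicted from all the others."""
    hits = 0
    for i in range(len(xs)):
        label, _ = fknn_predict(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:],
                                xs[i], k)
        hits += label == ys[i]
    return hits / len(xs)
```

In use, each sequence in a labeled dataset would be passed through `pseaac_apen` to obtain a 22-dimensional vector, and `jackknife_accuracy` would then be called on the resulting vectors and their family labels. The membership dictionary returned by `fknn_predict` is what distinguishes FKNN from plain KNN: it reports a degree of belonging to each class rather than only a majority vote.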