Analysis of protein features and machine learning algorithms for prediction of druggable proteins

Source: Quantitative Biology (English edition)
Background: Computational tools are widely used in drug discovery because they reduce time and cost. Predicting whether a protein is druggable is fundamental to the drug research pipeline. Sequence-based protein function prediction plays a vital role in many research areas. Training data, protein feature selection, and machine learning algorithms are three indispensable elements that determine the success of a model.

Methods: In this study, we tested the performance of different combinations of protein features and machine learning algorithms for druggable protein prediction, using the targets of FDA-approved small molecules. We also enlarged the dataset to include the targets of small molecules in experimental or clinical investigation.

Results: Although the 146-d vector used by Li et al. with a neural network achieved the best training accuracy (91.10%), overlapped 3-gram word2vec with logistic regression achieved the best prediction accuracy on the independent test set (89.55%) and on newly approved targets. Models were also trained on the enlarged dataset containing the targets of small molecules in experimental and clinical investigation; however, the best training accuracy was only 75.48%. In addition, we applied our models to predict potential targets as a reference for future studies.

Conclusions: Our study indicates the potential of word2vec for predicting druggable proteins. The training dataset of druggable proteins should not be extended to targets that lack verification. The target prediction package is available at https://github.com/pkumdl/target_prediction.
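To make the "overlapped 3-gram" feature concrete, the following is a minimal sketch of how a protein sequence might be split into overlapping 3-residue tokens and then reduced to a fixed-length vector. The abstract does not describe the exact preprocessing, so everything here is an assumption for illustration: the function names `overlapped_kmers` and `mean_embedding`, the toy 2-d embedding table, and the choice to average token vectors. In a real pipeline the token vectors would be learned with word2vec, and the resulting features would be passed to a classifier such as logistic regression.

```python
from typing import Dict, List


def overlapped_kmers(sequence: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-grams (stride 1)."""
    seq = sequence.strip().upper()
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mean_embedding(tokens: List[str],
                   table: Dict[str, List[float]],
                   dim: int) -> List[float]:
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [table[t] for t in tokens if t in table]
    if not known:
        return [0.0] * dim
    return [sum(vec[j] for vec in known) / len(known) for j in range(dim)]


# Hypothetical 2-d embedding table standing in for trained word2vec vectors.
toy_table = {
    "MKT": [1.0, 0.0],
    "KTA": [0.0, 1.0],
    "TAY": [1.0, 1.0],
}

tokens = overlapped_kmers("MKTAY")          # overlapping 3-grams of a toy sequence
features = mean_embedding(tokens, toy_table, dim=2)
print(tokens)
print(features)
```

Averaging is only one way to pool token vectors into a per-protein feature; it is used here because it yields a fixed-length input regardless of sequence length.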